|
There is (or was, depending on your point of view) aspseek
(www.aspseek.org). This Internet search engine project is now stalled, but
it works out of the box on Red Hat 9 and even, installed from the RPM and
using legacy software that is part of the distribution, on Fedora Core 3.
Up to at least 1 million sites, on a single machine, it is quite fast
(below 0.3 seconds on an Athlon 2.4). It has some nice features: you can
stop and restart crawling as you need (you don't have to run "non stop"),
it is easy to maintain (when recrawling, it downloads only the modified
sites), it highlights the search terms in the cached pages, and the
statistics of its operations are very easy to obtain. On the other hand,
the project, as already mentioned, is stalled, and unless someone takes it
over, in time it will stop working with more modern Linux and MySQL
distributions, at least for us non-programmers. It also indexes only HTML
files, without the nice plugins of Nutch.
|
