On Monday 26 November 2007, jian chen wrote:
> I think lot of the open source java crawlers are pretty much dead
> projects. They haven't been updated for a long time.
I have only had time to check the JIRA of Heritrix. That project seems pretty alive to me. In addition, it reportedly crawls faster than Nutch, though I cannot confirm that from personal experience. Of course, with Heritrix you lose all of Nutch's post-processing, indexing and the like.

> 3) Runs in Eclipse directly. No need to install Cygwin.

But there is still the possibility to run the crawler outside Eclipse, right? After all, why should anyone want to use a crawler in production that needs Eclipse running to work? ;)

Isabel

-- 
Without freedom of choice there is no creativity.
-- Kirk, "The return of the Archons", stardate 3157.4

Web: <http://www.isabel-drost.de>
IM: <xmpp://[EMAIL PROTECTED]>
