On Monday 26 November 2007, jian chen wrote:
> I think lot of the open source java crawlers are pretty much dead
> projects. They haven't been updated for a long time.

I have only had time to check the JIRA of Heritrix. That seems pretty alive to 
me. In addition, it seems to crawl faster than Nutch - though that is not from 
my own experience. Of course, with Heritrix you lose all the Nutch 
post-processing, indexing and stuff like that.


> 3) Runs in Eclipse directly. No need to install Cygwin.

But there is still the possibility to run the crawler outside Eclipse, right? 
After all, why should anyone want to use a crawler in production that needs 
Eclipse running to work? ;)

Isabel


-- 
Without freedom of choice there is no creativity.
                -- Kirk, "The Return of the Archons", stardate 3157.4
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[EMAIL PROTECTED]>

