I chose Nutch because of its close Lucene integration, but you have to realize that it is still beta-ware: I've run into bugs, slow crawls, and, most of all, poor documentation. Did you include Heritrix in your due diligence? I have a few comments about my decision here: http://nutch.wordpress.com/tag/introduction/
----- Original Message -----
From: Koe Black <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, August 13, 2007 5:02:52 PM
Subject: Nutch based custom search engine set-up

Hello All,

After reading about and testing different Lucene-based technologies, we came to the conclusion that we should configure our search engine the following way:

* Nutch for web crawling and indexing (on websites we do not own)
* Hadoop for the index file system
* Nutch API or Solr API for index access from our application

At this point we are not sure about the advantages and disadvantages of using the Nutch API versus the Solr API. Any feedback is appreciated. We would also like to hear about any negative or positive experience using these technologies in production environments.

Thank you,
Armen
