I am evaluating Nutch+Lucene as a crawling and search solution.

However, I am running into major bugs in Nutch right off the bat.

In particular, NUTCH-119: Nutch does not crawl relative URLs. I have some
discussion of it here:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg08644.html

Most of the links on www.variety.com, one of my main test sites, use relative
URLs. It seems incredible that Nutch, which supports MapReduce, cannot
fetch these URLs.
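
For reference, the step in question: before a crawler can fetch a relative link, it has to resolve the href against the URL of the page it appeared on. The following Java sketch only illustrates that resolution using the JDK's java.net.URI.resolve (RFC 2396 rules); it is not Nutch's code, and the page URL and hrefs are hypothetical examples.

```java
import java.net.URI;

public class RelativeLinks {
    // Resolve an href found on a page against that page's own URL.
    // Not Nutch's implementation; just the standard JDK resolution step.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page and links, for illustration only.
        String page = "http://www.example.com/news/index.html";
        String[] hrefs = {"story.html", "/reviews/film.html", "../about.html", "http://other.example.org/"};
        for (String href : hrefs) {
            // Relative hrefs become absolute URLs; absolute ones pass through unchanged.
            System.out.println(href + " -> " + resolve(page, href));
        }
    }
}
```

A real page can also declare a different base with a <base href> tag, which would have to be used instead of the page URL.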

If, for other reasons, I decide to go with Nutch+Lucene, I might fix this bug
myself. Has anyone tried fixing this problem? Is it intractable?
Or are the developers, who are just volunteers anyway, more interested in
fixing other problems?

Could someone outline the issue for me a bit more clearly so I would know how 
to evaluate it?


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers