Hi 

Is there any way to improve how Nutch indexes keywords that might seem
irrelevant? For example, I'm crawling a set of about 30,000 pages. Each
page has a code on it that refers to the product the page describes. The
code can be any alphanumeric string of any length. Our Google Mini seems
to pick them all up (e.g. enter any code and it finds the page containing
it), but Nutch doesn't seem to find all of them, especially the purely
numeric ones.

Any ideas?

Cheers
Aled
