Doug Cutting wrote:
> Xin-Yi Liu wrote:
>
>> Is there a way to update these anchor texts without
>> refetching every single page in your web db?
>
> Not at present. I hope to soon add a tool that will do this. Given a set of segments, it would rewrite all of their fetcher/ data to include the db's current score and inlinks for each URL.

I have implemented and committed this tool, named UpdateSegmentsFromDb. Please tell me if you have problems with it. It has worked well for me on some small test collections. CrawlTool has been updated to use it.
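
For anyone who wants a picture of what such a pass does, here is a rough sketch in Java. It is not the UpdateSegmentsFromDb source. The names SegmentUpdateSketch, DbInfo, FetchedEntry and updateSegment are hypothetical, and it assumes the db and a segment fit in an in-memory map and list, whereas the real ones are on-disk structures. It only illustrates the idea: for each URL in a segment, replace the stored score and inlink anchors with the db's current values, with no refetching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: none of these types are actual Nutch classes.
class SegmentUpdateSketch {

    /** What the db currently holds for a URL (hypothetical shape). */
    static class DbInfo {
        final float score;
        final List<String> anchors;
        DbInfo(float score, List<String> anchors) {
            this.score = score;
            this.anchors = anchors;
        }
    }

    /** One fetched page as recorded in a segment (hypothetical shape). */
    static class FetchedEntry {
        final String url;
        final float score;
        final List<String> anchors;
        FetchedEntry(String url, float score, List<String> anchors) {
            this.url = url;
            this.score = score;
            this.anchors = anchors;
        }
    }

    /**
     * Returns a rewritten copy of the segment. Entries whose URL is known
     * to the db get the db's current score and anchors; entries the db
     * does not know are carried over unchanged. Nothing is refetched.
     */
    static List<FetchedEntry> updateSegment(List<FetchedEntry> segment,
                                            Map<String, DbInfo> db) {
        List<FetchedEntry> rewritten = new ArrayList<>(segment.size());
        for (FetchedEntry entry : segment) {
            DbInfo info = db.get(entry.url);
            if (info == null) {
                rewritten.add(entry);
            } else {
                rewritten.add(new FetchedEntry(
                        entry.url, info.score, new ArrayList<>(info.anchors)));
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        Map<String, DbInfo> db = new HashMap<>();
        db.put("http://a.example/", new DbInfo(2.5f, Arrays.asList("home page")));

        List<FetchedEntry> segment = Arrays.asList(
                new FetchedEntry("http://a.example/", 1.0f, new ArrayList<String>()),
                new FetchedEntry("http://b.example/", 1.0f, new ArrayList<String>()));

        for (FetchedEntry e : updateSegment(segment, db)) {
            System.out.println(e.url + " score=" + e.score + " anchors=" + e.anchors);
        }
    }
}
```

In this sketch the segment is copied rather than modified in place, so the old data stays intact until the rewrite has finished. Whether the actual tool works that way is not something this sketch claims.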


It should probably be added to bin/nutch as a named command and mentioned in the tutorial.

Cheers,

Doug


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
