Hello Kamil,
Do you want to generate a fetchlist with URLs that are present in the WebDB
but have not been fetched yet?
I am not sure what you are trying to achieve, but you can generate any
fetchlist you want using the latest tool by Andrzej Bialecki
(http://issues.apache.org/jira/browse/NUTCH-68). I have not tried it myself.
Some time ago there was also a discussion on the Nutch mailing list
about a refetchonly parameter for the fetchlist generator. Some of the
ideas are still not implemented, but you can read there how it currently works.
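To illustrate the selection idea only, here is a rough sketch in a simplified in-memory model (Python, not Nutch code). The Page class, its priority flag and generate_fetchlist() are hypothetical and do not correspond to the WebDB API or to the NUTCH-68 tool. The sketch keeps only pages that have not been fetched yet, puts flagged URLs (for example newly injected ones) at the top, and orders the rest by descending score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    fetched: bool            # has this URL been fetched at least once?
    priority: bool = False   # hypothetical flag for urgently injected URLs
    score: float = 1.0       # hypothetical page score used for ordering


def generate_fetchlist(pages: List[Page], top_n: int) -> List[str]:
    """Return up to top_n URLs that were never fetched.

    Priority-flagged pages come first; within each group, pages with a
    higher score come first. Assumed model, not Nutch's generator.
    """
    candidates = [p for p in pages if not p.fetched]
    # False sorts before True, so "not p.priority" puts flagged pages first;
    # negating the score gives descending score order within each group.
    candidates.sort(key=lambda p: (not p.priority, -p.score))
    return [p.url for p in candidates[:top_n]]
```

For example, with one already fetched page, two ordinary unfetched pages and one flagged page, the flagged URL is selected first even though its score is the lowest. Python's sort is stable, so pages with equal keys keep their original order.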
Regards
Piotr
Kamil Wnuk wrote:
Hi All,
I have recently started using nutch and I am looking for a method of
prioritizing urls injected during an ongoing crawl process (similar to
the "whole-web crawl" scenario described in the tutorial) so that they
are guaranteed to be included at the top of the next fetchlist
generated. The purpose of this is so that I can give nutch the urls
of newly created web pages that I want indexed as quickly as possible.
I have looked through the nutch documentation and the mailing list
archives and have not been able to find a solution. Does a good
method for doing this exist?
Thanks in advance,
Kamil
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general