1. I have an existing db with about 2 million pages.
   How can I fetch only the URLs in a urlfile, without also fetching
   the new links inserted by the last fetch?
The system is not built for this use case.
You might try a url-filter that only allows the URLs from your urlfile, and then create a new segment. If the time between your last crawl and now is not too long (shorter than the default fetch interval), this could work.
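
To make the url-filter idea concrete, here is a rough sketch in Python that turns a list of URLs into accept rules. It assumes the regex url-filter file format, where each line is a regular expression prefixed with + (accept) or - (reject) and the first matching rule wins; check this against your Nutch version. The function name build_filter_rules and the example URL are made up for illustration, and re.escape is Python's escaping, which should be checked against the Java regex syntax for unusual URLs.

```python
import re


def build_filter_rules(urls):
    """Return url-filter style lines that accept only the given URLs.

    Assumption: the filter file uses '+regex' to accept and '-regex'
    to reject, evaluated top to bottom.
    """
    rules = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines from the urlfile
        # Anchor the escaped URL so only an exact match is accepted.
        rules.append("+^" + re.escape(url) + "$")
    # Reject everything that did not match an accept rule above.
    rules.append("-.")
    return rules


# Hypothetical usage: print the rules for a single example URL.
for line in build_filter_rules(["http://www.example.com/page.html"]):
    print(line)
```

The printed lines would go into the url-filter file before generating the new segment. With a very large urlfile, a long list of exact-match rules may get slow, so this is only practical for a modest number of URLs.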


2. How do I get the number of pages in the DB?
Try /bin/nutch readdb -dumppageurl
and count the lines of the output (on Unix: wc -l)
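
If wc is not available, the same count can be done with a few lines of Python. This is only a sketch: it assumes the dump has been saved to a text file with one page URL per line, which I have not verified for every version, and the function name count_pages is made up for illustration.

```python
def count_pages(lines):
    """Count non-empty lines, assuming one page URL per line."""
    return sum(1 for line in lines if line.strip())


# Hypothetical usage with a saved dump file:
#   with open("pageurls.txt", encoding="utf-8") as f:
#       print(count_pages(f))
```

If the dump contains header or summary lines, they would be counted as well, so look at the first few lines of the output before trusting the number.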

Matthias
--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - Die Suchmaschine für Lokales und Events

