The system is not built for this question.
1. I have an existing db with ~2 million pages. How can I fetch only the URLs in a urlfile, without also fetching the most recently inserted new links?
You might try a url-filter that only allows the URLs from your urlfile, and then create a new segment. If the time between your last crawl and now is not too big (shorter than the default fetch interval), then this could work.
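A rough sketch of how such a filter could be generated from the urlfile, assuming the regex-style filter file format (one rule per line, "+" to accept, "-" to reject, first matching rule wins). The function name build_filter_rules is made up for illustration, and it is an assumption that the escaping done by Python's re.escape is accepted unchanged by the regex engine your Nutch version uses, so check a few generated rules by hand first.

```python
import re


def build_filter_rules(urls):
    """Build filter rules that accept exactly the given URLs.

    Hypothetical helper: each non-empty URL becomes an anchored
    "+^...$" rule, followed by a final "-." rule that rejects
    every URL not matched by an earlier rule.
    """
    rules = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the urlfile
        # Escape regex metacharacters so the URL is matched literally.
        rules.append("+^" + re.escape(url) + "$")
    rules.append("-.")  # reject everything else
    return rules
```

To use it, read your urlfile line by line, pass the lines to build_filter_rules, and write the result into the filter file your configuration points at. With 2 million pages in the db but only a small urlfile, the rule list stays short. A very large urlfile would make the filter slow, since every candidate URL is tested against every rule.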
2. How to get the number of pages in DB?
Try /bin/nutch readdb -dumppageurl and count the lines of the output (on Unix: wc).
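If wc is not available, the same count can be sketched in a few lines. This assumes the dump has been redirected to a file with one URL per line and no header lines, which you should verify against your actual output. The name count_urls is only illustrative.

```python
def count_urls(lines):
    """Count non-blank lines, assuming each one holds a single page URL."""
    return sum(1 for line in lines if line.strip())
```

For example, after saving the dump to a file of your choosing, open it and pass the file object to count_urls. The file is read line by line, so this also works for a dump of a couple of million entries.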
Matthias -- http://www.eventax.com - eventax GmbH http://www.umkreisfinder.de - Die Suchmaschine für Lokales und Events
_______________________________________________ Nutch-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/nutch-general
