> How can I see all the webpages nutch crawled? In other words, I want to
> know which urls nutch has crawled.
>
> Are all the urls ever crawled stored in crawlDB?
Run /usr/local/nutch/bin/nutch readdb with the -dump
option; it will write all the URLs out into a new
directory, and you can peruse them at your leisure.
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
        <crawldb>       directory name where crawldb is located
        -stats          print overall statistics to System.out
        -dump <out_dir> dump the whole db to a text file in <out_dir>
        -url <url>      print information on <url> to System.out
        -topN <nnnn> <out_dir> [<min>]
                        dump top <nnnn> urls sorted by score to <out_dir>
                        [<min>] skip records with scores below this value;
                        this can significantly improve performance.
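For example, assuming Nutch is installed under /usr/local/nutch and your crawl
data lives in crawl/crawldb (both paths are assumptions, adjust them to your
setup), a dump-and-filter session might look like this:

```shell
# Dump the whole crawldb to plain text files under crawldb_dump/
# (paths are assumptions; adjust to your installation).
/usr/local/nutch/bin/nutch readdb crawl/crawldb -dump crawldb_dump

# Each record in the text dump begins with the URL, so a simple
# grep/awk pass pulls out just the list of crawled URLs
# (-h suppresses filename prefixes when multiple part files match):
grep -h '^http' crawldb_dump/part-* | awk '{print $1}'
```

Redirect the last command into a file if you just want a flat list of every
URL in the db.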
Or, you can write your own class that outputs
whatever you want from the database...
JohnM
--
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services