There's the 'nutch readdb' command:

[EMAIL PROTECTED]:~> nutch readdb
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
       <crawldb>       directory name where crawldb is located
       -stats  print overall statistics to System.out
       -dump <out_dir> dump the whole db to a text file in <out_dir>
       -url <url>      print information on <url> to System.out
       -topN <nnnn> <out_dir> [<min>]  dump top <nnnn> urls sorted by score to <out_dir>
               [<min>] skip records with scores below this value.
                       This can significantly improve performance.
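For your question, -dump is probably the option you want. A minimal sketch, assuming your crawldb lives at crawl/crawldb and you want the pages from www.example.com (substitute your own path and domain):

  nutch readdb crawl/crawldb -dump crawldump
  grep 'http://www.example.com' crawldump/part-*

The dump is plain text with one record per URL, so a simple grep on the domain pulls out the pages Nutch knows about. Keep in mind the crawldb also lists URLs that have only been discovered, not fetched; if I remember right, each record carries a Status field, so you can tell fetched pages apart from merely injected ones.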

Is this what you're looking for?

Rgrds, Thomas

On 7/25/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Is there any way to find out which web pages on a specific domain have
been crawled by Nutch? In other words, is there any way to get the list
of urls that were downloaded and processed by Nutch?

