Hi C.B.,

OK, this requires another entry on the wiki page; if you have not already found it, it is here [1]. Thanks for pointing this out.
In a nutshell, I think this page [2] will give you the best description of your 'unknowns' below. I realise that this may seem like I am passing the buck, but reading this page, paying particular attention to this section [3], should sort out the majority of the grey areas.

[1] http://wiki.apache.org/nutch/bin/nutch_readdb
[2] http://en.wikipedia.org/wiki/HTTP_404
[3] http://en.wikipedia.org/wiki/HTTP_404#Overview e.g. 410 Gone

If you would like to update the wiki, please do so; if not, I will get it sorted shortly.

Thanks

On Thu, Jul 7, 2011 at 11:04 PM, Cam Bazz <[email protected]> wrote:
> Hello,
>
> When I run the readdb -stats command on my crawldb, I get:
>
> status 1 (db_unfetched): 199820
> status 2 (db_fetched): 257384
> status 3 (db_gone): 557
> status 4 (db_redir_temp): 40265
> status 5 (db_redir_perm): 6152
> CrawlDb statistics: done
>
> I understand the db_unfetched and db_fetched, but what are the other stats?
>
> Best regards,
> -C.B.
>

--
*Lewis*

