Hi!

I ran a crawl from a single seed for 30 rounds, and it has crawled around 16k
URLs. I checked with readdb -stats, and it shows 2116 URLs as unfetched. I
ran the fetcher again with the option 'all', but it does not fetch anything
and the unfetched list remains the same.

I have dumped only the fields baseUrl, status, and protocolStatus. The dump
can be found at:
https://raw.github.com/salvager/NutchDev/master/runtime/local/table_fields/part-r-00000

The file clearly shows that the URLs with status 1 (unfetched) have the
protocolStatus NOT FOUND. Those URLs are never moved to db_gone, which is
status 3 if I am correct.
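
For anyone who wants to check a dump like this themselves, here is a rough
sketch of how such stuck records could be counted. The record layout
(URL, then tab-separated key=value pairs), the sample URLs, and the helper
names are all assumptions made up for illustration; the real dump layout may
differ, so the parsing would need to be adapted.

```python
from collections import Counter

# Hypothetical sample records; NOT copied from the real dump.
# Assumed layout: url<TAB>status=<n><TAB>protocolStatus=<name>
SAMPLE = """\
http://example.com/a\tstatus=1\tprotocolStatus=NOTFOUND
http://example.com/b\tstatus=2\tprotocolStatus=SUCCESS
http://example.com/c\tstatus=1\tprotocolStatus=NOTFOUND
http://example.com/d\tstatus=1\tprotocolStatus=
"""


def parse_line(line):
    """Turn one assumed 'url<TAB>key=value...' record into a dict."""
    url, *pairs = line.rstrip("\n").split("\t")
    record = {"baseUrl": url}
    for pair in pairs:
        key, _, value = pair.partition("=")
        record[key] = value
    return record


def count_by_status(lines):
    """Count records per (status, protocolStatus) combination."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = parse_line(line)
        counts[(rec.get("status", ""), rec.get("protocolStatus", ""))] += 1
    return counts


def stuck_urls(lines, status="1", protocol="NOTFOUND"):
    """List URLs still at the given status despite the given protocol status."""
    result = []
    for line in lines:
        if not line.strip():
            continue
        rec = parse_line(line)
        if rec.get("status") == status and rec.get("protocolStatus") == protocol:
            result.append(rec["baseUrl"])
    return result


if __name__ == "__main__":
    sample_lines = SAMPLE.splitlines()
    for combo, n in sorted(count_by_status(sample_lines).items()):
        print(combo, n)
    print(stuck_urls(sample_lines))
```

To run it against a real file, the lines of the dump would be passed in place
of SAMPLE.splitlines(), after adjusting parse_line to the actual layout.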

Has anyone had a similar problem? Any ideas on how to fix it?

PS: I have made a patch that dumps only particular fields selected on the
command line (example: ./bin/nutch readdb -dump table_fields -fields
"status,protocolStatus"). baseUrl is dumped by default along with the
requested fields. I can upload it if anyone is interested.


Thanks,

-- 
Kiran Chitturi
