Hi, I haven't been following Nutch development for some time (so my answer may not be accurate), but I don't think there is such a tool or report. Anyway, your contribution would be warmly welcomed! :-)
On the other hand, based on your short description of the features you are looking for, my personal opinion is that you want a tool that provides exact information about something that is very variable (mutable) by nature and heavily dependent on the Nutch setup. For example, the size of a parsed document (an HTML document, say) can be limited to a specific size (e.g. via the http.content.limit property). So can the number of links extracted from a document (db.max.outlinks.per.page), and so on. Such settings have a decisive impact on the crawl result, and thus on the result of your report as well.

Just my 2 cents.

Regards,
Lukas

On 12/2/06, karthik085 <[EMAIL PROTECTED]> wrote:
Hello,

How do I check that all pages have been fetched? Is there a command or tool that produces a report such as: the number of pages on the website, the number of pages fetched, the number of pages filtered, and, if there were errors, how many and a brief description of each? I understand that analyzing the logs and running readdb with stats/dumppageurl is one option, but it is time-consuming and requires unnecessary manual work. If there were a tool or command that did the above, I could simply parse its report for my web services.

--
View this message in context: http://www.nabble.com/Nutch-Data-Testing-tf2742246.html#a7651128
Sent from the Nutch - User mailing list archive at Nabble.com.
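Because readdb with -stats already prints a total and per-status URL counts for the CrawlDb, a small parser could turn that output into the machine-readable summary asked about above. Below is a minimal Python sketch. It assumes the output contains lines of the form "TOTAL urls: N" and "status N (name): count", possibly preceded by a log prefix. The exact format may differ between Nutch versions, and the function names parse_crawldb_stats and format_report are hypothetical. As the reply points out, this only reflects what is in the CrawlDb under the current configuration, not the true number of pages on the site.

```python
import re
from typing import Dict

# ASSUMED line formats of `readdb <crawldb> -stats` output; verify against
# your Nutch version. re.search (not match) tolerates a leading log prefix
# such as "... INFO crawl.CrawlDbReader - ".
TOTAL_RE = re.compile(r"TOTAL urls:\s+(\d+)")
STATUS_RE = re.compile(r"status\s+(\d+)\s+\(([^)]+)\):\s+(\d+)")


def parse_crawldb_stats(text: str) -> Dict[str, int]:
    """Collect the total URL count and per-status counts into a dict.

    Keys are "total" plus the lowercased status names (e.g. "db_fetched").
    Lines matching neither pattern (retry counts, scores, ...) are ignored.
    """
    summary: Dict[str, int] = {}
    for line in text.splitlines():
        m = TOTAL_RE.search(line)
        if m:
            summary["total"] = int(m.group(1))
            continue
        m = STATUS_RE.search(line)
        if m:
            # Lowercase so names like "DB_fetched" and "db_fetched" coincide.
            summary[m.group(2).strip().lower()] = int(m.group(3))
    return summary


def format_report(summary: Dict[str, int]) -> str:
    """Render the summary as plain text, each status as a share of the total."""
    total = summary.get("total", 0)
    lines = [f"total urls: {total}"]
    for name in sorted(k for k in summary if k != "total"):
        count = summary[name]
        pct = 100.0 * count / total if total else 0.0
        lines.append(f"{name}: {count} ({pct:.1f}%)")
    return "\n".join(lines)
```

For example, the text printed by bin/nutch readdb crawl/crawldb -stats (the crawldb path is only an example, and depending on the logging setup the lines may end up in the log file rather than on the console) could be passed to parse_crawldb_stats, and the resulting dict serialized for a web service. Error details would still have to come from the fetcher logs.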
