Hi, I got Nutch working on my cluster after making the necessary changes to my crawlfilter file. It seems to be working.
Two days ago I started a crawl on the Nutch cluster for a list of 20 websites, to a depth of 8. As far as I can tell it is currently fetching at depth 3, and the segment files generated so far are almost 70 MB in size. I opened the files and could see valid URLs.

The first two fetch phases took a very long time to complete, and this third fetch phase is taking even longer. Is this normal, or is something going terribly wrong?

--
Abhijit Bera
Associate Software Engineer - Web Enterprise Division
Geodesic Information Systems Ltd.
I use Ubuntu Linux.
