I didn't set it with the crawl command, and from what I can see in the code, it defaults to Integer.MAX_VALUE, which should be more than enough. I'm only looking at about 2100 pages.
And I never experienced this problem before with 0.8. I've checked nutch-default.xml, and I can't see any settings that would make it fetch a URL but not index it, especially only at higher depths. Any other ideas?

Thanks,
Shawna

----- Original Message ----
From: Briggs <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, May 14, 2007 4:18:46 PM
Subject: Re: Problem crawling in Nutch 0.9

Just curious, did you happen to limit the number of URLs using the "topN" switch?

On 5/14/07, Annona Keene <[EMAIL PROTECTED]> wrote:
> I recently upgraded to 0.9, and I've started encountering a problem. I began
> with a single URL and crawled with a depth of 10, assuming I would get every
> page on my site. This same configuration worked for me in 0.8. However, I
> noticed that a particular URL I was especially interested in was not in the
> index. So I added the URL explicitly and crawled again, and it still was not
> in the index. I checked the logs, and it is being fetched. So I tried a
> lower depth, and that worked: with a depth of 6, the URL does appear in the
> index. Any ideas on what would be causing this? I'm very confused.
>
> Thanks,
> Ann

--
"Conscious decisions by conscious minds are what make reality real"
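For anyone following along, here is a sketch of the one-step crawl invocation being discussed. The seed directory, output directory, and numeric values are illustrative assumptions, not taken from the thread; only the `-depth` and `-topN` switches themselves are the ones under discussion:

```
# Illustrative Nutch 0.9 one-step crawl (paths and values are examples).
# -depth bounds the number of fetch/update rounds; -topN caps the number
# of top-scoring URLs fetched per round. When -topN is not given, the
# Crawl tool falls back to Integer.MAX_VALUE, i.e. effectively no cap.
bin/nutch crawl urls -dir crawl -threads 10 -depth 10 -topN 50000
```

Since the default `-topN` is unbounded, a URL that is fetched but missing from the index at depth 10 while present at depth 6 would point at something other than the topN limit, which is consistent with Shawna's reply above.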
