I didn't set it with the crawl command, and from what I can see in the code, it defaults to Integer.MAX_VALUE, which should be more than enough. I'm only looking at about 2100 pages.
And I never experienced this problem before with 0.8. I've checked nutch-default.xml, and I can't see any settings that would make it fetch a URL but not index it, especially only at higher depths. Any other ideas?

Thanks,
Shawna

----- Original Message ----
From: Briggs <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, May 14, 2007 4:18:46 PM
Subject: Re: Problem crawling in Nutch 0.9

Just curious, did you happen to limit the number of URLs using the "topN" switch?

On 5/14/07, Annona Keene <[EMAIL PROTECTED]> wrote:
> I recently upgraded to 0.9, and I've started encountering a problem. I began
> with a single URL and crawled with a depth of 10, assuming I would get every
> page on my site. This same configuration worked for me in 0.8. However, I
> noticed that a particular URL I was especially interested in was not in the
> index. So I added the URL explicitly and crawled again, and it still was not
> in the index. I checked the logs, and it is being fetched. So I tried a
> lower depth, and that worked: with a depth of 6, the URL does appear in the
> index. Any ideas on what would be causing this? I'm very confused.
>
> Thanks,
> Ann

--
"Conscious decisions by conscious minds are what make reality real"
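For anyone following along, here is a sketch of the one-step crawl invocation being discussed. The seed directory, output directory, and numeric values are illustrative assumptions, not taken from the thread; only the `-depth` and `-topN` switches themselves are the ones under discussion:

```
# Illustrative Nutch 0.9 one-step crawl (paths and values are examples).
# -depth bounds the number of fetch/update rounds; -topN caps the number
# of top-scoring URLs fetched per round. When -topN is not given, the
# Crawl tool falls back to Integer.MAX_VALUE, i.e. effectively no cap.
bin/nutch crawl urls -dir crawl -threads 10 -depth 10 -topN 50000
```

Since the default `-topN` is unbounded, a URL that is fetched but missing from the index at depth 10 while present at depth 6 would point at something other than the topN limit, which is consistent with Shawna's reply above.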
