But how, Ray, if you have only 1 URL per host?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Raymond Balmès <[email protected]>
> To: [email protected]
> Sent: Tuesday, May 26, 2009 4:11:27 PM
> Subject: Re: threads get stuck in spinwaiting
> 
> Observing what my crawls do, I believe Ken must be right.
> Towards the end of the crawl (when fetchQueues.totalSize="xxxx" counts
> down), in some cases I'm fetching from only about two sites, so politeness
> does indeed start to play a role there; at least it should.
> 
> -Ray-
> 
> 2009/5/26 Raymond Balmès 
> 
> > Please read this too:
> >
> > http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
> >
> > Interesting build from Ken.
> >
> > 2009/5/26 Raymond Balmès 
> >
> >> Yes, already reported in multiple threads.
> >> I noted that if one does a "recrawl" you don't get this behavior... no
> >> idea why.
> >>
> >> -Raymond-
> >>
> >> 2009/5/26 Larsson85 
> >>
> >>
> >>> When I try to do my crawl, it seems like the threads get stuck in some
> >>> spinwaiting mode. At first the crawl goes as planned, and I couldn't be
> >>> happier. But after some time, it starts reporting more of these
> >>> spinwaiting messages.
> >>>
> >>> I print a log here to show you what it looks like. As you can see, it gets
> >>> stuck, and the queue decreases by 1 all the time. I've tried doing a
> >>> smaller crawl, and what happens is that it counts down until
> >>> "fetchQueues.totalSize" reaches 0, and then the crawl is done.
> >>>
> >>> But the problem is that this countdown is very slow; there's no effective
> >>> crawling going on, and it uses neither bandwidth nor CPU power. Basically,
> >>> this costs way too much time; I can't let it go on like this for hours.
> >>> How can I fix this?
> >>>
> >>>
> >>> After about an hour of crawling, this is what the log looks like:
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
> >>>  - fetching http://home.swipnet.se/~w-147200/
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
> >>>  - fetching http://biphome.spray.se/alarsson/
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
> >>>  - fetching http://home.swipnet.se/~w-31853/html/
> >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
> >>>
> >>> ....
> >>>
> >>> --
> >>> View this message in context:
> >>> http://www.nabble.com/threads-get-stuck-in-spinwaiting-tp23723825p23723825.html
> >>> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>>
> >>>
> >>
> >
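
A rough sketch of the politeness argument made in this thread, not Nutch code. It assumes one request per host at a time, with a fixed delay between requests to the same host. The function name `drain_estimate`, the 5 second delay, and the `a.example` / `b.example` hosts are all made up for illustration; only the queue size (2526) and thread count (1000) come from the log above. Fetch time itself is ignored, so the result is only a floor.

```python
from collections import Counter
from urllib.parse import urlparse


def drain_estimate(urls, crawl_delay_s, threads):
    """Return (minimum seconds to drain the queue, threads that can do useful work)."""
    # Count how many queued URLs belong to each host.
    per_host = Counter(urlparse(u).netloc for u in urls)
    busiest = max(per_host.values(), default=0)
    # Requests to one host are serialized and spaced by the delay,
    # so the busiest host sets a floor on the total time.
    min_seconds = max(busiest - 1, 0) * crawl_delay_s
    # At most one thread per host can fetch at any moment; the rest wait.
    usable_threads = min(threads, len(per_host))
    return min_seconds, usable_threads


# Hypothetical tail of a crawl: 2526 URLs left, spread over only two hosts.
tail = [f"http://a.example/page{i}" for i in range(1500)]
tail += [f"http://b.example/page{i}" for i in range(1026)]
two_hosts = drain_estimate(tail, crawl_delay_s=5.0, threads=1000)

# The other extreme: the same number of URLs, but one URL per host.
spread = [f"http://host{i}.example/" for i in range(2526)]
one_per_host = drain_estimate(spread, crawl_delay_s=5.0, threads=1000)

print(two_hosts)
print(one_per_host)
```

Under these assumptions the two-host tail cannot finish in much less than about two hours, however many threads are waiting, and only two of the 1000 threads can be fetching at once. With one URL per host the politeness floor is zero, so in that case the slow countdown would need a different explanation.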
