Ah, I see.  Sorry, I mixed up two different people whose names start with "R".
I was referring to generate.max.per.host = 1 from 
https://issues.apache.org/jira/browse/NUTCH-721


So what I described in my previous emails (Queues A, B, C) is how things work 
when you have multiple URLs per host and hosts have different numbers of URLs 
or differ in speed enough to create a scenario like the one I described.  In my 
Nutch experience, this scenario *always* happens.  You'll find my messages 
about this on this list via markmail.org from about 12-13 months ago.
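
To make that scenario concrete, here is a rough Python sketch. It is a toy model, not Nutch's Fetcher code: the function name simulate, the hosts A, B and C, the queue sizes, the one-tick fetch time, and the delay value are all assumptions picked for illustration. It only shows why most threads end up idle once a single slow or large host is left.

```python
def simulate(queues, threads, delay):
    """Toy model of a polite fetcher (an assumption, not Nutch's implementation).

    queues:  dict mapping host -> number of URLs queued for that host
    threads: maximum number of fetches that can start in one tick
    delay:   politeness delay in ticks between two fetches to the same host
    Returns a list of (tick, busy_threads, remaining_urls) tuples.
    """
    remaining = dict(queues)
    # Earliest tick at which each host may be contacted again.
    next_allowed = {host: 0 for host in queues}
    history = []
    tick = 0
    while any(remaining.values()):
        busy = 0
        for host in remaining:
            if busy == threads:
                break  # every thread already has work this tick
            if remaining[host] > 0 and next_allowed[host] <= tick:
                remaining[host] -= 1
                next_allowed[host] = tick + delay
                busy += 1
        history.append((tick, busy, sum(remaining.values())))
        tick += 1
    return history


# Hypothetical queues: A and B drain quickly, C keeps the crawl alive.
history = simulate({"A": 2, "B": 2, "C": 20}, threads=10, delay=5)
idle_ticks = sum(1 for _, busy, _ in history if busy == 0)
print(f"{len(history)} ticks, {idle_ticks} with every thread idle")
```

In this model, adding threads does not shorten the run: the host with the longest queue and the politeness delay set the total time, and the remaining threads have nothing to do.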

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Raymond Balmès <[email protected]>
> To: [email protected]
> Sent: Wednesday, May 27, 2009 9:40:34 AM
> Subject: Re: threads get stuck in spinwaiting
> 
> @otis
> Or did you mean nutch/host? I have only one server for my tests.
> 
> @larsson
> My spinwaiting phase is usually less than 30 minutes.
> 
> Something I noticed as well is that the speed at the beginning is so fast that
> I can't read the screen. I'm not sure whether the standard output line is
> printed at start_fetch or fetch_complete.
> 
> -Raymond-
> 2009/5/27 Raymond Balmès 
> 
> > I have many URLs per host, of course. I need to get all the pages of the
> > sites, so I don't understand the question.
> >
> > -Raymond
> >
> > 2009/5/26 Otis Gospodnetic 
> >
> >
> >> But how, Ray, if you have only 1 URL per host?
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >>
> >>
> >> ----- Original Message ----
> >> > From: Raymond Balmès 
> >> > To: [email protected]
> >> > Sent: Tuesday, May 26, 2009 4:11:27 PM
> >> > Subject: Re: threads get stuck in spinwaiting
> >> >
> >> > Observing what my crawls do, I believe Ken must be right.
> >> > Towards the end of the crawl (when fetchQueues.totalSize="xxxx" counts
> >> > down), in some cases I'm fetching from only about two sites, so
> >> > politeness does start to play a role there, or at least it should.
> >> >
> >> > -Ray-
> >> >
> >> > 2009/5/26 Raymond Balmès
> >> >
> >> > > Please read this too:
> >> > >
> >> > > http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
> >> > >
> >> > > Interesting build from Ken.
> >> > >
> >> > > 2009/5/26 Raymond Balmès
> >> > >
> >> > >> Yes, this has already been reported in multiple threads.
> >> > >> I noted that if one does a "recrawl" you don't get this behavior...
> >> > >> no idea why.
> >> > >>
> >> > >> -Raymond-
> >> > >>
> >> > >> 2009/5/26 Larsson85
> >> > >>
> >> > >>
> >> > >>> When I try to do my crawl, it seems like the threads get stuck in
> >> > >>> some spinwaiting mode. At first the crawl goes as planned, and I
> >> > >>> couldn't be happier. But after some time, it starts reporting more
> >> > >>> of these spinwaiting messages.
> >> > >>>
> >> > >>> I print a log here to show you what it looks like. As you can see, it
> >> > >>> gets stuck, and the queue decreases by 1 all the time. I've tried
> >> > >>> doing a smaller crawl, and what happens is that it counts down until
> >> > >>> the "fetchQueues.totalSize" reaches 0, and then the crawl is done.
> >> > >>>
> >> > >>> But the problem is that this countdown is very slow; there's no
> >> > >>> effective crawling going on, and it uses neither bandwidth nor CPU
> >> > >>> power. Basically, this costs way too much time. I can't let it run
> >> > >>> like this for hours to finish.
> >> > >>> How can I fix this?
> >> > >>>
> >> > >>>
> >> > >>> after about an hour of crawling this is what the log looks like
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2526
> >> > >>>  - fetching http://home.swipnet.se/~w-147200/
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2525
> >> > >>>  - fetching http://biphome.spray.se/alarsson/
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2524
> >> > >>>  - fetching http://home.swipnet.se/~w-31853/html/
> >> > >>>  -activeThreads=1000, spinWaiting=1000, fetchQueues.totalSize=2523
> >> > >>>
> >> > >>> ....
> >> > >>>
> >> > >>> --
> >> > >>> View this message in context:
> >> > >>> http://www.nabble.com/threads-get-stuck-in-spinwaiting-tp23723825p23723825.html
> >> > >>> Sent from the Nutch - User mailing list archive at Nabble.com.
> >> > >>>
> >> > >>>
> >> > >>
> >> > >
> >>
> >>
> >
