In my case, the fetch reached that point after 45 minutes and has been stuck
for another 75 minutes on those mappers.
The log just keeps printing:

org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0

org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0

org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0

....
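
If I read the loop quoted below correctly, with activeThreads=2 and an empty
queue it can only exit when those two threads finish or when the hung-request
timeout check fires. Here is a minimal, self-contained sketch of that behaviour.
It is not the Nutch code: the class name, the waitForThreads helper, and the
short timeout and poll values are all hypothetical and only illustrate the
control flow.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class FetchWaitLoopSketch {

  /**
   * Hypothetical stand-in for the quoted do/while loop: polls until no
   * worker threads are active, or gives up when no request has started
   * within timeoutMs.
   */
  static String waitForThreads(AtomicInteger activeThreads,
                               AtomicLong lastRequestStart,
                               long timeoutMs, long pollMs)
      throws InterruptedException {
    do {
      Thread.sleep(pollMs); // the quoted code sleeps 1000 ms here
      System.out.println("-activeThreads=" + activeThreads.get());

      // Same shape as the "some requests seem to hang" check in the quote.
      if (System.currentTimeMillis() - lastRequestStart.get() > timeoutMs) {
        return "aborted with " + activeThreads.get() + " hung threads";
      }
    } while (activeThreads.get() > 0);
    return "all threads exited";
  }

  public static void main(String[] args) throws InterruptedException {
    // Two workers that never finish, like activeThreads=2 in my log.
    AtomicInteger active = new AtomicInteger(2);
    AtomicLong lastStart = new AtomicLong(System.currentTimeMillis());
    System.out.println(waitForThreads(active, lastStart, 300, 100));
  }
}
```

In this sketch the status line repeats on every poll until the timeout
elapses, which looks like what I am seeing, only with a much longer timeout.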



On Wed, Dec 4, 2013 at 4:31 PM, feng lu <[email protected]> wrote:

> I see that it uses a while loop to wait for the threads to exit, sleeping 1
> second between checks. So even after a fetcher thread has finished, the
> whole fetcher process takes a little longer to exit.
>
> The code structure looks like this:
>
>  do {                                          // wait for threads to exit
>       pagesLastSec = pages.get();
>       bytesLastSec = (int)bytes.get();
>
>       try {
>         Thread.sleep(1000);
>       } catch (InterruptedException e) {}
>
>       ....
>       reportStatus(pagesLastSec, bytesLastSec);   // your print output is coming here
>
>       LOG.info("-activeThreads=" + activeThreads
>           + ", spinWaiting=" + spinWaiting.get()
>           + ", fetchQueues.totalSize=" + fetchQueues.getTotalSize());
>
>       if (!feeder.isAlive() && fetchQueues.getTotalSize() < 5) {
>         fetchQueues.dump();
>       }
>       ....
>       // check timelimit
>       if (!feeder.isAlive()) {
>         int hitByTimeLimit = fetchQueues.checkTimelimit();
>         if (hitByTimeLimit != 0) reporter.incrCounter("FetcherStatus",
>             "hitByTimeLimit", hitByTimeLimit);
>       }
>
>       // some requests seem to hang, despite all intentions
>       if ((System.currentTimeMillis() - lastRequestStart.get()) > timeout) {
>         if (LOG.isWarnEnabled()) {
>           LOG.warn("Aborting with "+activeThreads+" hung threads.");
>         }
>         return;
>       }
>
>     } while (activeThreads.get() > 0);
>
>
> On Wed, Dec 4, 2013 at 7:57 PM, Amit Sela <[email protected]> wrote:
>
> > In the fetch phase, I notice that some of the mappers take much longer to
> > finish.
> > The running-task screen of the MapReduce admin UI shows:
> >
> > *1 threads, 1 queues, 0 URLs queued, *
> >
> > So why are those tasks not complete?
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>
