Maybe your problem is caused by this issue [0],

or you can refer to this:

http://permalink.gmane.org/gmane.comp.search.nutch.devel/36673

Hope that helps.

[0] https://issues.apache.org/jira/browse/NUTCH-1182


On Fri, May 10, 2013 at 2:56 PM, Tejas Patil <[email protected]> wrote:

> Hey Urs,
> Please see the logs/hadoop.log file and share all the stack traces of
> the exception.
> The current stack trace you shared doesn't highlight the actual problem. It
> just hints that the fetch job failed.
>
>
> On Thu, May 9, 2013 at 11:31 PM, AC Nutch <[email protected]> wrote:
>
> > If I'm not mistaken, 140 threads is way, way on the high side. Unless you
> > have some massive servers, I can't see them handling that. I can barely get
> > my servers to handle more than 15 or so. Perhaps try decreasing that and see
> > if that fixes the issue.
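> >
> > If the thread count is coming from your config, here's a sketch of how to
> > lower it in conf/nutch-site.xml (fetcher.threads.fetch is the standard
> > property; if you pass -threads on the command line instead, change it
> > there, since the flag takes precedence):
> >
> >   <property>
> >     <name>fetcher.threads.fetch</name>
> >     <value>15</value>
> >     <description>Number of fetcher threads to run per fetch task.</description>
> >   </property>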
> >
> > Alex
> >
> >
> > On Thu, May 9, 2013 at 7:17 PM, Urs Hofer <[email protected]> wrote:
> >
> > > Dear Feng Lu
> > >
> > > I'm not sure, but the problem is rather the last exception:
> > >
> > > >>> java.io.IOException: Job failed!
> > > >>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
> > > >>>        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1332)
> > > >>>        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1368)
> > > >>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >>>        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1341)
> > >
> > > The others might be annoying, but this one does stop the execution of
> > > the script…
> > >
> > > Best
> > > urs
> > >
> > >
> > >
> > >
> > >
On May 9, 2013, at 5:14 PM, feng lu <[email protected]> wrote:
> > >
> > > > Hi Urs
> > > >
> > > > Are you using Nutch 1.6?
> > > >
> > > > < 2013-05-09 03:57:56,967 WARN  fetcher.Fetcher - Attempting to finish
> > > > item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@307322f4 >
> > > >
> > > > This is caused by a call to FetchItemQueues#finishFetchItem in which
> > > > the queues can no longer find the queueID of the FetchItem. One way
> > > > this happens: the queue is deleted by the empty-queue reaping in the
> > > > FetchItemQueues#getFetchItem method once it becomes empty. The
> > > > FetchItem is unblocked when its fetch finishes, but if an exception is
> > > > thrown after that, the FetchItem is unblocked a second time; by then
> > > > the queue has already been emptied and removed, so this WARN is logged.
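> > > >
> > > > A toy sketch of that sequence (my own simplification, not the real
> > > > FetchItemQueues code; each queue is reduced to an in-progress count,
> > > > and the class name QueueRaceSketch is made up for illustration):
> > > >
> > > >   import java.util.HashMap;
> > > >   import java.util.Iterator;
> > > >   import java.util.Map;
> > > >
> > > >   public class QueueRaceSketch {
> > > >       // queueID -> number of items currently in progress
> > > >       static final Map<String, Integer> queues = new HashMap<String, Integer>();
> > > >
> > > >       static void finishFetchItem(String queueID) {
> > > >           Integer inProgress = queues.get(queueID);
> > > >           if (inProgress == null) {
> > > >               // a second "finish" after the reap lands here
> > > >               System.out.println(
> > > >                   "WARN Attempting to finish item from unknown queue: " + queueID);
> > > >               return;
> > > >           }
> > > >           queues.put(queueID, inProgress - 1);
> > > >       }
> > > >
> > > >       // stands in for the empty-queue cleanup in FetchItemQueues#getFetchItem
> > > >       static void reapEmptyQueues() {
> > > >           Iterator<Map.Entry<String, Integer>> it = queues.entrySet().iterator();
> > > >           while (it.hasNext()) {
> > > >               if (it.next().getValue() == 0) {
> > > >                   it.remove();
> > > >               }
> > > >           }
> > > >       }
> > > >
> > > >       public static void main(String[] args) {
> > > >           queues.put("example.com", 1);   // one item being fetched
> > > >           finishFetchItem("example.com"); // normal finish when the fetch completes
> > > >           reapEmptyQueues();              // queue now empty, so it is removed
> > > >           finishFetchItem("example.com"); // exception path finishes again -> WARN
> > > >       }
> > > >   }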
> > > >
> > > > < 2013-05-09 03:57:56,969 ERROR fetcher.Fetcher - fetcher
> > > > caught:java.lang.NullPointerException >
> > > >
> > > > There are two places that can log this exception: one in the
> > > > Fetcher#output method, the other in Fetcher#run. Judging by the order
> > > > of your log lines, it is probably logged by Fetcher#run. I cannot find
> > > > any obvious cause for this NPE. Can you reproduce the exception and
> > > > provide a more detailed log?
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > On Thu, May 9, 2013 at 6:45 PM, Tejas Patil <[email protected]> wrote:
> > > >
> > > >> Hey Urs,
> > > >> Please see the logs/hadoop.log file and share the stack trace of the
> > > >> exception.
> > > >>
> > > >>
> > > >> On Thu, May 9, 2013 at 3:36 AM, Urs Hofer <[email protected]> wrote:
> > > >>
> > > >>> Dear List
> > > >>>
> > > >>> I'm currently running the Nutch crawl script with Solr 4.
> > > >>> Additionally, I'm using the urlmeta plugin, and I'm parsing keyword
> > > >>> and description metadata as well. The crawl script runs in local
> > > >>> mode. Currently, I'm seeding about 500 domains.
> > > >>>
> > > >>> The crawl script runs without problems the first time. On a recrawl,
> > > >>> it dies with the error
> > > >>>
> > > >>> 2013-05-09 03:57:56,967 WARN  fetcher.Fetcher - Attempting to finish
> > > >>> item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@307322f4
> > > >>> 2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread
> > > >>> FetcherThread, activeThreads=27
> > > >>> 2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread
> > > >>> FetcherThread, activeThreads=1
> > > >>> 2013-05-09 03:57:56,969 ERROR fetcher.Fetcher - fetcher
> > > >>> caught:java.lang.NullPointerException
> > > >>> 2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread
> > > >>> FetcherThread, activeThreads=2
> > > >>> 2013-05-09 03:57:56,969 WARN  fetcher.Fetcher - Attempting to finish
> > > >>> item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@504e3c43
> > > >>> 2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread
> > > >>> FetcherThread, activeThreads=0
> > > >>> 2013-05-09 03:57:57,942 ERROR fetcher.Fetcher - Fetcher:
> > > >>> java.io.IOException: Job failed!
> > > >>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
> > > >>>        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1332)
> > > >>>        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1368)
> > > >>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > >>>        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1341)
> > > >>>
> > > >>>
> > > >>> after the fetcher part.
> > > >>>
> > > >>> I have a very liberal regex-urlfilter configuration:
> > > >>>
> > > >>> -^(file|ftp|mailto):
> > > >>> -\.(mp3|MP3|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
> > > >>> +.
> > > >>>
> > > >>> But I am restricting the crawl with db.ignore.external.links = true.
> > > >>> Can it be because I've removed the line
> > > >>>
> > > >>> # skip URLs containing certain characters as probable queries, etc.
> > > >>> -.*[?*!@=].*
> > > >>>
> > > >>> in regex-urlfilter? I got several seed entries like index.php?language=fr
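> > > >>>
> > > >>> For illustration, a quick check with java.util.regex (a throwaway
> > > >>> snippet of mine, not part of the crawl setup) confirms the removed
> > > >>> rule would have matched, and so excluded, such seeds:
> > > >>>
> > > >>>   import java.util.regex.Pattern;
> > > >>>
> > > >>>   public class RuleCheck {
> > > >>>       public static void main(String[] args) {
> > > >>>           // the pattern from the removed regex-urlfilter line
> > > >>>           Pattern rule = Pattern.compile(".*[?*!@=].*");
> > > >>>           // '?' and '=' both fall in the character class
> > > >>>           System.out.println(
> > > >>>               rule.matcher("index.php?language=fr").matches()); // true
> > > >>>       }
> > > >>>   }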
> > > >>>
> > > >>> Since I'm new to Nutch, I don't know where to search or how to
> > > >>> continue. Thanks for any help.
> > > >>> Best regards
> > > >>> Urs Hofer
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Don't Grow Old, Grow Up... :-)
> > >
> > >
> >
>



-- 
Don't Grow Old, Grow Up... :-)
