Re: UpdateDb job fails every time

Sebastian Nagel Thu, 15 Sep 2016 04:42:08 -0700

Sorry, the correct link is:
https://issues.apache.org/jira/browse/NUTCH
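
For what it's worth, the exception in the quoted trace can be reproduced outside Nutch. The sketch below shows that java.net.URL rejects a string whose "://" is still percent-encoded, and that the same string parses once decoded. The class name UrlCheck and the helper decodeIfEncoded are hypothetical and not part of Nutch, and the sample string is only a shortened prefix of the URL from the trace. Whether decoding is the right repair for your crawl is an assumption; filtering such outlinks out may be the better choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;

class UrlCheck {
    /** Returns true if java.net.URL accepts the string, false if it throws. */
    static boolean isParseable(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /**
     * Hypothetical repair step: percent-decode the string only when it
     * starts with an encoded scheme separator ("http%3A" or "https%3A").
     * Anything else is returned unchanged.
     */
    static String decodeIfEncoded(String s) {
        boolean encoded = s.regionMatches(true, 0, "http%3a", 0, 7)
                || s.regionMatches(true, 0, "https%3a", 0, 8);
        if (!encoded) {
            return s;
        }
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Shortened prefix of the URL from the trace (assumption: the rest
        // of the string does not change the outcome).
        String raw = "http%3A%2F%2Fwww.smh.com.au%2Fact-news%2F";
        System.out.println(isParseable(raw));                  // false
        System.out.println(isParseable(decodeIfEncoded(raw))); // true
    }
}
```

The first call fails because the string contains no literal ':' and so has no recognizable protocol, which matches the "no protocol" message in the trace.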


On 09/15/2016 01:34 PM, Sebastian Nagel wrote:
> Hi,
> 
> this looks like a bug in Nutch 2.x.
> 
> Please, open an issue at http://issues.apache.org/jira/NUTCH
> and add information about the exact Nutch version and the
> configuration.  Invalid URLs should normally be filtered out
> or corrected by URL normalizers during the parsing step.
> 
> Thanks,
> Sebastian
> 
> On 09/15/2016 08:58 AM, shubham.gupta wrote:
>> Hey,
>>
>> Whenever the update job is executed the following errors occur:
>>
>> INFO mapreduce.Job: Task Id : attempt_1473832356852_0104_m_000000_2, Status : FAILED
>> Error: java.net.MalformedURLException: no protocol: http%3A%2F%2Fwww.smh.com.au%2Fact-news%2Fcanberra-weather-warm-april-expected-after-record-breaking-march-temperatures-20160401-gnw2pg.html&title=Canberra+weather%3A+warm+April+expected+after+record+breaking+March+temperatures&source=The+Sydney+Morning+Herald&summary=Canberra+can+expect+warmer+than+average+temperatures+to+continue+for+April+after+enjoying+its+equal+second+warmest+March+on+record
>>
>>     at java.net.URL.<init>(URL.java:586)
>>     at java.net.URL.<init>(URL.java:483)
>>     at java.net.URL.<init>(URL.java:432)
>>     at org.apache.nutch.util.TableUtil.reverseUrl(TableUtil.java:43)
>>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:96)
>>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:38)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>
>>
>> Job Counters
>>         Failed map tasks=4
>>         Launched map tasks=4
>>         Other local map tasks=4
>>         Total time spent by all maps in occupied slots (ms)=417438
>>         Total time spent by all reduces in occupied slots (ms)=0
>>         Total time spent by all map tasks (ms)=59634
>>         Total vcore-seconds taken by all map tasks=59634
>>         Total megabyte-seconds taken by all map tasks=213012648
>> Exception in thread "main" java.lang.RuntimeException: job failed: name=[]update-table, jobid=job_1473832356852_0104
>>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
>>     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:111)
>>     at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:140)
>>     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:174)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>     at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:178)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>
>> As a result, no URLs are updated in the corresponding tables.
>> Please help.
>> Thanks in advance
>>
>
