Sorry, the correct link is: https://issues.apache.org/jira/browse/NUTCH
On 09/15/2016 01:34 PM, Sebastian Nagel wrote:
> Hi,
>
> this looks like a bug in Nutch 2.x.
>
> Please, open an issue at http://issues.apache.org/jira/NUTCH
> and add information about the exact Nutch version and the
> configuration. Invalid URLs should normally be filtered out
> or corrected by URL normalizers during the parsing step.
>
> Thanks,
> Sebastian
>
> On 09/15/2016 08:58 AM, shubham.gupta wrote:
>> Hey,
>>
>> Whenever the update job is executed the following errors occur:
>>
>> INFO mapreduce.Job: Task Id : attempt_1473832356852_0104_m_000000_2, Status : FAILED
>> Error: java.net.MalformedURLException: no protocol:
>> http%3A%2F%2Fwww.smh.com.au%2Fact-news%2Fcanberra-weather-warm-april-expected-after-record-breaking-march-temperatures-20160401-gnw2pg.html&title=Canberra+weather%3A+warm+April+expected+after+record+breaking+March+temperatures&source=The+Sydney+Morning+Herald&summary=Canberra+can+expect+warmer+than+average+temperatures+to+continue+for+April+after+enjoying+its+equal+second+warmest+March+on+record
>>
>>     at java.net.URL.<init>(URL.java:586)
>>     at java.net.URL.<init>(URL.java:483)
>>     at java.net.URL.<init>(URL.java:432)
>>     at org.apache.nutch.util.TableUtil.reverseUrl(TableUtil.java:43)
>>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:96)
>>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:38)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>
>>
>> Job Counters
>>     Failed map tasks=4
>>     Launched map tasks=4
>>     Other local map tasks=4
>>     Total time spent by all maps in occupied slots (ms)=417438
>>     Total time spent by all reduces in occupied slots (ms)=0
>>     Total time spent by all map tasks (ms)=59634
>>     Total vcore-seconds taken by all map tasks=59634
>>     Total megabyte-seconds taken by all map tasks=213012648
>> Exception in thread "main" java.lang.RuntimeException: job failed: name=[]update-table, jobid=job_1473832356852_0104
>>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
>>     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:111)
>>     at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:140)
>>     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:174)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>     at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:178)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>
>> This leads to no new updation of urls in the corresponding tables.
>> Please help.
>> Thanks in advance
>>
>
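
For anyone wanting to reproduce the failure outside of a crawl: the stack trace above ends in the java.net.URL constructor, which rejects the string because it is percent-encoded and so contains no literal "scheme:" prefix. The following is only a minimal sketch of that check. The class UrlCheck and the method isParsableUrl are hypothetical names made up for illustration and are not part of Nutch, and the shortened sample strings are assumptions standing in for the full URL from the log.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration only; not a Nutch class.
class UrlCheck {

    /** Returns true if java.net.URL accepts the string, false if it throws. */
    static boolean isParsableUrl(String candidate) {
        try {
            new URL(candidate);   // same constructor that fails in the stack trace
            return true;
        } catch (MalformedURLException e) {
            return false;         // e.g. "no protocol: http%3A%2F%2F..."
        }
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the URL in the log (assumed for the example).
        String encoded = "http%3A%2F%2Fwww.smh.com.au%2Fact-news%2F";
        String plain = "http://www.smh.com.au/act-news/";

        System.out.println(encoded + " -> " + isParsableUrl(encoded)); // false
        System.out.println(plain + " -> " + isParsableUrl(plain));     // true
    }
}
```

A check of this kind only shows which strings java.net.URL would reject. Whether such a URL should be dropped by a filter or repaired by a normalizer depends on the Nutch version and configuration, which is the information asked for in the issue.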

