Hi Sebastian!

I agree it's a weird problem, but interestingly, I got it to work on a
cluster with hadoop-1.2.1. This exposed a fundamental misunderstanding on my
part about how Nutch and Hadoop work together: I was under the impression
that Nutch needed to run on exactly the same Hadoop version as the hadoop
jar it is distributed with. But after I got everything except indexing to
run on 2.7.1, I wondered whether it would work on an older version, and it
did. Perhaps this is related to something specific to the 2.x.x branch of
Hadoop?

I did have a problem with the -addBinaryContent -base64 flags, which is
what necessitated the upgrade in the first place, but I changed the Solr
field from "binary" to "string" and everything worked OK. That's also
strange, because the binary field works fine in local mode. The error was:

org.apache.solr.common.SolrException: ERROR:
[doc=http://example.com/index.html] Error adding field
'binaryContent'='abc123'
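
In case it is useful to anyone else hitting the same error, the workaround
was just a type change on the binaryContent field in the schema.xml I copied
over. I don't remember the exact attributes any more, so treat this as a
rough sketch rather than the literal lines from my schema:

### schema.xml change (approximate) ###

<!-- original field definition (fails on Hadoop, works in local mode): -->
<!-- <field name="binaryContent" type="binary" indexed="false" stored="true"/> -->

<!-- what I changed it to: -->
<field name="binaryContent" type="string" indexed="false" stored="true"/>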


I would like to help with the Nutch project. I'm frustrated that I don't
really know enough Java to contribute code or understand what's going on
inside, but if someone is interested, I could give you direct access to the
cluster where I am getting this error. I used a very "out of the box"
configuration to get these errors:

Ubuntu 14 on a Digital Ocean server; tried OpenJDK 7 and 8 as well as the
Oracle binary.
Hadoop 2.4.0, 2.4.1 and 2.7.1 all show the same behavior, whether installed
from the binary release or built with Maven.
Set up according to the "Pseudo-Distributed" tutorial on the Hadoop website.
With or without libsnappy, no difference.
Nutch 1.11, built with ant (ant installed via apt).
Copied schema.xml and the mapping file to the Solr config folder.
Set the crawler name and Solr URL in the config file.
Ran a crawl (I used dmoz.org).

Shouldn't be hard to reproduce.
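
Roughly, the sequence boils down to the commands below. The Solr path is
just a placeholder for wherever your core lives, and the last command is the
same indexing call from my first message (with the segment name generalized):

### reproduction sketch ###

# build Nutch 1.11 from source (ant installed via apt)
cd /root/src/apache-nutch-1.11
ant clean runtime

# copy the Nutch schema (and mapping file) into the Solr core config
cp conf/schema.xml /path/to/solr/collection1/conf/

# set http.agent.name and solr.server.url in conf/nutch-site.xml, rebuild,
# run a crawl seeded with dmoz.org, then index the segment:
/root/hadoop-2.4.0/bin/hadoop jar build/apache-nutch-1.11.job \
    org.apache.nutch.indexer.IndexingJob crawl/crawldb \
    -linkdb crawl/linkdb crawl/segments/<segment>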

I don't specifically need Hadoop 2, since the cluster is dedicated to
crawling, but I can see its advantages. For me the issue is resolved and I
will continue to use 1.2.1 unless I run into some other problem.

Thank you very much for your help. Get in touch by PM if you want to have a
look at the box.

Jason

On Sun, Jan 24, 2016 at 8:07 PM, Sebastian Nagel <wastl.na...@googlemail.com
> wrote:

> Hi Jason,
>
> sorry, that was a misunderstanding: the patch of NUTCH-2191
> will not fix your problem. But Markus mentioned in the discussion
> that he has to remove http* jars to fix dependency problems.
> What I want to say is that our plugin system does not provide
> complete isolation although every plugin has its own class loader.
>
> However, your problem seems really weird. After a look into the
> code of httpcore where the exception is raised:
>
>
> https://github.com/apache/httpcore/blob/4.3.x/httpcore/src/main/java/org/apache/http/impl/io/DefaultHttpRequestWriterFactory.java#L52
>
> The field INSTANCE is referenced and should be defined there:
>
>
> https://github.com/apache/httpcore/blob/4.3.x/httpcore/src/main/java/org/apache/http/message/BasicLineFormatter.java#L65
>
> Older versions (4.2.x) are missing this field:
>
> https://github.com/apache/httpcore/blob/4.2.x/httpcore/src/main/java/org/apache/http/message/BasicLineFormatter.java
>
> It's the same library (httpcore)! It's hardly possible that two class
> files are taken from different versions of the same library.
>
> > I have added -verbose:class to mapred.child.java.opts, but i don't see
> any
> > difference in the output, I am uploading another zip of the log
>
> Ok. Sorry, but I have to try to find out how to set -verbose:class in
> (pseudo-)distributed mode. Does anyone know how to do this?
>
> > In the past, I just copied nutch-1.9/lib to hadoop-1.2.1/lib, and if
> there
> > was a conflict, I kept the version of the file distributed with Nutch.
> > Now the Nutch and Hadoop file structures are vastly different, so I don't
> > understand, is this a problem with my configuration or with Nutch?
>
> That's not necessary. Everything to run the Nutch jobs is contained
> in apache-nutch-1.11.job. However, since you are using Hadoop 2.7.1
> hadoop-1.2.1/lib or jars from there shouldn't be on the class path.
> But it may be a good idea to make sure the class path isn't tainted.
>
> Cheers,
> Sebastian
>
>
> On 01/24/2016 01:29 AM, Jason S wrote:
> > Hi Sebastian,
> >
> > I had a look at NUTCH-2191 and the suggestions in there didn't help with
> > this issue.
> >
> > When I apply the patch, I get a build error in 1.11 and trunk:
> >
> > BUILD FAILED
> > /root/src/nutch-trunk/build.xml:116: The following error occurred while
> > executing this line:
> > /root/src/nutch-trunk/src/plugin/build.xml:54: The following error
> occurred
> > while executing this line:
> > /root/src/nutch-trunk/src/plugin/protocol-htmlunit/build.xml:39:
> > /root/src/nutch-trunk/src/plugin/protocol-htmlunit/src/test does not
> exist.
> >
> > I'm not sure where to find the protocol-html-unit plugin.
> >
> > Also, removing the http*.jar, jersey*.jar and jetty*.jar as suggested
> > doesn't work.  I just keep getting the same error as above.
> >
> > I have added -verbose:class to mapred.child.java.opts, but i don't see
> any
> > difference in the output, I am uploading another zip of the log
> > directories. The logs are here:
> > https://s3.amazonaws.com/nutch-hadoop-error/hadoop-nutch-error2.tgz
> >
> > I have searched my system, and I don't find any of the http*.jar files in
> > hadoop, although one of them is in /usr/share/java, but deleting it
> doesn't
> > seem to make a difference.
> >
> > In the past, I just copied nutch-1.9/lib to hadoop-1.2.1/lib, and if
> there
> > was a conflict, I kept the version of the file distributed with Nutch.
> > Now the Nutch and Hadoop file structures are vastly different, so I don't
> > understand, is this a problem with my configuration or with Nutch?
> >
> > Thanks,
> >
> > Jason
> >
> >
> >
> >
> >
> > On Sat, Jan 23, 2016 at 10:05 PM, Sebastian Nagel <
> > wastl.na...@googlemail.com> wrote:
> >
> >> Hi Jason,
> >>
> >> this looks like a library dependency version conflict, probably
> >> between httpcore and httpclient. The class on top of the stack
> >> belong to these libs:
> >>  org.apache.http.impl.io.DefaultHttpRequestWriterFactory  -> httpcore
> >>  org.apache.http.impl.conn.ManagedHttpClientConnectionFactory  ->
> >> httpclient
> >>
> >> You mentioned that indexing to Solr works in local mode.
> >> Is it possible that the mapreduce tasks get a wrong httpcore (or
> >> httpclient)
> >> lib? They should use those from the apache-nutch-1.11.job,
> >> from classes/plugins/indexer-solr/ strictly speaking.
> >>
> >> We know that there are problems because the plugin class loader
> >> asks first its parent, see [1] for the most recent discussion.
> >>
> >> Can you try to add -verbose:class so that you can see in the logs from
> >> which jar the classes are loaded? Sorry, I didn't try this in
> >> (pseudo-)distributed mode yet. According to the documentation
> >> it should be possible to set this option in "mapred.child.java.opts"
> >> in your mapred-site.xml (check also other *.java.opts properties)?
> >>
> >> Cheers,
> >> Sebastian
> >>
> >> [1] https://issues.apache.org/jira/browse/NUTCH-2191
> >>
> >>
> >> On 01/23/2016 04:09 PM, Jason S wrote:
> >>> I'm not sure if it is ok to attach files to a list email, if anyone
> wants
> >>> to look at some log files, they're here:
> >>>
> >>> https://s3.amazonaws.com/nutch-hadoop-error/hadoop-nutch-error.tgz
> >>>
> >>> This crawl was done on Ubuntu 15.10 and Open Jdk 8, however, I have
> also
> >>> had the error with Ubuntu 14, Open Jdk 7 and Oracle Jdk 7, Hadoop in
> >> single
> >>> server mode and on a cluster with a master and 5 slaves.
> >>>
> >>> This crawl had minimal changes made to the config files, only
> >>> http.agent.name and sol.server.url were changed.  Nutch was built with
> >> ant,
> >>> "ant clean runtime".
> >>>
> >>> Entire log directory with an entire
> >>> inject/generate/fetch/parse/updatedb/index cycle is in there.  As
> >> indicated
> >>> in my previous messages, everything works fine until indexer, and same
> >> data
> >>> indexes fine in local mode.
> >>>
> >>> Thanks in advance,
> >>>
> >>> Jason
> >>>
> >>>
> >>> On Sat, Jan 23, 2016 at 11:43 AM, Jason S <jason.stu...@gmail.com>
> >> wrote:
> >>>
> >>>> Bump.
> >>>>
> >>>> Is there anyone who can help me with this?
> >>>>
> >>>> I'm not familiar enough with Nutch source code to label this as a bug
> >> but
> >>>> it seems to be the case, unless I have made some mistake being new to
> >>>> Hadoop 2.  I have been running Nutch on Hadoop 1.X for years and never
> >> had
> >>>> any problems like this.  Have I overlooked something in my setup?
> >>>>
> >>>> I believe the error I posted is the one causing the indexing job to
> >> fail,
> >>>> I can confirm quite a few things that are not causing the problem.
> >>>>
> >>>> -- I have used Nutch with minimal changes to default configs, and Solr
> >>>> with exactly the unmodified Schema and solrindex-mapping files
> provided
> >> in
> >>>> the config.
> >>>>
> >>>> -- Same error occurs on hadoop 2.4.0, 2.4.1, 2.7.1
> >>>>
> >>>> -- Solr 4.10.2, and solr 4.10.4 makes no difference
> >>>>
> >>>> -- Building Nutch and Solr with Open JDK or Oracle JDK makes no
> >> difference
> >>>>
> >>>> It seems like Nutch/Hadoop never connects to Solr before it fails,
> Solr
> >>>> logging in verbose mode creates 0 lines of output when the indexer job
> >> runs
> >>>> on Hadoop.
> >>>>
> >>>> All data/settings/everything the same works fine in local mode.
> >>>>
> >>>> Short of dumping segments to local mode and indexing them that way, or
> >>>> trying another indexer, i'm baffled.
> >>>>
> >>>> Many thanks if someone could help me out.
> >>>>
> >>>> Jason
> >>>>
> >>>>
> >>>> On Thu, Jan 21, 2016 at 10:29 PM, Jason S <jason.stu...@gmail.com>
> >> wrote:
> >>>>
> >>>>> Hi Markus,
> >>>>>
> >>>>> I guess that is part of my question, from the data in the top-level
> >> logs,
> >>>>> how can I tell where to look?  I have spent a couple days trying to
> >>>>> understand hadoop 2 logging , i'm still not really very sure.
> >>>>>
> >>>>> For example, I found this error here:
> >>>>>
> >>>>>
> >>>>>
> >>
> ~/hadoop-2.4.0/logs/userlogs/application_1453403905213_0001/container_1453403905213_0001_01_000041/syslog
> >>>>>
> >>>>> At first I thought the "no such field" error meant I was trying to
> put
> >>>>> data in Solr where the field didn't exist in the schema, but the same
> >> data
> >>>>> indexes fine in local mode.  Also, there are no errors in Solr logs.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Jason
> >>>>>
> >>>>> ### syslog error ###
> >>>>>
> >>>>> 2016-01-21 14:21:14,211 INFO [main]
> >>>>> org.apache.nutch.plugin.PluginRepository: Nutch Content Parser
> >>>>> (org.apache.nutch.parse.Parser)
> >>>>>
> >>>>> 2016-01-21 14:21:14,211 INFO [main]
> >>>>> org.apache.nutch.plugin.PluginRepository: Nutch Scoring
> >>>>> (org.apache.nutch.scoring.ScoringFilter)
> >>>>>
> >>>>> 2016-01-21 14:21:14,637 INFO [main]
> >>>>> org.apache.nutch.indexer.anchor.AnchorIndexingFilter: Anchor
> >> deduplication
> >>>>> is: on
> >>>>>
> >>>>> 2016-01-21 14:21:14,668 INFO [main]
> >>>>> org.apache.nutch.indexer.IndexWriters: Adding
> >>>>> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> >>>>>
> >>>>> 2016-01-21 14:21:14,916 FATAL [main]
> >> org.apache.hadoop.mapred.YarnChild:
> >>>>> Error running child : java.lang.NoSuchFieldError: INSTANCE
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:52)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:56)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<clinit>(DefaultHttpRequestWriterFactory.java:46)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<init>(ManagedHttpClientConnectionFactory.java:72)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<init>(ManagedHttpClientConnectionFactory.java:84)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<clinit>(ManagedHttpClientConnectionFactory.java:59)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager$InternalConnectionFactory.<init>(PoolingHttpClientConnectionManager.java:493)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:149)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:138)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:114)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:726)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.nutch.indexwriter.solr.SolrUtils.getSolrServer(SolrUtils.java:57)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.open(SolrIndexWriter.java:58)
> >>>>>
> >>>>> at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:75)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:39)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:484)
> >>>>>
> >>>>> at
> >> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
> >>>>>
> >>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> >>>>>
> >>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> >>>>>
> >>>>> at java.security.AccessController.doPrivileged(Native Method)
> >>>>>
> >>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
> >>>>>
> >>>>> at
> >>>>>
> >>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >>>>>
> >>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> >>>>>
> >>>>>
> >>>>> 2016-01-21 14:21:14,927 INFO [main]
> >>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping
> ReduceTask
> >>>>> metrics system...
> >>>>>
> >>>>> 2016-01-21 14:21:14,928 INFO [main]
> >>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
> >>>>> system stopped.
> >>>>>
> >>>>> 2016-01-21 14:21:14,928 INFO [main]
> >>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
> >>>>> system shutdown complete.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Jan 21, 2016 at 9:47 PM, Markus Jelsma <
> >>>>> markus.jel...@openindex.io> wrote:
> >>>>>
> >>>>>> Hi Jason - these are the top-level job logs but to really know
> what's
> >>>>>> going on, we need the actual reducer task logs.
> >>>>>> Markus
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -----Original message-----
> >>>>>>> From:Jason S <jason.stu...@gmail.com>
> >>>>>>> Sent: Thursday 21st January 2016 20:35
> >>>>>>> To: user@nutch.apache.org
> >>>>>>> Subject: Indexing Nutch 1.11 indexing Fails
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I am having a problem indexing segments in Nutch 1.11 on Hadoop.
> >>>>>>>
> >>>>>>> The cluster seems to be configured correctly and every part of the
> >>>>>> crawl
> >>>>>>> process is working flawlessly, however this is my first attempt at
> >>>>>> hadoop
> >>>>>>> 2, so perhaps my memory settings aren't perfect.  I'm also not sure
> >>>>>> where
> >>>>>>> to look in the log files for more information.
> >>>>>>>
> >>>>>>> The same data can be indexed with Nutch in local mode, so I don't
> >>>>>> think it
> >>>>>>> is a problem with the Solr configuration, and I have had Nutch
> 1.0.9
> >>>>>> with
> >>>>>>> Hadoop 1.2.1 on this same cluster and everything worked ok.
> >>>>>>>
> >>>>>>> Please let me know if I can send more information, I have spent
> >> several
> >>>>>>> days working on this with no success or clue why it is happening.
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Jason
> >>>>>>>
> >>>>>>> ### Command ###
> >>>>>>>
> >>>>>>> /root/hadoop-2.4.0/bin/hadoop jar
> >>>>>>> /root/src/apache-nutch-1.11/build/apache-nutch-1.11.job
> >>>>>>> org.apache.nutch.indexer.IndexingJob crawl/crawldb -linkdb
> >> crawl/linkdb
> >>>>>>> crawl/segments/20160121113335
> >>>>>>>
> >>>>>>> ### Error ###
> >>>>>>>
> >>>>>>> 16/01/21 14:20:47 INFO mapreduce.Job:  map 100% reduce 19%
> >>>>>>> 16/01/21 14:20:48 INFO mapreduce.Job:  map 100% reduce 26%
> >>>>>>> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000001_0, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000002_0, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:20:48 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000000_0, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:20:49 INFO mapreduce.Job:  map 100% reduce 0%
> >>>>>>> 16/01/21 14:20:54 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000004_0, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:20:55 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000002_1, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:20:56 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000001_1, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:00 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000000_1, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:01 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000004_1, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:02 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000002_2, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:07 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000003_0, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:08 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000004_2, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:08 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000001_2, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:11 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000000_2, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:15 INFO mapreduce.Job: Task Id :
> >>>>>>> attempt_1453403905213_0001_r_000003_1, Status : FAILED
> >>>>>>> Error: INSTANCE
> >>>>>>> 16/01/21 14:21:16 INFO mapreduce.Job:  map 100% reduce 100%
> >>>>>>> 16/01/21 14:21:16 INFO mapreduce.Job: Job job_1453403905213_0001
> >> failed
> >>>>>>> with state FAILED due to: Task failed
> >> task_1453403905213_0001_r_000004
> >>>>>>> Job failed as tasks failed. failedMaps:0 failedReduces:1
> >>>>>>>
> >>>>>>> 16/01/21 14:21:16 INFO mapreduce.Job: Counters: 39
> >>>>>>> File System Counters
> >>>>>>> FILE: Number of bytes read=0
> >>>>>>> FILE: Number of bytes written=5578886
> >>>>>>> FILE: Number of read operations=0
> >>>>>>> FILE: Number of large read operations=0
> >>>>>>> FILE: Number of write operations=0
> >>>>>>> HDFS: Number of bytes read=2277523
> >>>>>>> HDFS: Number of bytes written=0
> >>>>>>> HDFS: Number of read operations=80
> >>>>>>> HDFS: Number of large read operations=0
> >>>>>>> HDFS: Number of write operations=0
> >>>>>>> Job Counters
> >>>>>>> Failed reduce tasks=15
> >>>>>>> Killed reduce tasks=2
> >>>>>>> Launched map tasks=20
> >>>>>>> Launched reduce tasks=17
> >>>>>>> Data-local map tasks=19
> >>>>>>> Rack-local map tasks=1
> >>>>>>> Total time spent by all maps in occupied slots (ms)=334664
> >>>>>>> Total time spent by all reduces in occupied slots (ms)=548199
> >>>>>>> Total time spent by all map tasks (ms)=167332
> >>>>>>> Total time spent by all reduce tasks (ms)=182733
> >>>>>>> Total vcore-seconds taken by all map tasks=167332
> >>>>>>> Total vcore-seconds taken by all reduce tasks=182733
> >>>>>>> Total megabyte-seconds taken by all map tasks=257021952
> >>>>>>> Total megabyte-seconds taken by all reduce tasks=561355776
> >>>>>>> Map-Reduce Framework
> >>>>>>> Map input records=18083
> >>>>>>> Map output records=18083
> >>>>>>> Map output bytes=3140643
> >>>>>>> Map output materialized bytes=3178436
> >>>>>>> Input split bytes=2812
> >>>>>>> Combine input records=0
> >>>>>>> Spilled Records=18083
> >>>>>>> Failed Shuffles=0
> >>>>>>> Merged Map outputs=0
> >>>>>>> GC time elapsed (ms)=1182
> >>>>>>> CPU time spent (ms)=56070
> >>>>>>> Physical memory (bytes) snapshot=6087245824
> >>>>>>> Virtual memory (bytes) snapshot=34655649792
> >>>>>>> Total committed heap usage (bytes)=5412749312
> >>>>>>> File Input Format Counters
> >>>>>>> Bytes Read=2274711
> >>>>>>> 16/01/21 14:21:16 ERROR indexer.IndexingJob: Indexer:
> >>>>>> java.io.IOException:
> >>>>>>> Job failed!
> >>>>>>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
> >>>>>>> at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
> >>>>>>> at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
> >>>>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>>>>>> at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)
> >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>>> at
> >>>>>>>
> >>>>>>
> >>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>>>>>> at
> >>>>>>>
> >>>>>>
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>>>>> at java.lang.reflect.Method.invoke(Method.java:606)
> >>>>>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
