Hello everyone,

I am crawling a set of seeds with this command:
./bin/crawl /largeSeeds 1 http://localhost:8983/solr/ddcd 4

I encountered an error in the indexing phase, which says:

"Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id com.angelfire.www:http/rock/babesintoyland/ to the index; possible analysis error."
The $ID in that message (Exception writing document id $ID to the index; possible analysis error.) is one of the following:

com.angelfire.www:http/rock/babesintoyland/
com.classicbands.www:http/steppenwolf.html
net.classiccat.www:http/albeniz_i/
com.all-reviews.www:http/videos-2/multiplicity.htm
com.donnathebuffalo.www:http/
com.musicbizacademy.www:http/
com.allrightnow.www:http/fws/
com.blinddivine.www:http/
edu.mit.shakespeare:http/tempest/full.html
com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml
1.75.62.198:http/www1/sistine/0-Tour.html
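The IDs listed above appear to be Nutch 2.x row keys: the host name with its labels in reverse order, then `:` and the protocol (plus `:port` if one was present), then the path. To open the failing pages in a browser, the keys can be mapped back to URLs. The following is a minimal sketch assuming exactly that layout; `NutchKeys` and `unreverseUrl` are hypothetical names written for illustration, not Nutch's own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): converts a row key such as
 * "com.angelfire.www:http/rock/babesintoyland/" back into a URL.
 * ASSUMPTION: keys follow <reversed host>:<protocol>[:<port>]<path>.
 */
public class NutchKeys {

    public static String unreverseUrl(String key) {
        int firstColon = key.indexOf(':');
        if (firstColon < 0) {
            throw new IllegalArgumentException("No ':' in key: " + key);
        }
        String reversedHost = key.substring(0, firstColon);
        String rest = key.substring(firstColon + 1);

        // The protocol runs until the next ':' (a port follows) or '/' (the path starts).
        int i = 0;
        while (i < rest.length() && rest.charAt(i) != ':' && rest.charAt(i) != '/') {
            i++;
        }
        String protocol = rest.substring(0, i);

        // Optional port: digits between ':' and the start of the path.
        String port = "";
        if (i < rest.length() && rest.charAt(i) == ':') {
            int j = i + 1;
            while (j < rest.length() && rest.charAt(j) != '/') {
                j++;
            }
            port = ":" + rest.substring(i + 1, j);
            i = j;
        }
        String path = rest.substring(i);

        // Put the host labels back in their normal order (www.example.com).
        List<String> labels = new ArrayList<>(Arrays.asList(reversedHost.split("\\.")));
        Collections.reverse(labels);
        String host = String.join(".", labels);

        return protocol + "://" + host + port + path;
    }

    public static void main(String[] args) {
        String[] ids = {
            "com.angelfire.www:http/rock/babesintoyland/",
            "edu.mit.shakespeare:http/tempest/full.html",
            "1.75.62.198:http/www1/sistine/0-Tour.html"
        };
        for (String id : ids) {
            System.out.println(id + " -> " + unreverseUrl(id));
        }
    }
}
```

Under this assumption, the key `1.75.62.198:http/www1/sistine/0-Tour.html` maps to `http://198.62.75.1/www1/sistine/0-Tour.html`, since the host labels are reversed whether they are names or IP octets.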

The full stack trace of the error is as follows:
****************************LOG START**********************
16/02/08 20:54:51 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000003_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml to the index; possible analysis error.
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:69)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

16/02/08 20:54:53 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000000_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 1.75.62.198:http/www1/sistine/0-Tour.html to the index; possible analysis error.
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:69)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

16/02/08 20:54:55 INFO mapreduce.Job:  map 100% reduce 0%
16/02/08 20:54:55 INFO mapreduce.Job: Job job_1454932871058_0013 failed with state FAILED due to: Task failed task_1454932871058_0013_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/02/08 20:54:55 INFO mapreduce.Job: Counters: 14
    Job Counters
        Failed map tasks=19
        Killed map tasks=4
        Launched map tasks=23
        Other local map tasks=17
        Data-local map tasks=2
        Rack-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=2182584
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=1091292
        Total vcore-seconds taken by all map tasks=1091292
        Total megabyte-seconds taken by all map tasks=4469932032
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
16/02/08 20:54:55 ERROR indexer.IndexingJob: SolrIndexerJob: java.lang.RuntimeException: job failed: name=[1]Indexer, jobid=job_1454932871058_0013
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:154)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:176)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Error running:
/home/c1/apache-nutch-2.3.1/runtime/deploy/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://ns613.mycyberhosting.com:8983/solr/ddcds -all -crawlId 1
Failed with exit value 255.
****************************LOG END**********************
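"Possible analysis error" is Solr wrapping an exception raised while it analysed and indexed the document, so the Solr server's own log, rather than the Hadoop task log above, normally records the underlying cause. One commonly reported cause is a field declared with an untokenized type (such as `string`) that receives a whole page body: Lucene rejects any single indexed term whose UTF-8 encoding is longer than 32766 bytes. Assuming that is what is happening here, a quick offline check might look like the sketch below. The class `ImmenseTermCheck`, the field names, and the sample document are hypothetical, and the check only applies to fields indexed as one term.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical diagnostic (not part of Nutch or Solr).
 * ASSUMPTION: the failing field is untokenized, so its whole value
 * becomes a single Lucene term.
 */
public class ImmenseTermCheck {

    // Lucene's maximum length for one indexed term, measured in UTF-8 bytes.
    static final int MAX_TERM_BYTES = 32766;

    /** Returns the names of fields whose value would be too long as a single term. */
    public static List<String> oversizedFields(Map<String, String> untokenizedFields) {
        List<String> tooLong = new ArrayList<>();
        for (Map.Entry<String, String> e : untokenizedFields.entrySet()) {
            // Count bytes, not chars: non-ASCII characters take 2 to 4 bytes each.
            int bytes = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            if (bytes > MAX_TERM_BYTES) {
                tooLong.add(e.getKey());
            }
        }
        return tooLong;
    }

    public static void main(String[] args) {
        // Hypothetical document: a short title and a 40000-character body.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 40000; i++) {
            body.append('a');
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Steppenwolf");
        doc.put("content", body.toString());
        System.out.println(oversizedFields(doc)); // prints [content]
    }
}
```

If a field shows up here, switching it to a tokenized text type in the Solr schema, or not indexing it, would be the usual direction to investigate.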

Please advise.
BR

--

------------------------------

Cyber Infrastructure (P) Limited, [CIS] (CMMI Level 3 Certified)

Central India's largest Technology company.

Ensuring the success of our clients and partners through our highly optimized Technology solutions.

www.cisin.com | +Cisin <https://plus.google.com/+Cisin/> | Linkedin <https://www.linkedin.com/company/cyber-infrastructure-private-limited> | Offices: Indore, India; Singapore; Silicon Valley, USA.
