[CIS-CMMI-3] Unable to index id ... possible analysis error

Kshitij Shukla Mon, 08 Feb 2016 23:00:06 -0800

Hello everyone,

I have added a set of seeds to crawl using this command:

./bin/crawl /largeSeeds 1 http://localhost:8983/solr/ddcd 4


I encountered an error in the index phase, which says:

"Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id com.angelfire.www:http/rock/babesintoyland/ to the index; possible analysis error."

In the error message (Exception writing document id $ID to the index; possible analysis error.), $ID is one of the following:


com.angelfire.www:http/rock/babesintoyland/
com.classicbands.www:http/steppenwolf.html
net.classiccat.www:http/albeniz_i/
com.all-reviews.www:http/videos-2/multiplicity.htm
com.donnathebuffalo.www:http/
com.musicbizacademy.www:http/
com.allrightnow.www:http/fws/
com.blinddivine.www:http/
edu.mit.shakespeare:http/tempest/full.html
com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml
1.75.62.198:http/www1/sistine/0-Tour.html
com.musicbizacademy.www:http/
com.musicbizacademy.www:http/
com.blinddivine.www:http/
edu.mit.shakespeare:http/tempest/full.html
com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml
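
For readers who want to open the failing pages: the ids above look like Nutch 2.x row keys, with the host name reversed and the scheme placed after a colon. The sketch below converts such a key back to a URL. It assumes the layout `reversed.host:protocol[:port]/path`, which is inferred from the ids listed here rather than confirmed against the Nutch source, and `unreverse_url` is a hypothetical helper name, not a Nutch API.

```python
def unreverse_url(key: str) -> str:
    """Turn a reversed-host key back into a URL.

    Assumed layout (inferred from the ids in this post):
        reversed.host:protocol[:port]/path
    """
    # Everything before the first "/" is "reversed.host:protocol[:port]".
    path_begin = key.find("/")
    if path_begin == -1:
        path_begin = len(key)
    head, path = key[:path_begin], key[path_begin:]

    parts = head.split(":")
    if len(parts) < 2:
        raise ValueError(f"not a reversed-host key: {key!r}")

    # Reverse the dot-separated host labels: com.angelfire.www -> www.angelfire.com
    host = ".".join(reversed(parts[0].split(".")))
    protocol = parts[1]
    port = f":{parts[2]}" if len(parts) > 2 else ""
    return f"{protocol}://{host}{port}{path}"


if __name__ == "__main__":
    for key in (
        "com.angelfire.www:http/rock/babesintoyland/",
        "edu.mit.shakespeare:http/tempest/full.html",
    ):
        print(unreverse_url(key))
```

Fetching the resulting URLs by hand is one way to check whether the page content itself is what the Solr analysis chain rejects.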

The full stack trace of the error is as follows:
****************************LOG START**********************

16/02/08 20:54:51 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000003_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml to the index; possible analysis error.
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:69)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

16/02/08 20:54:53 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000000_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 1.75.62.198:http/www1/sistine/0-Tour.html to the index; possible analysis error.
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

16/02/08 20:54:55 INFO mapreduce.Job:  map 100% reduce 0%

16/02/08 20:54:55 INFO mapreduce.Job: Job job_1454932871058_0013 failed with state FAILED due to: Task failed task_1454932871058_0013_m_000004

Job failed as tasks failed. failedMaps:1 failedReduces:0

16/02/08 20:54:55 INFO mapreduce.Job: Counters: 14
    Job Counters
        Failed map tasks=19
        Killed map tasks=4
        Launched map tasks=23
        Other local map tasks=17
        Data-local map tasks=2
        Rack-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=2182584
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=1091292
        Total vcore-seconds taken by all map tasks=1091292
        Total megabyte-seconds taken by all map tasks=4469932032
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0

16/02/08 20:54:55 ERROR indexer.IndexingJob: SolrIndexerJob: java.lang.RuntimeException: job failed: name=[1]Indexer, jobid=job_1454932871058_0013
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:154)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:176)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Error running:

/home/c1/apache-nutch-2.3.1/runtime/deploy/bin/nutch index -Dmapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -Dmapred.reduce.tasks.speculative.execution=false -Dmapred.map.tasks.speculative.execution=false -Dmapred.compress.map.output=true -Dsolr.server.url=http://ns613.mycyberhosting.com:8983/solr/ddcds -all -crawlId 1

Failed with exit value 255.
****************************LOG END**********************
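
The distinct failing ids can be pulled out of a log like the one above with a short script. This is only a sketch: `failing_ids` is a hypothetical helper name, and the pattern assumes every failure line contains the phrase "Exception writing document id ... to the index", as the lines in this log do. It tolerates missing spaces around the id, since mail clients sometimes swallow them when wrapping long lines.

```python
import re

# Matches "Exception writing document id <ID> to the index", allowing the
# spaces around the id (and inside "to the") to be missing.
_ID_PATTERN = re.compile(
    r"Exception writing document id\s*(\S+?)\s*to\s*the index"
)


def failing_ids(log_text: str) -> list[str]:
    """Return the distinct document ids named in failure lines, in first-seen order."""
    matches = _ID_PATTERN.findall(log_text)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python failing_ids.py < indexer.log
    for doc_id in failing_ids(sys.stdin.read()):
        print(doc_id)
```

Retried map attempts report the same document more than once, so deduplicating gives a shorter list of pages to inspect.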

Please advise.
BR

