Hello everyone,
I am crawling a set of seeds using this command:
./bin/crawl /largeSeeds 1 http://localhost:8983/solr/ddcd 4
I encountered an error in the index phase, which says:
"Error:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Exception writing document id
com.angelfire.www:http/rock/babesintoyland/ to the index; possible
analysis error."
In the error message (Exception writing document id $ID to the index;
possible analysis error.), $ID is one of the following:
com.angelfire.www:http/rock/babesintoyland/
com.classicbands.www:http/steppenwolf.html
net.classiccat.www:http/albeniz_i/
com.all-reviews.www:http/videos-2/multiplicity.htm
com.donnathebuffalo.www:http/
com.musicbizacademy.www:http/
com.allrightnow.www:http/fws/
com.blinddivine.www:http/
edu.mit.shakespeare:http/tempest/full.html
com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml
1.75.62.198:http/www1/sistine/0-Tour.html
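In case it helps with reproducing this, the IDs above look like Nutch's reversed-URL keys (host labels reversed, then the protocol, then the path). Below is a minimal sketch, assuming that layout and no port component, that maps an ID back to the page URL so the failing documents can be inspected. `unreverse_id` is a hypothetical helper written for this message, not part of Nutch:

```python
def unreverse_id(doc_id: str) -> str:
    """Map 'com.example.www:http/path' back to 'http://www.example.com/path'.

    Assumes the ID layout seen in the error messages:
    '<reversed host>:<protocol>/<path>', with no port component.
    """
    # Everything before the first '/' is '<reversed host>:<protocol>'.
    head, slash, path = doc_id.partition("/")
    parts = head.split(":")
    if len(parts) != 2:
        raise ValueError(f"unexpected ID layout: {doc_id!r}")
    reversed_host, protocol = parts
    # Reverse the dot-separated host labels to restore the original host.
    host = ".".join(reversed(reversed_host.split(".")))
    return f"{protocol}://{host}{slash}{path}"


if __name__ == "__main__":
    for doc_id in (
        "com.angelfire.www:http/rock/babesintoyland/",
        "edu.mit.shakespeare:http/tempest/full.html",
    ):
        print(unreverse_id(doc_id))
```

Fetching the resulting URLs by hand may show whether these pages share something unusual (encoding, very long tokens, odd field values) that the Solr analysis chain rejects.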
The full stack trace of the error is as follows:
****************************LOG START**********************
16/02/08 20:54:51 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000003_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml to the index; possible analysis error.
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:69)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
16/02/08 20:54:53 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000000_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 1.75.62.198:http/www1/sistine/0-Tour.html to the index; possible analysis error.
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
    at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:69)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
16/02/08 20:54:55 INFO mapreduce.Job: map 100% reduce 0%
16/02/08 20:54:55 INFO mapreduce.Job: Job job_1454932871058_0013 failed with state FAILED due to: Task failed task_1454932871058_0013_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/02/08 20:54:55 INFO mapreduce.Job: Counters: 14
    Job Counters
        Failed map tasks=19
        Killed map tasks=4
        Launched map tasks=23
        Other local map tasks=17
        Data-local map tasks=2
        Rack-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=2182584
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=1091292
        Total vcore-seconds taken by all map tasks=1091292
        Total megabyte-seconds taken by all map tasks=4469932032
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
16/02/08 20:54:55 ERROR indexer.IndexingJob: SolrIndexerJob: java.lang.RuntimeException: job failed: name=[1]Indexer, jobid=job_1454932871058_0013
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:154)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:176)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Error running:
/home/c1/apache-nutch-2.3.1/runtime/deploy/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://ns613.mycyberhosting.com:8983/solr/ddcds -all -crawlId 1
Failed with exit value 255.
****************************LOG END**********************
Please advise.
BR