Hi,

If you check Solr's logs, you'll notice a problem in Lucene DocValues that has been fixed in 5.5.

M.
-----Original message-----
> From: Kshitij Shukla <[email protected]>
> Sent: Tuesday 9th February 2016 8:00
> To: [email protected]
> Subject: [CIS-CMMI-3] Unable to index id ... possible analysis error
>
> Hello everyone,
>
> I have added a set of seeds to crawl using this command:
>
> ./bin/crawl /largeSeeds 1 http://localhost:8983/solr/ddcd 4
>
> I encountered an error in the index phase, which says:
>
> "Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id com.angelfire.www:http/rock/babesintoyland/ to the index; possible analysis error."
>
> In the error ("Exception writing document id $ID to the index; possible analysis error."), $ID is one of the following:
>
> com.angelfire.www:http/rock/babesintoyland/
> com.classicbands.www:http/steppenwolf.html
> net.classiccat.www:http/albeniz_i/
> com.all-reviews.www:http/videos-2/multiplicity.htm
> com.donnathebuffalo.www:http/
> com.musicbizacademy.www:http/
> com.allrightnow.www:http/fws/
> com.blinddivine.www:http/
> edu.mit.shakespeare:http/tempest/full.html
> com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml
> 1.75.62.198:http/www1/sistine/0-Tour.html
> com.musicbizacademy.www:http/
> com.musicbizacademy.www:http/
> com.blinddivine.www:http/
> edu.mit.shakespeare:http/tempest/full.html
> com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml
>
> The full stack trace of the error is as follows:
>
> ****************************LOG START**********************
> 16/02/08 20:54:51 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000003_2, Status : FAILED
> Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id com.collegetermpapers.www:http/TermPapers/Music/Shostokovich.shtml to the index; possible analysis error.
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
>     at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
>     at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>     at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>     at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
>     at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:69)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> 16/02/08 20:54:53 INFO mapreduce.Job: Task Id : attempt_1454932871058_0013_m_000000_2, Status : FAILED
> Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 1.75.62.198:http/www1/sistine/0-Tour.html to the index; possible analysis error.
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
>     at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:84)
>     at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:84)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:48)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:43)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>     at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>     at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:120)
>     at org.apache.nutch.indexer.IndexingJob$IndexerMapper.map(IndexingJob.java:69)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> 16/02/08 20:54:55 INFO mapreduce.Job:  map 100% reduce 0%
> 16/02/08 20:54:55 INFO mapreduce.Job: Job job_1454932871058_0013 failed with state FAILED due to: Task failed task_1454932871058_0013_m_000004
> Job failed as tasks failed. failedMaps:1 failedReduces:0
>
> 16/02/08 20:54:55 INFO mapreduce.Job: Counters: 14
>     Job Counters
>         Failed map tasks=19
>         Killed map tasks=4
>         Launched map tasks=23
>         Other local map tasks=17
>         Data-local map tasks=2
>         Rack-local map tasks=4
>         Total time spent by all maps in occupied slots (ms)=2182584
>         Total time spent by all reduces in occupied slots (ms)=0
>         Total time spent by all map tasks (ms)=1091292
>         Total vcore-seconds taken by all map tasks=1091292
>         Total megabyte-seconds taken by all map tasks=4469932032
>     Map-Reduce Framework
>         CPU time spent (ms)=0
>         Physical memory (bytes) snapshot=0
>         Virtual memory (bytes) snapshot=0
> 16/02/08 20:54:55 ERROR indexer.IndexingJob: SolrIndexerJob: java.lang.RuntimeException: job failed: name=[1]Indexer, jobid=job_1454932871058_0013
>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
>     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:154)
>     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:176)
>     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
> Error running:
>   /home/c1/apache-nutch-2.3.1/runtime/deploy/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://ns613.mycyberhosting.com:8983/solr/ddcds -all -crawlId 1
> Failed with exit value 255.
> ****************************LOG END**********************
>
> Please advise.
>
> BR

