I am starting my own thread rather than posting under javozzo's thread :)
Hi,

I am using Nutch 1.5.1 and Solr 3.6 and am having a problem with the SolrDeleteDuplicates command. The Hadoop logs show this error:

java.lang.NullPointerException
    at org.apache.hadoop.io.Text.encode(Text.java:388)
    at org.apache.hadoop.io.Text.set(Text.java:178)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

I also have a question about upgrading Nutch to 1.6 or 1.7. I tried upgrading to a newer version of Nutch but got an exception while deleting duplicates in Solr. After a lot of research online, I found that a field had changed: some said it was the digest field, and others said that the url field is no longer there.

So here are my questions:

1: Is there a newer Solr mapping file that needs to be used?
2: Can the Solr index from 1.5.1 and an index from a newer version coexist, or do we need to re-index with one version of Nutch?

I would really appreciate any help with this.

Thanks in advance,
Madhvi

Madhvi Arora
AutomationDirect

