On Tuesday 24 January 2012 15:25:24 Denis Sinner wrote:
> Oh right, thanks, it's because other applications also added documents with
> an "id" field to the index (but the id there is not constructed just
> out of a URL)
That's not a problem, they're all strings anyway.

>
> I could index the URL into a field like "nutch_id" and change ID_FIELD in
> org.apache.nutch.indexer.solr.SolrConstants, though that's not the best
> solution.

Do you have URLs with unencoded spaces in the results?

>
> > Ah, this is a known problem which I cannot reproduce anymore.
> >
> > https://issues.apache.org/jira/browse/NUTCH-1100
> >
> > It's triggered because Solr returns something the SolrInputFormat of
> > Nutch cannot deal with. Can you please run the query in a browser and
> > see if you find anything unusual in the returned results?
> >
> > INFO: [core_en] webapp=/solr path=/select
> > params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
> >
> > It's likely in the id field; the other three fields are highly unlikely
> > to contain garbage.
> >
> > Thanks
> >
> > On Tuesday 24 January 2012 14:13:27 you wrote:
> >> hadoop.log:
> >>
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: content dest: content
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: site dest: site
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: title dest: teaser
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: boost dest: boost
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: tstamp dest: changed
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: tstamp dest: created
> >> 2012-01-24 14:09:37,370 INFO  solr.SolrWriter - Adding 2 documents
> >> 2012-01-24 14:09:38,095 INFO  solr.SolrIndexer - SolrIndexer: finished at 2012-01-24 14:09:38, elapsed: 00:00:02
> >> 2012-01-24 14:09:38,097 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-01-24 14:09:38
> >> 2012-01-24 14:09:38,097 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
> >> 2012-01-24 14:09:38,457 WARN  mapred.LocalJobRunner - job_local_0010
> >> java.lang.NullPointerException
> >>         at org.apache.hadoop.io.Text.encode(Text.java:388)
> >>         at org.apache.hadoop.io.Text.set(Text.java:178)
> >>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:284)
> >>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:249)
> >>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> >>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> >>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> >>
> >> Solr (running from Eclipse with Jetty):
> >>
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy onInit
> >> INFO: SolrDeletionPolicy.onInit: commits:num=1
> >>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy updateCommits
> >> INFO: newest commit = 1326882792610
> >> 24.01.2012 14:09:37 org.apache.solr.update.processor.LogUpdateProcessor finish
> >> INFO: {add=[045756f6efde46c27a8e1016756bf99cc8153d51/nutch_external/http://www.dkd.de/, 5648ab376b909bc402c4ecbf45c26b4546e69f04/nutch_external/http://www.typo3-solr.com/]} 0 71
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=71
> >> 24.01.2012 14:09:37 org.apache.solr.update.DirectUpdateHandler2 commit
> >> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy onCommit
> >> INFO: SolrDeletionPolicy.onCommit: commits:num=2
> >>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> >>     commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_q,version=1326882792614,generation=26,filenames=[_1.frq, _2.tii, _c.tii, _c.fdx, _c.tvx, _1.fnm, _2.tvx, _c.fdt, _2.tvd, _c.tis, _c.nrm, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _c.prx, _2.fdt, _2.frq, _2.fdx, _2.fnm, _1.prx, _1.fdx, _2.tis, _1.tvf, _1.fdt, segments_q, _c.tvf, _c.tvd, _c.fnm, _1.tvd, _c.frq, _1.nrm, _2.nrm]
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy updateCommits
> >> INFO: newest commit = 1326882792614
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher <init>
> >> INFO: Opening Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.update.DirectUpdateHandler2 commit
> >> INFO: end_commit_flush
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> >> INFO: QuerySenderListener sending requests to Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> >> INFO: QuerySenderListener done.
> >> 24.01.2012 14:09:38 org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
> >> INFO: Building spell index for spell checker: default
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore registerSearcher
> >> INFO: [core_en] Registered new searcher Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher close
> >> INFO: Closing Searcher@3d78cd7b main
> >>     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >>     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >>     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >>     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.update.processor.LogUpdateProcessor finish
> >> INFO: {commit=} 0 212
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=212
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=2
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=1
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
> >>
> >>> Please post the Nutch and Solr logs.
> >>>
> >>> On Tuesday 24 January 2012 13:46:25 Denis Sinner wrote:
> >>>> Hello,
> >>>>
> >>>> I have set up a Nutch crawler and am trying to index into a Solr core
> >>>> where information is written by other applications as well. The data
> >>>> gets indexed, but I get the following error:
> >>>>
> >>>> SolrDeleteDuplicates: starting at 2012-01-24 12:59:43
> >>>> SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
> >>>> Exception in thread "main" java.io.IOException: Job failed!
> >>>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >>>>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:392)
> >>>>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:372)
> >>>>         at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
> >>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> >>>>
> >>>> If I index into an empty core on the same Solr server, I don't get
> >>>> this exception. Any hints on how to solve it? I would be very thankful.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Denis

-- 
Markus Jelsma - CTO - Openindex
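The NullPointerException in the trace is thrown from Hadoop's Text.encode, called by Text.set(String), which suggests a null String reached the record reader, for example a document in the shared core that lacks one of the requested fields. One way to check the results of the query suggested above (fl=id,boost,tstamp,digest, as in the Solr log) is to scan each returned document for missing fields and for ids containing unencoded spaces. The following is a minimal sketch under stated assumptions: the results are assumed to be already parsed into plain Java maps (standing in for Solr result documents), and the class and method names are hypothetical, not part of Nutch or SolrJ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical diagnostic helper (not part of Nutch or SolrJ). Each document is a
 * plain Map standing in for a Solr result document fetched with
 * fl=id,boost,tstamp,digest.
 */
public class DedupFieldCheck {

    // Fields requested by SolrDeleteDuplicates, as seen in the Solr request log.
    static final List<String> DEDUP_FIELDS = List.of("id", "boost", "tstamp", "digest");

    /** Returns one message per suspicious value; an empty list means nothing was found. */
    public static List<String> findProblems(List<Map<String, Object>> docs) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            Map<String, Object> doc = docs.get(i);
            Object id = doc.get("id");
            // Use the id as a label when present, otherwise the position in the result list.
            String label = id == null ? "#" + i : id.toString();
            for (String field : DEDUP_FIELDS) {
                // A field the document does not have comes back as null here;
                // handing null to Hadoop's Text.set(String) throws a NullPointerException.
                if (doc.get(field) == null) {
                    problems.add(label + ": missing " + field);
                }
            }
            // Unencoded spaces in the id (URL) were the other suspect raised in the thread.
            if (id != null && id.toString().contains(" ")) {
                problems.add(label + ": id contains a space");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: a document with all four fields...
        Map<String, Object> nutchDoc = new LinkedHashMap<>();
        nutchDoc.put("id", "http://www.example.com/");
        nutchDoc.put("boost", 1.0f);
        nutchDoc.put("tstamp", "2012-01-24T14:09:37Z");
        nutchDoc.put("digest", "0123abcd");

        // ...and one written by another application, with only an id.
        Map<String, Object> otherDoc = new LinkedHashMap<>();
        otherDoc.put("id", "abc123/pages/42");

        for (String problem : findProblems(List.of(nutchDoc, otherDoc))) {
            System.out.println(problem);
        }
    }
}
```

Running main reports the second document as missing boost, tstamp and digest, which is the kind of record an index shared with other applications could contain while an empty, Nutch-only core would not.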

