On Tuesday 24 January 2012 15:25:24 Denis Sinner wrote:
> Oh right, thanks, it's because other applications also added documents with
> an "id" field to the index (but there the id is not constructed just
> out of a URL)

That's not a problem, they're all strings anyway.

> 
> I could index the URL into something like "nutch_id" and change
> org.apache.nutch.indexer.solr.SolrConstants ID_FIELD, though that's not the
> best solution.

Do you have URLs with unencoded spaces in the result?
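One way to answer that is to scan the returned id values for raw whitespace. A minimal Java sketch, assuming the ids have already been copied out of the Solr response into a plain list; IdWhitespaceCheck and findIdsWithWhitespace are hypothetical names (not part of Nutch or Solr), and apart from the dkd.de URL the sample ids are made up:

```java
import java.util.ArrayList;
import java.util.List;

public class IdWhitespaceCheck {

    // Returns every id that contains a raw whitespace character (space, tab,
    // newline, ...). Percent-encoded spaces such as "%20" are not flagged.
    public static List<String> findIdsWithWhitespace(List<String> ids) {
        List<String> suspicious = new ArrayList<>();
        for (String id : ids) {
            if (id == null) {
                continue; // a missing id is a different problem; skip it here
            }
            for (int i = 0; i < id.length(); i++) {
                if (Character.isWhitespace(id.charAt(i))) {
                    suspicious.add(id);
                    break; // one hit is enough to flag this id
                }
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Hypothetical sample ids; replace them with the values Solr returns.
        List<String> ids = List.of(
                "http://www.dkd.de/",
                "http://www.example.com/some page.html",
                "http://www.example.com/some%20page.html");
        for (String id : findIdsWithWhitespace(ids)) {
            System.out.println("Unencoded whitespace in id: " + id);
        }
    }
}
```

Only the second sample id is reported; the percent-encoded variant passes.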

> 
> > Ah, this is a known problem which I cannot reproduce anymore.
> > 
> > https://issues.apache.org/jira/browse/NUTCH-1100
> > 
> > It's triggered because Solr returns something the SolrInputFormat of
> > Nutch cannot deal with. Can you please run the query in a browser and
> > see if you find anything unusual in the returned results?
> > 
> > INFO: [core_en] webapp=/solr path=/select params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
> > 
> > It's most likely in the id field; the other three fields are highly
> > unlikely to contain garbage.
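The logged request asks for wt=javabin, a binary format a browser cannot display, so the query needs a text response writer to be read by eye. A minimal Java sketch that rebuilds the same request with wt=json (Solr's JSON writer; wt=xml would do as well), assuming the core URL that appears in the logs of this thread; DedupQueryUrl is a hypothetical name, and the Charset overload of URLEncoder.encode needs Java 10 or later:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DedupQueryUrl {

    // Rebuilds the select request logged for SolrDeleteDuplicates
    // (q=*:*, fl=id,boost,tstamp,digest, start=0, rows=N) but asks for
    // wt=json instead of wt=javabin so a browser can display the response.
    public static String build(String coreUrl, int rows) {
        String base = coreUrl.endsWith("/") ? coreUrl : coreUrl + "/";
        return base + "select"
                + "?q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8)
                + "&fl=" + URLEncoder.encode("id,boost,tstamp,digest", StandardCharsets.UTF_8)
                + "&start=0"
                + "&rows=" + rows
                + "&wt=json";
    }

    public static void main(String[] args) {
        // Core URL and row count as they appear in the logs of this thread.
        System.out.println(build("http://192.168.0.47:8080/solr/core_en/", 52));
        // prints http://192.168.0.47:8080/solr/core_en/select?q=*%3A*&fl=id%2Cboost%2Ctstamp%2Cdigest&start=0&rows=52&wt=json
    }
}
```

Pasting the printed URL into a browser should list the same 52 documents as readable JSON, which makes an empty or odd-looking field easy to spot.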
> > 
> > Thanks
> > 
> > On Tuesday 24 January 2012 14:13:27 you wrote:
> >> hadoop.log:
> >> 
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: content dest: content
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: site dest: site
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: title dest: teaser
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: boost dest: boost
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: tstamp dest: changed
> >> 2012-01-24 14:09:37,156 INFO  solr.SolrMappingReader - source: tstamp dest: created
> >> 2012-01-24 14:09:37,370 INFO  solr.SolrWriter - Adding 2 documents
> >> 2012-01-24 14:09:38,095 INFO  solr.SolrIndexer - SolrIndexer: finished at 2012-01-24 14:09:38, elapsed: 00:00:02
> >> 2012-01-24 14:09:38,097 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-01-24 14:09:38
> >> 2012-01-24 14:09:38,097 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
> >> 2012-01-24 14:09:38,457 WARN  mapred.LocalJobRunner - job_local_0010
> >> java.lang.NullPointerException
> >>    at org.apache.hadoop.io.Text.encode(Text.java:388)
> >>    at org.apache.hadoop.io.Text.set(Text.java:178)
> >>    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:284)
> >>    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:249)
> >>    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> >>    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> >>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >>    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
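The two top frames show Hadoop's Text.set(String) failing inside Text.encode, which is what that method does when it is passed null, so one plausible reading is that some document handed to SolrInputFormat lacks one of the requested fields. A minimal plain-Java sketch for checking that, assuming each returned document has been copied into a Map of field name to value (no Hadoop or SolrJ involved; MissingFieldReport and findMissingFields are hypothetical names, and the sample documents are made up):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MissingFieldReport {

    // The fields SolrDeleteDuplicates requested in the logged query (fl=...).
    static final List<String> REQUESTED_FIELDS = List.of("id", "boost", "tstamp", "digest");

    // For each document, lists the requested fields that are absent or null.
    // Documents with every requested field present are left out of the result.
    public static Map<String, List<String>> findMissingFields(List<Map<String, Object>> docs) {
        Map<String, List<String>> report = new LinkedHashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            Map<String, Object> doc = docs.get(i);
            List<String> missing = new ArrayList<>();
            for (String field : REQUESTED_FIELDS) {
                if (doc.get(field) == null) {
                    missing.add(field);
                }
            }
            if (!missing.isEmpty()) {
                // Fall back to the row position when the id itself is missing.
                Object id = doc.get("id");
                report.put(id != null ? id.toString() : "row " + i, missing);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical documents: one with all Nutch fields, one without.
        // Only presence matters here, so value types are simplified.
        Map<String, Object> nutchDoc = Map.of(
                "id", "http://www.dkd.de/", "boost", 1.0f,
                "tstamp", "2012-01-24T14:09:37Z", "digest", "0123abcd");
        Map<String, Object> otherDoc = Map.of("id", "pages/42");
        findMissingFields(List.of(nutchDoc, otherDoc))
                .forEach((id, fields) -> System.out.println(id + " is missing " + fields));
        // prints: pages/42 is missing [boost, tstamp, digest]
    }
}
```

Run against the 52 documents from the shared core, any entry it prints marks a candidate for the NullPointerException above.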
> >> 
> >> Solr (running out of Eclipse with Jetty):
> >> 
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy onInit
> >> INFO: SolrDeletionPolicy.onInit: commits:num=1
> >>    commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrDeletionPolicy updateCommits
> >> INFO: newest commit = 1326882792610
> >> 24.01.2012 14:09:37 org.apache.solr.update.processor.LogUpdateProcessor finish
> >> INFO: {add=[045756f6efde46c27a8e1016756bf99cc8153d51/nutch_external/http://www.dkd.de/, 5648ab376b909bc402c4ecbf45c26b4546e69f04/nutch_external/http://www.typo3-solr.com/]} 0 71
> >> 24.01.2012 14:09:37 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=71
> >> 24.01.2012 14:09:37 org.apache.solr.update.DirectUpdateHandler2 commit
> >> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy onCommit
> >> INFO: SolrDeletionPolicy.onCommit: commits:num=2
> >>    commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_p,version=1326882792610,generation=25,filenames=[_1.frq, _b.nrm, _b.tvx, _2.tii, _1.fnm, _2.tvx, _2.tvd, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _b.prx, _2.fdt, _2.frq, _b.tis, _2.fdx, _2.fnm, _b.tii, _b.frq, _1.prx, _1.fdx, _2.tis, _1.tvf, _b.tvd, _1.fdt, segments_p, _b.fnm, _b.fdt, _b.tvf, _1.tvd, _b.fdx, _1.nrm, _2.nrm]
> >>    commit{dir=/Users/dkd-sinner/Documents/solr/SolrTypo3Plugin/solr/typo3cores/data/core_en/index,segFN=segments_q,version=1326882792614,generation=26,filenames=[_1.frq, _2.tii, _c.tii, _c.fdx, _c.tvx, _1.fnm, _2.tvx, _c.fdt, _2.tvd, _c.tis, _c.nrm, _1.tii, _2.tvf, _1.tvx, _1.tis, _2.prx, _c.prx, _2.fdt, _2.frq, _2.fdx, _2.fnm, _1.prx, _1.fdx, _2.tis, _1.tvf, _1.fdt, segments_q, _c.tvf, _c.tvd, _c.fnm, _1.tvd, _c.frq, _1.nrm, _2.nrm]
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrDeletionPolicy updateCommits
> >> INFO: newest commit = 1326882792614
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher <init>
> >> INFO: Opening Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.update.DirectUpdateHandler2 commit
> >> INFO: end_commit_flush
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>    filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>    filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming Searcher@2a44fec1 main from Searcher@3d78cd7b main
> >>    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher warm
> >> INFO: autowarming result for Searcher@2a44fec1 main
> >>    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> >> INFO: QuerySenderListener sending requests to Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.core.QuerySenderListener newSearcher
> >> INFO: QuerySenderListener done.
> >> 24.01.2012 14:09:38 org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
> >> INFO: Building spell index for spell checker: default
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore registerSearcher
> >> INFO: [core_en] Registered new searcher Searcher@2a44fec1 main
> >> 24.01.2012 14:09:38 org.apache.solr.search.SolrIndexSearcher close
> >> INFO: Closing Searcher@3d78cd7b main
> >>    fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >>    filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> >>    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=44,cumulative_hits=32,cumulative_hitratio=0.72,cumulative_inserts=22,cumulative_evictions=0}
> >>    documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1136,cumulative_hits=618,cumulative_hitratio=0.54,cumulative_inserts=518,cumulative_evictions=0}
> >> 24.01.2012 14:09:38 org.apache.solr.update.processor.LogUpdateProcessor finish
> >> INFO: {commit=} 0 212
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=212
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=2
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id&wt=javabin&q=*:*&rows=1&version=2} hits=52 status=0 QTime=1
> >> 24.01.2012 14:09:38 org.apache.solr.core.SolrCore execute
> >> INFO: [core_en] webapp=/solr path=/select params={fl=id,boost,tstamp,digest&start=0&q=*:*&wt=javabin&rows=52&version=2} hits=52 status=0 QTime=2
> >> 
> >>> Please post the Nutch and Solr logs.
> >>> 
> >>> On Tuesday 24 January 2012 13:46:25 Denis Sinner wrote:
> >>>> Hello,
> >>>> 
> >>>> I have set up a Nutch crawler and am trying to index into a Solr core
> >>>> that other applications also write to. The data gets indexed, but I
> >>>> get the following error:
> >>>> 
> >>>> SolrDeleteDuplicates: starting at 2012-01-24 12:59:43
> >>>> SolrDeleteDuplicates: Solr url: http://192.168.0.47:8080/solr/core_en/
> >>>> Exception in thread "main" java.io.IOException: Job failed!
> >>>>  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >>>>  at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:392)
> >>>>  at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:372)
> >>>>  at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
> >>>>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>>>  at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> >>>> 
> >>>> If I index into an empty core on the same Solr server, I don't get
> >>>> this exception. Any hints on how to solve it? I would be very thankful.
> >>>> 
> >>>> Thanks,
> >>>> 
> >>>> Denis

-- 
Markus Jelsma - CTO - Openindex
