The patch linked below should work around the problem in most cases. The
parser can output two titles, one of which is actually empty. The patch
(included in 1.4) skips empty titles.

If this doesn't work, you really do have two _valid_ titles coming from your
document.

https://issues.apache.org/jira/browse/NUTCH-1004
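
To make the idea concrete, here is a rough sketch of what "skipping empty
titles" amounts to. It is not the code from the patch: the class and method
names are made up for illustration, and it only assumes that the parser hands
back a list of candidate title strings, some of which may be null or blank.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Nutch or NUTCH-1004.
final class TitleFilter {

    // Returns the titles that are neither null nor blank, trimmed,
    // in their original order.
    static List<String> nonEmptyTitles(List<String> titles) {
        List<String> kept = new ArrayList<>();
        for (String title : titles) {
            if (title != null && !title.trim().isEmpty()) {
                kept.add(title.trim());
            }
        }
        return kept;
    }
}
```

If a filter like this still leaves more than one entry, the document really
does carry two valid titles, which is the second case described above.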

> It looks like the issue I'm encountering is the same one as here.
> 
> http://lucene.472066.n3.nabble.com/multiple-values-encountered-for-non-multiValued-field-title-td1446817.html
> 
> I'm not really sure what the linked bug is since that involves the HTML
> parser and I'm seeing this problem with a PDF file.
> 
> On Tue, Nov 1, 2011 at 3:41 PM, Bai Shen <[email protected]> wrote:
> > I'm getting an exception when I try to commit to Solr.  Looking at the
> > Solr log, it's showing that title is getting multiple values when it's
> > not a multivalue field.  None of my code does anything with the title,
> > so I'm not sure why this is happening.
> > 
> > How can I look at the pending commit and determine why and/or delete the
> > extraneous values?  The document in question is a pdf if that makes a
> > difference.
