Hi,

Just as a side note, the latest 1.4 development version can be found in the trunk SVN repository:
https://svn.apache.org/repos/asf/nutch/trunk/

On Tue, Nov 1, 2011 at 8:47 PM, Bai Shen <[email protected]> wrote:
> I'm running the latest version of 1.4. We just rebuilt it last week. Is
> that patch included?
>
> And where would it get multiple titles from? How do I tell what the titles
> are so I can see if they're valid or not?
>
> On Tue, Nov 1, 2011 at 4:33 PM, Markus Jelsma <[email protected]> wrote:
>
> > This should work around the problem in most cases. The parser can output
> > two titles, of which one is actually empty. This patch (in 1.4) skips
> > empty titles.
> >
> > If this doesn't work you really have two _valid_ titles coming from your
> > document.
> >
> > https://issues.apache.org/jira/browse/NUTCH-1004
> >
> > > It looks like the issue I'm encountering is the same one as here.
> > >
> > > http://lucene.472066.n3.nabble.com/multiple-values-encountered-for-non-multiValued-field-title-td1446817.html
> > >
> > > I'm not really sure what the linked bug is since that involves the HTML
> > > parser and I'm seeing this problem with a PDF file.
> > >
> > > On Tue, Nov 1, 2011 at 3:41 PM, Bai Shen <[email protected]> wrote:
> > > > I'm getting an exception when I try to commit to Solr. Looking at the
> > > > Solr log, it's showing that title is getting multiple values when it's
> > > > not a multivalue field. None of my code does anything with the title,
> > > > so I'm not sure why this is happening.
> > > >
> > > > How can I look at the pending commit and determine why and/or delete
> > > > the extraneous values? The document in question is a PDF, if that
> > > > makes a difference.

--
*Lewis*

