I think the channel/image/title idea was probably wrong.  It looks like the
extra title field actually comes from the HTTP header Content-Disposition:
inline; filename="jobexport.xml".  I can email you the URL of the specific
RSS feed I'm using for this issue privately, but since it's a client site
I'm not sure I'm allowed to post it publicly.

I'm using the default parser-plugins.xml, which lists parse-tika before
feed.  I don't have feed in my plugin.includes, but if I modify
parser-plugins.xml and plugin.includes to try to favor the feed plugin I
still get the same results.  I might be doing something wrong.
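
A sketch of what favoring the feed plugin could look like is below.  It is
illustrative only: the application/rss+xml mimeType entry and the other
plugins listed in the plugin.includes value are assumptions based on a stock
Nutch 1.x setup, not a copy of the actual config.  The two intended changes
are listing feed ahead of parse-tika and adding feed to plugin.includes.

```xml
<!-- conf/parser-plugins.xml: list feed before parse-tika for RSS.
     The mimeType name is an assumption; match whatever type the server sends. -->
<mimeType name="application/rss+xml">
  <plugin id="feed" />
  <plugin id="parse-tika" />
</mimeType>

<!-- conf/nutch-site.xml: add feed to the active plugins.
     The rest of the value is an assumed default list; keep your own. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|feed|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

Running nutch indexchecker against the feed URL after a change like this
should show whether the second title value is still being added.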




On Mon, Feb 24, 2014 at 2:20 PM, Sebastian Nagel <[email protected]> wrote:

> Hi John,
>
> can you attach a (short) example document to reproduce the problem?
> I was not able to reproduce it with the example in
> http://de.wikipedia.org/wiki/RSS
> which contains channel/image/title.
>
> Which parser plugin is used: "feed" or "parse-tika"?
> (If in doubt, please add the value of the property "plugin.includes".)
>
> Sebastian
>
>
> On 02/24/2014 08:31 PM, John Lafitte wrote:
> > I am using Nutch 1.7 and Solr 4.6.1.  I'm having a problem indexing RSS
> > that has channel/title and then channel/image/title: it tries to add both
> > of them, then fails when doing solrindex because title isn't multivalued.
> >
> > I've used nutch indexchecker and I see the two titles being returned.  The
> > extra title is the value in the filename parameter of the
> > Content-Disposition HTTP header.  I only see one title when I run nutch
> > readseg.  So I'm a little confused why it's
> >
> > I have made title multivalued in the Solr schema and it seems to work that
> > way, but it seems wrong to me.  Documents shouldn't have more than one
> > title.  What is the correct way to fix this?
> >
>
>
