Okay, I invoked it the way you mentioned and I get the same result.
However, when I tried it without index-more in plugin.includes, the
additional title disappeared.  Why is index-more adding it?
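For reference: as far as I can tell from its source, index-more's MoreIndexingFilter derives a title from the filename parameter of a Content-Disposition header. That would explain why the extra value disappears when the plugin is removed. The shell sketch below only approximates that extraction, so you can check which value a given header would produce. The function name filename_from_disposition is hypothetical and not part of Nutch, and the patterns are an assumption rather than a copy of the plugin's regexes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): approximate how a filename is
# pulled out of a Content-Disposition header value.
filename_from_disposition() {
  # Quoted form first, e.g.  inline; filename="jobexport.xml"
  quoted=$(printf '%s\n' "$1" | sed -n "s/.*filename=[\"']\([^\"']*\)[\"'].*/\1/p")
  if [ -n "$quoted" ]; then
    printf '%s\n' "$quoted"
    return 0
  fi
  # Unquoted form, e.g.  attachment; filename=report.pdf
  # (stops at whitespace or ';'); prints nothing if no filename is present.
  printf '%s\n' "$1" | sed -n 's/.*filename=\([^[:space:];]*\).*/\1/p'
}

# The header reported in this thread; prints: jobexport.xml
filename_from_disposition 'inline; filename="jobexport.xml"'
```

If this prints a value for a feed's response header, index-more may add it as a title next to the one the parser already found.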


On Mon, Feb 24, 2014 at 3:24 PM, Sebastian Nagel <[email protected]> wrote:

> > I'm not sure I'm allowed to post it publicly.
> A minimalistic and anonymized example would be fine.
> However, if it's really the HTTP header it will
> be hard to make it reproducible.
>
> > I'm using the default parser-plugins.xml, which lists parse-tika before
> > feed.  I don't have feed in my plugin.includes, but even if I modify
> > parser-plugins.xml and plugin.includes to favor feed, I still get
> > the same results.  I might be doing something wrong.
>
> It's possible to set plugin.includes (and other properties) just for
> tools like indexchecker, parsechecker, etc.:
>
> % bin/nutch indexchecker -Dplugin.includes="feed|index-(basic|more)|protocol-http" .../rss.xml
>
>
> On 02/24/2014 09:59 PM, John Lafitte wrote:
> > I think the channel/image/title idea was probably wrong.  It looks like
> > the extra title field actually comes from the HTTP header
> > Content-Disposition: inline; filename="jobexport.xml".  I can email you
> > the URL of the specific RSS feed I'm using for this issue privately, but
> > since it's a client site I'm not sure I'm allowed to post it publicly.
> >
> > I'm using the default parser-plugins.xml, which lists parse-tika before
> > feed.  I don't have feed in my plugin.includes, but even if I modify
> > parser-plugins.xml and plugin.includes to favor feed, I still get
> > the same results.  I might be doing something wrong.
> >
> >
> >
> >
> > On Mon, Feb 24, 2014 at 2:20 PM, Sebastian Nagel <[email protected]> wrote:
> >
> >> Hi John,
> >>
> >> could you attach a short example document that reproduces the problem?
> >> I was not able to reproduce it with the example in
> >> http://de.wikipedia.org/wiki/RSS
> >> which contains channel/image/title.
> >>
> >> Which parser plugin is used: "feed" or "parse-tika"?
> >> (If in doubt, please add the value of the property "plugin.includes".)
> >>
> >> Sebastian
> >>
> >>
> >> On 02/24/2014 08:31 PM, John Lafitte wrote:
> >>> I am using Nutch 1.7 and Solr 4.6.1.  I'm having a problem indexing
> >>> RSS that has both channel/title and channel/image/title: Nutch tries to
> >>> add both of them as titles, and solrindex then fails because title isn't
> >>> multivalued.
> >>>
> >>> I've used nutch indexchecker and I see the two titles being returned.
> >>> The extra title is the filename value from the Content-Disposition HTTP
> >>> header.  I only see one title when I run nutch readseg, so I'm a little
> >>> confused why it's
> >>>
> >>> I have made title multivalued in the Solr schema and it seems to work
> >>> that way, but it seems wrong to me.  Documents shouldn't have more than
> >>> one title.  What is the correct way to fix this?
> >>>
> >>
> >>
> >
>
>
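On the schema question raised in the thread: instead of making title multivalued in Solr, a filename-derived title could be used only when the parser did not already produce one. The sketch below contrasts always adding the fallback with adding it only if no title exists. It models a document's title values as newline-separated lines. Both function names and the "Job Feed" title are hypothetical, not Nutch or Solr APIs, and whether a given Nutch version already guards index-more this way would need to be checked in its source.

```shell
#!/bin/sh
# Hypothetical illustration (not Nutch code). A document's title field is
# modelled as newline-separated values.

# Always append the fallback title: can yield two values for "title",
# which a single-valued Solr field rejects.
add_title() {
  existing=$1 fallback=$2
  if [ -n "$existing" ]; then
    printf '%s\n%s\n' "$existing" "$fallback"
  else
    printf '%s\n' "$fallback"
  fi
}

# Use the fallback only when no title was parsed: at most one value.
add_title_if_absent() {
  existing=$1 fallback=$2
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  else
    printf '%s\n' "$fallback"
  fi
}

add_title "Job Feed" "jobexport.xml" | wc -l            # counts 2 values
add_title_if_absent "Job Feed" "jobexport.xml" | wc -l  # counts 1 value
```

The second variant keeps the filename as a fallback for documents without a title, so the Solr field can stay single-valued.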
