Hey Sebastian,

Opened an issue/took a stab at it:
https://issues.apache.org/jira/browse/NUTCH-1872
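
For anyone following along, here is a rough sketch of the kind of check
Sebastian describes below. It is not the patch attached to the issue and not
Nutch API: the class name OutlinkMatch, the Mode enum and shouldPropagate()
are all hypothetical, and it only covers the "same prefix" and "same host"
cases (same domain is left out since it needs public-suffix handling). The
idea would be to call something like this from distributeScoreToOutlinks()
before copying the metadata onto each outlink.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Nutch.
final class OutlinkMatch {

  // ALL mirrors the current urlmeta behaviour (propagate to every outlink).
  public enum Mode { ALL, PREFIX, HOST }

  /** Returns true if metadata from fromUrl should be passed on to toUrl. */
  public static boolean shouldPropagate(String fromUrl, String toUrl, Mode mode) {
    switch (mode) {
      case PREFIX: {
        // Treat the parent as a directory so /boston does not match /bostonian,
        // and require at least one character after the slash (boston/.+).
        String prefix = fromUrl.endsWith("/") ? fromUrl : fromUrl + "/";
        return toUrl.length() > prefix.length() && toUrl.startsWith(prefix);
      }
      case HOST: {
        try {
          String fromHost = new URI(fromUrl).getHost();
          String toHost = new URI(toUrl).getHost();
          return fromHost != null && fromHost.equalsIgnoreCase(toHost);
        } catch (URISyntaxException e) {
          return false; // unparsable URL: do not propagate
        }
      }
      default:
        return true;
    }
  }
}
```

With Mode.PREFIX, metadata injected on http://www.fakenews.com/boston would
reach http://www.fakenews.com/boston/sports but not http://www.fakenews.com/
or http://www.fakenews.com/atlanta.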

Thanks,
jce

On Tue, Oct 7, 2014 at 9:32 AM, Sebastian Nagel <[email protected]>
wrote:

> Hi,
>
> > Having looked at the wiki, NUTCH-655, and NUTCH-855, it seems like using
> > the urlmeta plugin out of the box would not achieve this, because the
> > metadata would be propagated to all outlinks (which presumably would
> > include its parent, et al.).
> >
> > Is this correct? If so, is there any built-in way to do this or do I need
> > to figure something out?
>
> Yes, that's right.
>
> But it would be easy to add the check in distributeScoreToOutlinks()
> of URLMetaScoringFilter. Maybe it's also a good idea to make this
> functionality generally available via a property and predefined
> match classes (eg, same prefix, same host, same domain). Feel free to
> open an issue for that feature.
>
> Thanks,
> Sebastian
>
> On 10/06/2014 11:00 PM, Jonathan Cooper-Ellis wrote:
> > Hello,
> >
> > I am interested in injecting metadata and propagating that to its children
> > only.
> >
> > For example, say I want to inject www.fakenews.com/boston along with some
> > metadata that is specific to Boston. I don't want it to be propagated to
> > www.fakenews.com or www.fakenews.com/atlanta. It should only go to
> > www.fakenews.com/boston/.+
> >
> > Having looked at the wiki, NUTCH-655, and NUTCH-855, it seems like using
> > the urlmeta plugin out of the box would not achieve this, because the
> > metadata would be propagated to all outlinks (which presumably would
> > include its parent, et al.).
> >
> > Is this correct? If so, is there any built-in way to do this or do I need
> > to figure something out?
> >
> > Thanks,
> > jce
> >
>
>


-- 
Jonathan Cooper-Ellis
*Data Engineer*
myVBO, LLC dba Ziftr
