Hi Urs,

The urlMeta plugin can be used for that. You can add a custom metadata field
to the entries in your seed list and configure the parameters used by urlMeta
so that the metadata value gets passed on to the outlinks. See the discussion
at http://markmail.org/message/lyk7pnbovabvcezv
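
As a rough sketch of what that could look like (the tag name "seed", its
values and the example.com URLs are made up for illustration, so adapt them to
your setup): the seed list carries the metadata as tab-separated key=value
pairs after each URL, and nutch-site.xml names the tag in urlmeta.tags. The
plugin id is urlmeta and has to be part of plugin.includes.

```xml
<!-- urls/seed.txt (one URL per line, a TAB between the URL and the metadata):

     http://www.example.com/section-a/	seed=section-a
     http://www.example.com/section-b/	seed=section-b
-->

<!-- nutch-site.xml -->
<!-- Also add "urlmeta" to the list in your existing plugin.includes value. -->
<property>
  <name>urlmeta.tags</name>
  <!-- hypothetical tag name; must match the key used in the seed list -->
  <value>seed</value>
</property>
```

You will most likely also need a matching field in your Solr schema for the
value to end up in the index.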

J.


On 2 May 2013 12:45, Urs Hofer <[email protected]> wrote:

> Hi all
>
> I'm new to Nutch.
>
> I have a running system (Solr 4, Nutch 1.6), currently indexing about
> 360000 documents. In order to run a kind of source-specific search,
> I'd like to store the original seed URL in Solr as well.
>
> My crawl is limited to the domain: db.ignore.external.links=true
>
> Currently, I'm solving the problem by limiting the search to the same
> domain as the seed URL. That works quite well for the most part.
>
> But I have several seed URLs starting in the same domain, which cannot
> be separated this way.
>
> Any suggestions?
> Thanks
> Hofer
>
>
>


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
