Dear Julien

Thanks for the hint.

So, an entry in the seed file could look like:

http://www.nutch.org \t metatag=http://www.nutch.org
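A seed list with one such entry per seed can be generated with a few lines of Python. This is a minimal sketch. The helper name seed_lines is hypothetical, and it assumes the injector expects the URL, a single tab character, and then key=value pairs, as in the entry above but without the spaces around the tab.

```python
# Hypothetical helper: build seed-file lines of the form
#   <url>\tmetatag=<url>
# so that every seed carries its own URL under the "metatag" key.
from typing import Iterable, List


def seed_lines(urls: Iterable[str], key: str = "metatag") -> List[str]:
    """Return one tab-separated seed line per URL, tagging each URL with itself."""
    lines = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank entries
        lines.append(f"{url}\t{key}={url}")
    return lines


if __name__ == "__main__":
    seeds = ["http://www.nutch.org", "http://www.nutch.org/docs/"]
    # Print the lines; redirect the output into the seed file given to the injector.
    print("\n".join(seed_lines(seeds)))
```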

And the property <name>urlmeta.tags</name>
should have the value <value>metatag</value>
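Julien's point is that urlmeta copies the tags named in urlmeta.tags from a page onto its outlinks, so every page reached from a seed ends up carrying that seed's metatag. The toy model below sketches that idea as a breadth-first walk over a made-up link graph. It is not Nutch's implementation. The "first page to link wins" rule is an assumption made for simplicity, and Nutch may resolve a URL reached from two different seeds differently. The model also assumes the urlmeta plugin is enabled in plugin.includes.

```python
# Simplified model (NOT Nutch's actual code) of how a urlmeta-style plugin
# could carry the tags listed in urlmeta.tags from a page to its outlinks.
from collections import deque
from typing import Dict, List


def propagate_tags(
    outlinks: Dict[str, List[str]],
    seed_meta: Dict[str, Dict[str, str]],
    tags: List[str],
) -> Dict[str, Dict[str, str]]:
    """Breadth-first walk from the seeds. Each newly discovered page inherits
    only the configured tags from the page that first linked to it."""
    meta = {url: dict(m) for url, m in seed_meta.items()}
    queue = deque(seed_meta)
    while queue:
        page = queue.popleft()
        for link in outlinks.get(page, []):
            if link not in meta:  # first discovery wins in this model
                meta[link] = {k: v for k, v in meta[page].items() if k in tags}
                queue.append(link)
    return meta


if __name__ == "__main__":
    # Two seeds in the same domain stay distinguishable via their tag.
    graph = {"http://a/": ["http://a/x"], "http://a/x": ["http://a/y"],
             "http://a/docs/": ["http://a/z"]}
    seeds = {"http://a/": {"metatag": "http://a/"},
             "http://a/docs/": {"metatag": "http://a/docs/"}}
    result = propagate_tags(graph, seeds, ["metatag"])
    print(result["http://a/y"]["metatag"])  # prints http://a/
```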

And the field should be added to the Solr schema and
the Solr mapping configuration.
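Once the field is indexed, a source-specific search could be a normal query plus a filter query on that field. This sketch only builds the request URL. It assumes the field is declared with a non-tokenized type such as string, so that the whole seed URL matches exactly. The base URL http://localhost:8983/solr/collection1 and the helper name solr_source_query are assumptions for illustration.

```python
# Sketch: build a Solr select URL that restricts results to documents
# whose "metatag" field holds a given seed URL.
from urllib.parse import urlencode


def solr_source_query(base: str, query: str, seed: str, field: str = "metatag") -> str:
    # Escape backslashes and quotes, then quote the seed URL as a phrase
    # so characters such as ':' and '/' are taken literally.
    escaped = seed.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": query, "fq": f'{field}:"{escaped}"', "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_source_query("http://localhost:8983/solr/collection1",
                            "nutch", "http://www.nutch.org"))
```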

Right?

This would indeed be exactly what I need.

Best
Urs




On 02.05.2013 at 14:30, Julien Nioche <[email protected]> wrote:

> Hi Urs,
> 
> The plugin urlMeta can be used for that. You can add a custom feature to
> entries in your seed list and configure the parameters used by urlMeta so
> that the metadata value gets transferred to the outlinks. See discussion
> on http://markmail.org/message/lyk7pnbovabvcezv
> 
> J.
> 
> 
> On 2 May 2013 12:45, Urs Hofer <[email protected]> wrote:
> 
>> Hi all
>> 
>> I'm new to Nutch.
>> 
>> I have a running system (Solr 4, Nutch 1.6), currently indexing about
>> 360,000 documents. In order to run a kind of source-specific search,
>> I'd like to store the original seed URL in Solr as well.
>> 
>> My crawl is limited to the domain: db.ignore.external.links=true
>> 
>> Currently, I'm working around the problem by limiting the search to the
>> same domain as the seed URL. That works quite well (mostly).
>> 
>> But I have several seed URLs in the same domain, which cannot be
>> separated this way.
>> 
>> Any suggestions?
>> Thanks
>> Hofer
>> 
>> 
>> 
> 
> 
> -- 
> *Open Source Solutions for Text Engineering*
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
