Hi Claudio,

Can't this be achieved using the subcollection plugin?

BTW, contributions are definitely encouraged, but it's better to open a JIRA
issue and attach a patch (see http://wiki.apache.org/nutch/HowToContribute).

J.

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

On 26 November 2010 11:30, Claudio Martella <[email protected]> wrote:

> Hello list,
>
>
> In my current setup we have multiple crawls covering different
> intranet sites: the public site, the wiki, the file server, the intranet, etc.
> After the crawl jobs have finished, we send the indexes to our Solr
> server. The problem is that, after the fact, it's impossible to tell
> which URL belongs to which crawl job. Knowing that would be useful for
> restricting searches to a subset of the items in the index, or for issuing
> a selective delete when updating the crawls.
>
> For this reason I wrote this small plugin, which adds a property,
> index.static.fields, to nutch-site.xml.
>
> In my example it would be used like this:
>
> <property>
> <name>index.static.fields</name>
> <value>crawl:fileserver</value>
> </property>
>
> This defines a Lucene field "crawl" with the value "fileserver". Multiple
> fields are supported as a comma-separated list.
>
> I've attached the plugin, to be unpacked in src/plugin/. I hope it helps
> somebody.
>
> Best,
> Claudio
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax  +39 0471 068 129
> [email protected] http://www.tis.bz.it
>
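
To make the index.static.fields format from the quoted message concrete, here is a minimal Java sketch of how an indexing plugin might turn a value such as "crawl:fileserver" (or several comma-separated pairs) into field/value pairs. It is not the code from Claudio's attached plugin, which is not reproduced here; the class name StaticFieldsParser, the split-on-first-colon rule and the silent skipping of malformed entries are all assumptions made for illustration. In a real Nutch plugin the string would presumably be read from the configuration (for example with Hadoop's Configuration.get("index.static.fields")) and each pair added to the document being indexed; here it is simply passed in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not from the attached plugin): parses a value such as
 * "crawl:fileserver,site:intranet" into an ordered map of field -> value.
 */
public class StaticFieldsParser {

    public static Map<String, String> parse(String config) {
        // LinkedHashMap keeps the fields in the order they were configured.
        Map<String, String> fields = new LinkedHashMap<>();
        if (config == null || config.trim().isEmpty()) {
            return fields;
        }
        for (String entry : config.split(",")) {
            String trimmed = entry.trim();
            // Split on the first colon only, so a value may itself contain colons.
            int sep = trimmed.indexOf(':');
            if (sep <= 0 || sep == trimmed.length() - 1) {
                // Assumed behaviour: skip malformed entries like "crawl" or "crawl:".
                continue;
            }
            String name = trimmed.substring(0, sep).trim();
            String value = trimmed.substring(sep + 1).trim();
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("crawl:fileserver, site:intranet");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running main prints "crawl = fileserver" followed by "site = intranet". Splitting on the first colon only is a deliberate choice in this sketch: it keeps values that contain a colon intact, at the cost of not allowing colons in field names.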
