Hi Claudio,

Can't this be achieved using the subcollection plugin?
BTW contributions are definitely encouraged but better to open a JIRA issue and include a patch (see http://wiki.apache.org/nutch/HowToContribute)

J.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

On 26 November 2010 11:30, Claudio Martella <[email protected]> wrote:

> Hello list,
>
> in my current scenario we have multiple crawls that crawl different
> intranet sites: public site, wiki, fileserver, intranet etc.
> After the crawling jobs have finished, we send the indexes to our Solr
> server. The problem is that after the fact, it's impossible to decide
> which url belongs to what crawl job. That can be useful to define better
> the search within a subset of the items in the index, or to issue a
> selective delete when updating the crawls.
>
> For this reason i wrote this small plugin which adds a property
> index.static.fields to the nutch-site config.
>
> In my example it would be used like this:
>
> <property>
>   <name>index.static.fields</name>
>   <value>crawl:fileserver</value>
> </property>
>
> defining a lucene field "crawl" with content "fileserver", but it
> supports multiple fields in a comma-separated list.
>
> I attach the plugin to be unpacked in src/plugin/, I hope it helps
> somebody.
>
> Best,
> Claudio
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax +39 0471 068 129
> [email protected] http://www.tis.bz.it
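
For readers following the thread, here is a rough sketch of how a value like the one quoted above ("crawl:fileserver", optionally several name:content pairs separated by commas) could be split into field names and contents. This is not the code from the attached plugin, which is not shown in this message. The class name StaticFieldsSketch, the method parseStaticFields and the second field "source:intranet" are made up for illustration, and the exact parsing rules (trimming, skipping malformed entries) are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch or of the attached plugin.
class StaticFieldsSketch {

    // Splits "name:content,name2:content2" into an ordered map.
    // Assumption: entries without a name or without content are skipped.
    static Map<String, String> parseStaticFields(String value) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return fields;
        }
        for (String pair : value.split(",")) {
            int sep = pair.indexOf(':');
            if (sep <= 0) {
                continue; // no separator, or nothing before it
            }
            String name = pair.substring(0, sep).trim();
            String content = pair.substring(sep + 1).trim();
            if (!name.isEmpty() && !content.isEmpty()) {
                fields.put(name, content);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The value from the example in the quoted message.
        System.out.println(parseStaticFields("crawl:fileserver"));
        // A made-up value with two fields.
        System.out.println(parseStaticFields("crawl:fileserver, source:intranet"));
    }
}
```

An indexing filter would then presumably add each entry of the returned map as a field on every document it processes, so that each document in Solr carries e.g. crawl=fileserver and can be filtered or deleted by that field.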

