Thank you, Julien. I was trying to look for some documentation on how to set this plugin up. Can anybody point me to a link where the setup is documented?
I appreciate your help.

Raj

-----Original Message-----
From: Julien Nioche [mailto:[email protected]]
Sent: Friday, August 27, 2010 4:42 AM
To: [email protected]
Subject: Re: Setting the Nutchschema field to a constant value

Have a look at the subcollection plugin - I haven't used it myself but I think it does what you need

Julien

--
DigitalPebble Ltd
Open Source Solutions for Text Engineering
http://www.digitalpebble.com

On 26 August 2010 19:03, Nemani, Raj <[email protected]> wrote:

> All,
>
> I was wondering if I can force a constant value into one of the fields
> defined in Nutch's schema. Here is the scenario.
>
> I have two sub-sites that I would like to crawl separately. Something
> like
>
> http://parentsite.mydomain.com/site1/index.php
>
> http://parentsite.mydomain.com/site12/index.php
>
> I am sending the results of the crawl to the same Solr/Lucene index.
> The index is used by a Drupal website to provide search results to the
> user.
>
> The user has checkboxes on the Drupal website to search for either site 1
> search results or site 2 search results.
>
> Here is the problem. There is no way for me to differentiate between
> site1 and site2 documents in the index.
>
> One of the schema fields generated by the Nutch document is called
> 'site'. Ideally this should have been a good field for me to use to
> differentiate between the documents in the index. But for the sub-sites
> I am crawling, the 'site' field value will be set to
> "parentsite.mydomain.com" because both the URLs have the same site value.
>
> That is the reason for me to ask this question. Can I set the value of
> the 'site' field to "Site1" for the site 1 URL crawl and "Site2" for the
> site 2 URL crawl?
>
> Hope I have explained the scenario clearly. If what I am thinking is
> not possible, then can I achieve my ultimate objective in any other way?
>
> Thanks so much in advance
>
> Raj

