All,
I was wondering whether I can force a constant value into one of the fields defined in Nutch's schema.

Here is the scenario. I have two sub-sites that I would like to crawl separately, something like:

http://parentsite.mydomain.com/site1/index.php
http://parentsite.mydomain.com/site12/index.php

I am sending the results of both crawls to the same Solr/Lucene index. The index is used by a Drupal website to provide search results to the user, who has checkboxes on the Drupal site to search for either site 1 results or site 2 results.

Here is the problem: there is no way for me to differentiate between site 1 and site 2 documents in the index. One of the schema fields in the document Nutch generates is called 'site'. Ideally, this would have been a good field to use to tell the documents apart. But for the sub-sites I am crawling, the 'site' field will be set to "parentsite.mydomain.com" in both cases, because both URLs share the same host and therefore get the same 'site' value.

That is the reason I am asking this question: can I set the value of the 'site' field to "Site1" for the site 1 crawl and "Site2" for the site 2 crawl?

I hope I have explained the scenario clearly. If what I have in mind is not possible, is there any other way to achieve my ultimate objective?

Thanks so much in advance,
Raj

