One option is to use the NUTCH_CONF_DIR environment variable. By default,
Nutch looks for its configuration files in the {nutch home}/conf directory,
but you can point it at a different directory by setting this variable.

We usually create a separate configuration directory for each type of
crawl and set NUTCH_CONF_DIR according to the crawl configuration we
want to run.
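
As a rough sketch of that setup (the install path, the conf-<type>
directory names and the helper function are made up for illustration, not
something Nutch ships), a wrapper script could select the directory like
this:

```shell
#!/bin/sh
# Hypothetical layout: one configuration directory per crawl type, e.g.
#   $NUTCH_HOME/conf-intranet/  and  $NUTCH_HOME/conf-web/
# each holding its own nutch-site.xml and crawl-urlfilter.txt.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"   # assumed install path

# Point NUTCH_CONF_DIR at the directory for the given crawl type.
use_crawl_conf() {
    NUTCH_CONF_DIR="$NUTCH_HOME/conf-$1"
    export NUTCH_CONF_DIR
}

use_crawl_conf intranet
echo "Using configuration in $NUTCH_CONF_DIR"

# The bin/nutch script reads NUTCH_CONF_DIR, so commands run after this
# point pick up that directory, for example:
#   "$NUTCH_HOME/bin/nutch" crawl urls -dir crawl-intranet -depth 3
```

Because the variable is exported, the same selection should also apply to
the individual commands (inject, fetch, ...) when they are run from the
same script.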

-----Original Message-----
From: Felix Zimmermann [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 26, 2008 7:50 AM
To: [email protected]
Subject: individual crawl-urlfilter.txt and nutch-site.xml for different
crawls?

Hi,

I'd like to change the crawl-urlfilter.txt and nutch-site.xml depending on
the crawl. At the moment, I only use the “nutch crawl” command in a little
self-made .sh script. In future, I'll need the other commands such as
“nutch inject, fetch, …” too.

I am thinking of something like “nutch crawl … -urlfilter my_url_filter_file
-conffile my_nutch_site_xml_file”.

Am I right to make changes in
org/apache/nutch/util/NutchConfiguration.java? If yes, how can I pass the
arguments?

If not, where do I have to modify the code to achieve this? I am not very
familiar with Java, but I think I can understand the code if I know where
to look.

Thanks for any help!

Felix.

 

