Hi Sol,

Of course, you could provide a separate package for every crawl.

In local mode, it's easier to point NUTCH_CONF_DIR to the right directory.
It can even be a list of directories separated by ':', which are searched
in order for config files (config files are actually looked up on the Java
classpath). For example, one could define a shell function for Nutch:
 nutch () {
    NUTCH_LOG_DIR=./logs NUTCH_CONF_DIR="./conf:$NUTCH_HOME/conf" \
        "$NUTCH_HOME/bin/nutch" "$@"
 }

Config files found in ./conf/ (usually nutch-site.xml) take precedence over
those from $NUTCH_HOME/conf/.
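
As a rough sketch of such a per-crawl layout, one could prepare a directory
per crawl that holds only the config files that differ. The helper name
setup_crawl_dir and the directory names are made up for illustration; they
are not part of Nutch.

```shell
# Hypothetical helper (not part of Nutch): create a crawl directory with
# a private conf/ folder that holds only the overriding URL filter file.
setup_crawl_dir () {
    crawl_dir=$1      # e.g. crawl-a
    filter_file=$2    # the regex URL filter rules for this crawl
    mkdir -p "$crawl_dir/conf" "$crawl_dir/logs" || return 1
    # Only the file that differs is copied; all other config files would
    # still be found in $NUTCH_HOME/conf, the second NUTCH_CONF_DIR entry.
    cp "$filter_file" "$crawl_dir/conf/regex-urlfilter.txt"
}
```

Running Nutch from inside that directory with
NUTCH_CONF_DIR=./conf:$NUTCH_HOME/conf should then pick up the private
regex-urlfilter.txt, while everything else still comes from
$NUTCH_HOME/conf.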

For your specific use case, see also:

<property>
  <name>urlfilter.regex.file</name>
  <value>regex-urlfilter.txt</value>
  <description>Name of file on CLASSPATH containing regular expressions
  used by urlfilter-regex (RegexURLFilter) plugin.</description>
</property>

This would also work in cluster mode, as you can set or overwrite properties
from the command line when launching Nutch.
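
A minimal sketch of such a command-line override, assuming the tool is
launched through Hadoop's ToolRunner and therefore accepts generic options
of the form -Dname=value before its own arguments. The names
inject_with_filter and NUTCH_BIN are hypothetical, and the named file still
has to be available on the classpath of the job.

```shell
# Hypothetical wrapper (not part of Nutch): run "nutch inject" with a
# per-crawl URL filter file selected via the urlfilter.regex.file property.
# NUTCH_BIN is an assumed override hook, handy for dry runs; by default
# the launcher script under $NUTCH_HOME is used.
inject_with_filter () {
    filter_file=$1
    shift
    "${NUTCH_BIN:-$NUTCH_HOME/bin/nutch}" inject \
        -Durlfilter.regex.file="$filter_file" "$@"
}
```

It could be called like this, with the remaining arguments passed through
unchanged to the inject command:

    inject_with_filter regex-urlfilter-news.txt crawl/crawldb urls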

Sebastian

On 11/08/2017 03:55 PM, Sol Lederman wrote:
> Hi,
> 
> I need to have different regex-urlfilter.txt files for different crawls.
> Since the file lives in conf and I don't see a way to point nutch inject to
> a different file or a different conf directory, I assume I should just swap
> in a different regex-urlfilter.txt file every time I do a crawl.
> 
> Does that sound right?
> 
> Thanks.
> 
> Sol
> 
