> > We have created a new nutch configuration directory.
> > The only difference between this configuration directory
> > and the normal one is the automaton-urlfilter.txt,
> > crawl-urlfilter.txt, and regex-urlfilter.txt files.
> 
> With the new nutch configuration directory, we set the env
> var NUTCH_CONF_DIR.  I know the nutch script is getting this
> value.  I put in some debug statements.  I can see it added
> properly to CLASSPATH.  I also set HADOOP_CONF_DIR.  This
> also does not have any effect.
> 
> I am checking the access times on the regex-urlfilter.txt file.
> The new regex-urlfilter.txt is not accessed.  The process is
> only accessing the regex-urlfilter.txt file in the $NUTCH_HOME/conf
> directory.  It does not appear to be using the NUTCH_CONF_DIR.
> 
> Does anyone have any thoughts or ideas for what we can do to
> get this to work with the NUTCH_CONF_DIR?  Thank you in
> advance for any pointers.

I fixed the problem.

The hadoop processes were already running when we changed
NUTCH_CONF_DIR and HADOOP_CONF_DIR, so they apparently never
picked up the new values.  The sequence that works for us:
First, we stop the hadoop processes.  Then, we set
NUTCH_CONF_DIR and HADOOP_CONF_DIR to our special configuration
directory and start the hadoop processes again.  Once the
filtering is done, we stop the hadoop processes, unset the
NUTCH_CONF_DIR and HADOOP_CONF_DIR environment variables, and
finally restart the hadoop processes with the normal configuration.
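
For anyone who wants to script this, here is a rough sketch of the
sequence above.  It is not the exact script we run.  The paths, the
FILTER_CONF_DIR name, the run helper, and the use of start-all.sh and
stop-all.sh are all assumptions; substitute whatever you use to start
and stop hadoop.  By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: every path and variable default below is an assumption.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"                    # assumed install dir
HADOOP_START="${HADOOP_START:-$HADOOP_HOME/bin/start-all.sh}" # assumed start script
HADOOP_STOP="${HADOOP_STOP:-$HADOOP_HOME/bin/stop-all.sh}"    # assumed stop script
FILTER_CONF_DIR="${FILTER_CONF_DIR:-/opt/nutch/conf-filter}"  # hypothetical special conf dir
DRY_RUN="${DRY_RUN:-1}"                                      # 1 = print commands, do not run

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: filter_with_special_conf <filter command and its arguments>
filter_with_special_conf() {
    # Stop hadoop before touching the environment.
    run "$HADOOP_STOP"

    # Point both variables at the special configuration directory.
    export NUTCH_CONF_DIR="$FILTER_CONF_DIR"
    export HADOOP_CONF_DIR="$FILTER_CONF_DIR"

    # Start hadoop so the processes come up with the new settings.
    run "$HADOOP_START"

    # Run the filtering job (whatever command you pass in).
    run "$@"

    # Stop hadoop, drop the special settings, and restart normally.
    run "$HADOOP_STOP"
    unset NUTCH_CONF_DIR HADOOP_CONF_DIR
    run "$HADOOP_START"
}
```

Source the file and call filter_with_special_conf followed by your
own filtering command.  Set DRY_RUN=0 once the printed sequence looks
right for your installation.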

Everything works like a charm now.

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
