You could change the nutch script in bin (bin/nutch) so that your
directory comes first in the classpath; then it should work. By default
nutch puts conf higher up in the classpath, so the copies of the files
in conf are found before yours.
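As a rough sketch of that change (not the actual contents of bin/nutch):
assuming the script builds a CLASSPATH variable that starts with the conf
directory, you would prepend your own directory to it. MY_CONF_DIR and the
default paths below are placeholders made up for illustration.

```shell
# Hypothetical locations; adjust both to your installation.
MY_CONF_DIR="${MY_CONF_DIR:-/path/to/my/conf}"
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"

# Assumed default behaviour: the classpath starts with the conf directory.
CLASSPATH="${NUTCH_CONF_DIR:-$NUTCH_HOME/conf}"

# Prepend your directory so that a regex-urlfilter.txt placed there is
# found before the copy in conf.
CLASSPATH="${MY_CONF_DIR}:${CLASSPATH}"

echo "$CLASSPATH"
```

The order matters because the first matching entry on the classpath wins.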
Dennis Kubes
Tobias Wolf wrote:
Hi,
The goal is to write this file before I start a new nutch crawl
process from my web application. If I just place the file on the
application server's CLASSPATH (I tried $JBOSS_HOME/server/default/lib,
for example), nutch does not find the regex files. I could experiment
with one of the class loaders explicitly, but I would first like to
keep it simple :)
Greetings
Tobias
----- Original Message -----
From: "Dennis Kubes" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, October 26, 2007 4:26 PM
Subject: Re: regex-urlfilter regex-urlnormalizer
Well, it depends on what you mean by exclude. If you don't want those
plugins to be called by the running jobs, then you need to remove
them from the plugin.includes configuration variable in
nutch-site.xml.
From this:
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query...
To this:
<property>
<name>plugin.includes</name>
<value>protocol-http|parse-(text|html|js)|index-basic|query...
If you mean you want to change the file name, then change the
urlfilter.regex.file variable in nutch-site.xml:
<property>
<name>urlfilter.regex.file</name>
<value>regex-urlfilter.txt</value>
<description>Name of file on CLASSPATH containing regular expressions
used by urlfilter-regex (RegexURLFilter) plugin.</description>
</property>
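Since the value is looked up as a file name on the classpath, it can help
to check which copy would be picked up first. The helper below is only a
sketch: find_on_classpath is a made-up name, it looks at plain directory
entries only (not jars), and it merely imitates the "first match wins"
order rather than asking the Java class loader itself.

```shell
# Print the first "<dir>/<file>" found while walking a colon-separated
# classpath from left to right. Returns 1 if no directory contains the file.
find_on_classpath() {
  file_name="$1"
  remaining="$2:"
  while [ -n "$remaining" ]; do
    entry="${remaining%%:*}"        # text before the first colon
    remaining="${remaining#*:}"     # everything after the first colon
    if [ -n "$entry" ] && [ -f "$entry/$file_name" ]; then
      printf '%s\n' "$entry/$file_name"
      return 0
    fi
  done
  return 1
}

# Example: which regex-urlfilter.txt comes first on the current classpath?
find_on_classpath regex-urlfilter.txt "${CLASSPATH:-}" \
  || echo "regex-urlfilter.txt not found in any classpath directory"
```

If the path printed is not the file you wrote, the directory holding your
file is either missing from the classpath or listed too late.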
If you mean you want to remove them from the build, then change
plugin.includes and delete the files.
Dennis Kubes
Tobias Wolf wrote:
Hi there,
How can I exclude the files regex-urlfilter.txt and
regex-urlnormalizer.txt? Is it possible to pass a parameter or set a
property that says where these files are stored, so that the plugin
can find them? A solution without touching the source code
would be fine :)
Greetings
Tobias