Hi,
The goal is to write this file before I start a new (Nutch/crawl) process
from my web application. If I just place the file on the application
server's CLASSPATH (I tried $JBOSS_HOME/server/default/lib, for
example), Nutch does not find the regex files. I could experiment explicitly
with one of the class loaders, but first I want to keep it simple :)
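A minimal JDK-only sketch of the class-loader route, assuming Nutch resolves the regex file by name through a class loader (which is what "file on CLASSPATH" suggests). The class and helper names are hypothetical. It writes a rules file into a fresh directory and builds a URLClassLoader whose search path includes that directory. Whether your Nutch/Hadoop version lets you hand such a loader to the Configuration it reads from is something to check against that version's API.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class RegexFileLoaderSketch {

    // Hypothetical helper: writes one filter rule per line to dir/fileName.
    static Path writeRules(Path dir, String fileName, List<String> rules) throws IOException {
        return Files.write(dir.resolve(fileName), rules, StandardCharsets.UTF_8);
    }

    // Builds a loader that searches dir (as a directory entry) after asking its parent.
    static URLClassLoader loaderFor(Path dir, ClassLoader parent) throws IOException {
        URL dirUrl = dir.toUri().toURL(); // ends with '/' for an existing directory
        return new URLClassLoader(new URL[] {dirUrl}, parent);
    }

    public static void main(String[] args) throws IOException {
        Path confDir = Files.createTempDirectory("nutch-conf");
        writeRules(confDir, "regex-urlfilter.txt", List.of("-\\.(gif|jpg|png)$", "+."));

        ClassLoader appLoader = RegexFileLoaderSketch.class.getClassLoader();
        // Normally null: the temp directory is not on the application's class path.
        System.out.println("app loader:    " + appLoader.getResource("regex-urlfilter.txt"));

        try (URLClassLoader loader = loaderFor(confDir, appLoader)) {
            // Found: the new loader includes confDir in its search path.
            System.out.println("custom loader: " + loader.getResource("regex-urlfilter.txt"));
        }
    }
}
```

One plausible reason the JBoss lib directory did not work is that the server puts the jar files in that directory on its class path rather than the directory itself, so a loose .txt file there is never seen; packing the file into a jar would be another thing to try. That is an assumption, not something confirmed in this thread.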
Greetings
Tobias
----- Original Message -----
From: "Dennis Kubes" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, October 26, 2007 4:26 PM
Subject: Re: regex-urlfilter regex-urlnormalizer
Well, it depends on what you mean by exclude. If you don't want those plugins
to be called by the running jobs, then you need to remove the plugin
from the plugin.includes configuration property in nutch-site.xml.
From this:
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query...</value>
</property>
to this:
<property>
<name>plugin.includes</name>
<value>protocol-http|parse-(text|html|js)|index-basic|query...</value>
</property>
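To see what the change does, here is a small sketch that checks plugin ids against each value, assuming each id must match the whole plugin.includes pattern (Java's Matcher.matches()). Both values are truncated in the mail, so only their visible parts are used; the class name and the id list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // Returns the candidate ids that match the whole includes pattern.
    static List<String> included(String includes, List<String> candidates) {
        Pattern pattern = Pattern.compile(includes);
        List<String> result = new ArrayList<>();
        for (String id : candidates) {
            if (pattern.matcher(id).matches()) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical list of plugin ids; only the visible parts of the two values are used.
        List<String> ids = List.of("protocol-http", "urlfilter-regex", "parse-html", "index-basic");
        String before = "protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic";
        String after = "protocol-http|parse-(text|html|js)|index-basic";

        System.out.println(included(before, ids)); // [protocol-http, urlfilter-regex, parse-html, index-basic]
        System.out.println(included(after, ids));  // [protocol-http, parse-html, index-basic]
    }
}
```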
If you want to change the file name, change the
urlfilter.regex.file property in nutch-site.xml:
<property>
<name>urlfilter.regex.file</name>
<value>regex-urlfilter.txt</value>
<description>Name of file on CLASSPATH containing regular expressions
used by urlfilter-regex (RegexURLFilter) plugin.</description>
</property>
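For reference, the comment header of the default regex-urlfilter.txt says that the first matching pattern in the file decides whether a URL is included or ignored, and that a URL matching no pattern is ignored. The sketch below re-implements that rule for illustration only. It is not Nutch's code: it assumes a pattern counts as matching if it is found anywhere in the URL, and the class name, rules and URLs are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FirstMatchFilter {

    // One non-comment line of a regex-urlfilter.txt-style file.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Skips blank lines and '#' comments; the first character is the sign,
    // the rest of the line is the regular expression.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule found in the URL decides; a URL matching no rule is rejected.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(List.of(
                "# skip common image files",
                "-\\.(gif|jpg|png)$",
                "+^http://www\\.example\\.com/"));

        System.out.println(accepts(rules, "http://www.example.com/logo.png"));   // false
        System.out.println(accepts(rules, "http://www.example.com/index.html")); // true
        System.out.println(accepts(rules, "http://other.example.org/"));         // false
    }
}
```

Rule order matters here: moving the "+" line above the image rule would let http://www.example.com/logo.png through.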
If you want to remove them from the build, change
plugin.includes and delete the files.
Dennis Kubes
Tobias Wolf wrote:
Hi there,
How can I exclude the files regex-urlfilter.txt and
regex-urlnormalizer.txt? Is there a way to pass a parameter
or set a property that says where these files are stored, so that the plugin
can find them? A solution without touching the source code would be fine
:)
Greetings
Tobias