Hi,

The goal is to write this file before I start a new (nutch/crawl) process over my web application. If I just place the file in one of the application server's CLASSPATH directories ($JBOSS_HOME/server/default/lib, for example), Nutch does not find the regex files. I could experiment explicitly with one of the class loaders, but first I want to keep it simple :)
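A minimal, stdlib-only Java sketch of the "write the file first, then make it visible on a class path" idea. It writes a `regex-urlfilter.txt` into a directory and checks that a `URLClassLoader` whose search path contains that directory can resolve the file by name, while a loader without that directory cannot. The class and method names (`RegexFileOnClasspath`, `writeAndExpose`) are hypothetical and not part of Nutch. Whether the class loader that actually loads Nutch inside JBoss searches such a directory is an assumption you would have to verify for your deployment.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RegexFileOnClasspath {

    // Writes the rules file into dir and returns a class loader whose search path includes dir.
    // The parent is null (bootstrap loader), so only dir contributes application resources.
    static URLClassLoader writeAndExpose(Path dir, String fileName, String rules) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve(fileName), rules.getBytes(StandardCharsets.UTF_8));
        // For an existing directory, Path.toUri() ends with "/", which URLClassLoader
        // needs in order to treat the URL as a directory rather than a JAR.
        return new URLClassLoader(new URL[] { dir.toUri().toURL() }, null);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nutch-conf");
        // Hypothetical rules; replace with whatever your crawl needs.
        String rules = "+^http://localhost:8080/\n-.\n";
        try (URLClassLoader withDir = writeAndExpose(dir, "regex-urlfilter.txt", rules);
             URLClassLoader withoutDir = new URLClassLoader(new URL[0], null)) {
            System.out.println("with dir:    " + withDir.getResource("regex-urlfilter.txt"));
            System.out.println("without dir: " + withoutDir.getResource("regex-urlfilter.txt")); // null
        }
    }
}
```

The point of the contrast is that a resource lookup only succeeds through a loader whose search path includes the directory; copying the file somewhere that is on *some* class path is not enough if Nutch's loader does not search it.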


Greetings

Tobias


----- Original Message -----
From: "Dennis Kubes" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, October 26, 2007 4:26 PM
Subject: Re: regex-urlfilter regex-urlnormalizer


Well, it depends on what you mean by exclude. If you don't want those plugins to be called by the running jobs, you need to remove the plugin from the plugin.includes property in nutch-site.xml.

From this:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query...

to this:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|parse-(text|html|js)|index-basic|query...
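To check which plugins a given plugin.includes value would activate, here is a small stdlib-only Java sketch. It assumes, as we understand Nutch's plugin repository to behave, that the value is a regular expression that must match the whole plugin id. The values below are abbreviated versions of the two above (the real values continue past "query..."), and the class name `PluginIncludesCheck` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PluginIncludesCheck {

    // Returns the plugin ids that the whole plugin.includes expression matches.
    static List<String> activePlugins(String includes, List<String> pluginIds) {
        Pattern p = Pattern.compile(includes);
        return pluginIds.stream()
                .filter(id -> p.matcher(id).matches()) // full-id match, not substring
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("protocol-http", "urlfilter-regex", "parse-html", "index-basic");
        // Abbreviated values; the real ones are longer.
        String before = "protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic";
        String after = "protocol-http|parse-(text|html|js)|index-basic";
        System.out.println(activePlugins(before, ids)); // [protocol-http, urlfilter-regex, parse-html, index-basic]
        System.out.println(activePlugins(after, ids));  // [protocol-http, parse-html, index-basic]
    }
}
```

With urlfilter-regex gone from the expression, that plugin is no longer selected, so its rules file is never read.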

If you mean you want to change the file name, change the urlfilter.regex.file property in nutch-site.xml:

<property>
  <name>urlfilter.regex.file</name>
  <value>regex-urlfilter.txt</value>
  <description>Name of file on CLASSPATH containing regular expressions
  used by urlfilter-regex (RegexURLFilter) plugin.</description>
</property>
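For reference, a stdlib-only Java sketch of how a rules file like regex-urlfilter.txt is applied, under our understanding of the RegexURLFilter format: each non-blank, non-comment line starts with `+` (accept) or `-` (reject) followed by a regular expression, the first pattern found in the URL decides, and a URL that matches no rule is rejected. This is an illustration, not Nutch's code; `RegexRulesSketch` and the sample rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexRulesSketch {

    // One parsed line: '+' (accept) or '-' (reject) followed by a regular expression.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;
        Rule(boolean accept, Pattern pattern) { this.accept = accept; this.pattern = pattern; }
    }

    // Parses rule text; blank lines and lines starting with '#' are skipped.
    static List<Rule> parse(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty() || line.startsWith("#")) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern is found in the URL decides; no match means reject.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) return r.accept;
        }
        return false;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# hypothetical rules for a local web application",
                "-\\.(gif|jpg|png)$",
                "+^http://localhost:8080/myapp/",
                "-.");
        List<Rule> rules = parse(text);
        System.out.println(accepts(rules, "http://localhost:8080/myapp/index.html")); // true
        System.out.println(accepts(rules, "http://localhost:8080/myapp/logo.png"));   // false
        System.out.println(accepts(rules, "http://example.com/"));                    // false
    }
}
```

Rule order matters: putting the image exclusion before the broad accept rule is what keeps logo.png out.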

If you mean you want to remove them from the build, change plugin.includes and delete the files.

Dennis Kubes


Tobias Wolf wrote:
Hi there,

How can I exclude the files regex-urlfilter.txt and regex-urlnormalizer.txt? Is it possible to pass a parameter or set a property specifying where these files are stored, so that the plugin can find them? A solution without touching the source code would be fine :)


Greetings

Tobias
