Hi Nutch community,

As you may have noticed, I posted a patch for the plugin system to the Nutch bug tracking system.
SourceForge does not accept my file, probably because it is too large (1.6 MB), so I have uploaded it to
http://www.media-style.com/gfx/nutch/nutch-plugin-patch.zip
The patch is now mentioned on SourceForge three times already (sorry about that);
the description can be found here:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=954964&group_id=59548


Doug, could you please close the bug again? I don't have the rights to do that. Thanks!

The code has been ready for some months, but I'm sorry, I didn't find the time to make the last small changes.

For those who remember the conversation a few months ago, the patch comes with:
+ the required Ant build script update
+ the first standard plugin, which contains a set of content extractors
+ an HTML content extractor that respects robots.txt
+ greatly improved Javadoc (though there is still room for improvement, since I'm not a native speaker)

To install the patch, copy dom4j.jar to $HOME/nutch/libs.
Apply nutch_plugin_patch.txt in $HOME/nutch.
Copy "nutch-extractors" to $HOME (so it sits at the same level as "nutch").

Then run "ant test" or "ant tar".

It would be great if a native speaker could help me write a "how to write a plugin" tutorial by next week.
If there is anything I can do to help get this patch into CVS HEAD, let me know.

Greetings,
Stefan


---------------------------------------------------------------
open technology: http://www.media-style.com
open source: http://www.weta-group.net
open discussion: http://www.text-mining.org