Stefan,
Thanks for putting this patch together! I was able to apply it.
Some quick notes:
1. The dom4j jar seemed corrupted. I downloaded a fresh copy and then the build worked fine.
2. I'd prefer to make the plugins source tree a part of the core source tree, under a plugins directory rather than mixed in with the core sources. That way there's just one thing to check out from CVS, all the paths are relative, etc. I'm happy to make this change myself. What do others think?
3. Are there deep reasons why you use dom4j rather than Java's built-in XML support? Nutch already uses the built-in support. In the interests of minimizing dependencies, it would be best to use just a single XML parser, and one that's included with the JVM is certainly handy. You use the dom4j-specific 'List element.getElements("...")' instead of the more verbose w3c version 'NodeList getElementsByTagName("...")', but it doesn't look like you depend on any advanced dom4j-specific features. Is that right? If so, perhaps we should consider moving away from dom4j.
4. You use tabs for indentation, which Emacs displays as eight spaces. My guess is that your editor displays them as fewer than eight spaces, probably four or two. Please change this setting in your editor, since the Java code conventions require tabs to be set every eight spaces:
http://java.sun.com/docs/codeconv/html/CodeConventions.doc3.html#262
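Regarding point 3, here is a minimal sketch of the built-in alternative. It assumes only JAXP (javax.xml.parsers) and the W3C DOM API that ship with the JDK. The descriptor layout (a <plugin> root with <extension id="..."> elements) and the class and method names are hypothetical, chosen for illustration. They are not the format used in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads a made-up plugin descriptor using only the
// JDK's built-in W3C DOM support, so no extra XML jar is needed.
public class BuiltinXmlSketch {

    // Returns the "id" attribute of every <extension> element in the document.
    public static List<String> extensionIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // The more verbose W3C lookup: a NodeList rather than a java.util.List.
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("extension");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element extension = (Element) nodes.item(i);
                ids.add(extension.getAttribute("id"));
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap checked parser errors so callers get a single unchecked failure.
            throw new IllegalArgumentException("cannot parse descriptor", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<plugin id=\"example\">"
                + "<extension id=\"html\"/>"
                + "<extension id=\"pdf\"/>"
                + "</plugin>";
        System.out.println(extensionIds(xml)); // prints [html, pdf]
    }
}
```

Note that Element.getElementsByTagName matches all descendants, not only direct children. If only children should count, the loop would have to walk getChildNodes() instead.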
What do others think of this patch?
Doug
Stefan Groschupf wrote:
Well, working with the command line and Unix tools isn't that intuitive. But as we all know, Unix has been the OS of the future for 20 years. ;-)
A patch in unified format is now online: http://www.media-style.com/gfx/nutch/nutch-plugin-patch.zip
You can patch it with: $ patch -p1 < patch18_05_04.txt
+ The patch also contains the dom4j jar as a binary.
+ Do not patch the tutorial with the old text.
Copy the "nutch-extractors" plugin to the same level as "nutch":
  $HOME/nutch
  $HOME/nutch-extractors
Then:
  cd nutch
  ant test
  ant tar
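The -p1 in the patch command above tells patch how many leading directory components to drop from the file names recorded in the patch. GNU patch strips the smallest prefix containing N leading slashes and counts a run of adjacent slashes as one. The hypothetical Java helper below illustrates that rule. The paths are made up, and returning null when there is too little to strip is this sketch's own choice, not patch's behavior.

```java
// Hypothetical helper illustrating the `patch -pN` file-name rule.
public class StripPrefixSketch {

    // Removes the smallest prefix of `path` that contains `n` slashes,
    // treating a run of adjacent slashes as a single slash.
    // Returns null when `path` has fewer than `n` slashes (sketch-only choice).
    public static String stripComponents(String path, int n) {
        String rest = path;
        for (int i = 0; i < n; i++) {
            int slash = rest.indexOf('/');
            if (slash < 0) {
                return null;
            }
            // Skip the slash and any slashes directly after it.
            int next = slash + 1;
            while (next < rest.length() && rest.charAt(next) == '/') {
                next++;
            }
            rest = rest.substring(next);
        }
        return rest;
    }

    public static void main(String[] args) {
        System.out.println(stripComponents("nutch/src/java/Example.java", 0)); // prints nutch/src/java/Example.java
        System.out.println(stripComponents("nutch/src/java/Example.java", 1)); // prints src/java/Example.java
    }
}
```

So if the file names in a patch begin with a top-level directory such as nutch/ (an assumption about how the patch was generated), -p1 drops that directory. That is what you want when running patch from inside the nutch directory.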
I really hope we can continue from here. Sorry.
Stefan
On 18.05.2004 at 00:38, Doug Cutting wrote:
Stefan Groschupf wrote:
I tried to generate the patch from the command line again:
  cvs -z3 -d:ext:[EMAIL PROTECTED]:/cvsroot/nutch diff -u nutch >> patch.txt
The result was just garbage, since many of the new classes are missing from the patch file.
Try 'diff -Nu'.
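The -N in this suggestion is what brings new files into the patch. In CVS it includes diffs for added and removed files, and a file that is missing on one side is compared as if it were empty. The following is a minimal sketch of what that yields for a brand-new file, assuming simplified headers (no timestamps) and a hypothetical file name. The result is a single hunk in which every line is an addition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the unified diff of an empty (absent) file against
// a new file, i.e. roughly what `diff -Nu` produces for a newly added file.
public class NewFileDiffSketch {

    public static String newFileDiff(String name, List<String> lines) {
        if (lines.isEmpty()) {
            return ""; // An empty new file adds no lines, so this sketch emits nothing.
        }
        StringBuilder sb = new StringBuilder();
        // Simplified headers: real diff output also carries timestamps.
        sb.append("--- ").append(name).append('\n');
        sb.append("+++ ").append(name).append('\n');
        // Old range "0,0": zero lines in the empty file.
        // New range starts at line 1; the count is omitted when it is 1.
        sb.append("@@ -0,0 +1");
        if (lines.size() != 1) {
            sb.append(',').append(lines.size());
        }
        sb.append(" @@\n");
        // Every line of the new file is an addition.
        for (String line : lines) {
            sb.append('+').append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(newFileDiff("nutch/src/java/Example.java",
                Arrays.asList("package example;", "", "public class Example {}")));
    }
}
```

Without -N there is no such hunk for a new file, which is consistent with new classes being missing from the generated patch.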
Doug
------------------------------------------------------- This SF.Net email is sponsored by: SourceForge.net Broadband Sign-up now for SourceForge Broadband and get the fastest 6.0/768 connection for only $19.95/mo for the first 3 months! http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click _______________________________________________ Nutch-developers mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/nutch-developers
--------------------------------------------------------------- open technology: http://www.media-style.com open source: http://www.weta-group.net open discussion: http://www.text-mining.org
