Andrzej, Thanks -- this works!

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Monday, July 04, 2005 11:55 AM
To: nutch-dev@lucene.apache.org
Subject: Re: both html parser have bug with javascript

Chirag Chaman wrote:
> Andrzej,
>
> Thank you -- and here we were going nuts thinking the problem might
> have been with the plugin!
> Would it be possible to post the patch file of the changes once you
> have made them as our version of Nutch is different from SVN.

I suggest keeping around a vanilla version, and porting diffs to your
tree, otherwise you will end up with more and more out-of-sync version...

The change itself is trivial (available as
'svn diff -r 179640 DOMContentUtils.java'):

Index: DOMContentUtils.java
===================================================================
--- DOMContentUtils.java        (revision 179640)
+++ DOMContentUtils.java        (working copy)
@@ -102,25 +102,9 @@
                                      boolean abortOnNestedAnchors,
                                      int anchorDepth) {
     if ("script".equalsIgnoreCase(node.getNodeName())) {
-      Node n = node.getAttributes().getNamedItem("language");
-      if (n != null) {
-        String text = n.getNodeValue();
-        sb.append(text);
-      }
       return false;
     }
     if ("style".equalsIgnoreCase(node.getNodeName())) {
-      Node n = node.getAttributes().getNamedItem("rel");
-      if (n != null) {
-        String text = n.getNodeValue();
-        sb.append(text);
-      }
-      n = node.getAttributes().getNamedItem("type");
-      if (n != null) {
-        String text = n.getNodeValue();
-        if (sb.length() > 0) sb.append(", ");
-        sb.append(text);
-      }
       return false;
     }
     if (abortOnNestedAnchors && "a".equalsIgnoreCase(node.getNodeName())) {

> Thankx again.

You're welcome.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
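
For anyone porting the change by hand rather than applying the diff: the effect of the patch is that script and style elements contribute nothing to the extracted text, neither their attribute values nor their bodies. Below is a minimal, standalone sketch of that behaviour. It is not the Nutch code; the class name SkipScriptStyleSketch and the methods collectText and extract are made up for illustration, and it parses a well-formed XHTML string with the JDK's XML parser instead of going through either HTML parser plugin.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; names are not taken from Nutch.
class SkipScriptStyleSketch {

    /** Appends the text found under node to sb, skipping script and style elements. */
    static void collectText(Node node, StringBuilder sb) {
        String name = node.getNodeName();
        if ("script".equalsIgnoreCase(name) || "style".equalsIgnoreCase(name)) {
            return; // contribute nothing: no attribute values, no element body
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), sb);
        }
    }

    /** Parses a well-formed XHTML string and returns its visible text. */
    static String extract(String xhtml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)))
                .getDocumentElement();
        StringBuilder sb = new StringBuilder();
        collectText(root, sb);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><style type=\"text/css\">p { color: red; }</style></head>"
                + "<body><script language=\"JavaScript\">var x = 1;</script>"
                + "<p>Hello</p><p>world</p></body></html>";
        System.out.println(extract(page)); // prints "Hello world"
    }
}
```

With the pre-patch logic, the same page would also have leaked "JavaScript" and "text/css" into the extracted text, which is the symptom discussed in this thread.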