[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930556#comment-13930556 ] Lewis John McGibbney commented on NUTCH-1253: - Talat's patch committed @revision 1576414 in 2.x HEAD. Thanks guys for highlighting the work still to be done here. Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-eclipse.patch, NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930573#comment-13930573 ] Sebastian Nagel commented on NUTCH-1253: Also committed patch to trunk r1576422. Thanks! Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-eclipse.patch, NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930577#comment-13930577 ] Lewis John McGibbney commented on NUTCH-1253: - Thanks [~wastl-nagel]. I forgot to add to trunk :| Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-eclipse.patch, NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930638#comment-13930638 ] Hudson commented on NUTCH-1253: --- SUCCESS: Integrated in Nutch-nutchgora #948 (See [https://builds.apache.org/job/Nutch-nutchgora/948/]) NUTCH-1253 Incompatible neko and xerces versions (lewismc: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=revrev=1576414) * /nutch/branches/2.x/CHANGES.txt * /nutch/branches/2.x/build.xml Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-eclipse.patch, NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930648#comment-13930648 ] Hudson commented on NUTCH-1253: --- SUCCESS: Integrated in Nutch-trunk #2560 (See [https://builds.apache.org/job/Nutch-trunk/2560/]) NUTCH-1253 Incompatible neko and xerces versions (snagel: http://svn.apache.org/viewvc/nutch/trunk/?view=revrev=1576422) * /nutch/trunk/build.xml Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-eclipse.patch, NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915563#comment-13915563 ] Yasin Kılınç commented on NUTCH-1253: - I checked and tested patch file into 2.x branch. I used ant eclipse target, then I opened via eclipse IDE. The project compile but eclipse shows warning because of, version of nekohtml is old. I want to attach patch file for this problem. Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915701#comment-13915701 ] Lewis John McGibbney commented on NUTCH-1253: - The version of nekohtml we are using is dependency org=net.sourceforge.nekohtml name=nekohtml rev=1.9.19 conf=*-master/ AFAIK this is most recent. Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915730#comment-13915730 ] Yasin Kılınç commented on NUTCH-1253: - Ok. But there is a line in target eclipse NUTCH_HOME/build.xml like this {code} library path=${basedir}/build/plugins/lib-nekohtml/nekohtml-0.9.5.jar exported=false / {code} Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885396#comment-13885396 ] Hudson commented on NUTCH-1253: --- SUCCESS: Integrated in Nutch-trunk #2511 (See [https://builds.apache.org/job/Nutch-trunk/2511/]) NUTCH-1253 Incompatable versions of neko and xerces (lewismc: http://svn.apache.org/viewvc/nutch/trunk/?view=revrev=1562448) * /nutch/trunk/CHANGES.txt * /nutch/trunk/src/plugin/lib-nekohtml/ivy.xml * /nutch/trunk/src/plugin/lib-nekohtml/plugin.xml * /nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java * /nutch/trunk/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java * /nutch/trunk/src/plugin/parse-tika/src/test/org/apache/nutch/tika/TestDOMContentUtils.java Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, nutch1253parsed.html, nutch1253test.html The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879847#comment-13879847 ] Sebastian Nagel commented on NUTCH-1253: +1 tested with a collection of problematic documents: no regressions But why not upgrade to 1.9.19 right now? Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879853#comment-13879853 ] Lewis John McGibbney commented on NUTCH-1253: - I'll post the patches today [~wastl-nagel]. Thanks Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879124#comment-13879124 ] Lewis John McGibbney commented on NUTCH-1253: - Any objections to commit? Further evidence that this patch is working for users who've reported this issue. http://s.apache.org/Sb3 Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852886#comment-13852886 ] Lewis John McGibbney commented on NUTCH-1253: - Some user input that the patch fro 2.x seems to resolve the issue described above. http://www.mail-archive.com/user%40nutch.apache.org/msg11318.html Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Assignee: Lewis John McGibbney Fix For: 2.3, 1.8 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572838#comment-13572838 ] Lewis John McGibbney commented on NUTCH-1253: - It seems that progress (towards a solution) has been made [0] for this issue. I am going to add Dennis' suggestions to the patch and debug this locally. I'll write back here in due course. [0] http://www.mail-archive.com/user@nutch.apache.org/msg08702.html Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Fix For: 1.7, 2.2 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504531#comment-13504531 ] Tomasz Struczynski commented on NUTCH-1253: --- I don't have time for much analysis, but, the cause is probably this feature: {noformat} parser.setFeature(http://cyberneko.org/html/features/report-errors;, LOG.isTraceEnabled()); {noformat} as this is the only place which uses trace setting and is not surrounded by try... catch. Anyway, the problem is still unresolved (using gora branch). Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248223#comment-13248223 ] Ferdy Galema commented on NUTCH-1253: - Wow this issue keeps getting more and more interesting. I just found out that the exception is CAUSED BY enabling trace logging. That is why it is so confusing. My previous statement about it not affecting nutchgora is not true it seems. It indeed affects both trunk and nutchgora. See the following instructions for reproducing the problem: ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker http://www.iana.org/; ... Version: 5 Status: success(1,0) ... Now what happens when I add the following line to log4j.properties. (Note that the comment by Dennis has a type in this line). log4j.logger.org.apache.nutch.parse.html=TRACE,cmdstdout ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker http://www.iana.org/; ... Version: 5 Status: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content ... So this is very obscure. It might be a trace logging statement that triggers the exception. It cannot be something else. Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223504#comment-13223504 ] Lewis John McGibbney commented on NUTCH-1253: - Hi Ferdy, the patches I attached were identical for branch Nutchgora and trunk. I would have assumed if trunk was incorrect then Nutchgora would have shadowed this behaviour. Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222404#comment-13222404 ] Ferdy Galema commented on NUTCH-1253: - It indeed seems broken for trunk. When running it with default options in local mode, every parse simply fails. This is pretty suprising. With the help of Dennis' instructions it indeed becomes more clear what the error is about. Note that nutchgora is not affected. Though at first sight they seem to be using the same library versions. I'm amazed that this error has not been noticed earlier. I cannot speak for users/devs that are on 1.x, so I kindly ask if one of them is able to pick this issue up. (Or least provide some insight). My guess is that they either use tagsoup (instead of neko) or parse-tika for html parsing. Then again if that's the case I don't know why the defaults are now the way they are. Because of this I have not yet tested any of your patches, sorry Lewis. Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220984#comment-13220984 ] Ferdy Galema commented on NUTCH-1253: - I'll give this one a go.. Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216751#comment-13216751 ] Lewis John McGibbney commented on NUTCH-1253: - Anyone had time to try this one out? Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193042#comment-13193042 ] Ferdy Galema commented on NUTCH-1253: - Hi, Looking at the revision history it seems that 3 years ago the library actually WAS updated to 1.9.11, whereafter a few months later is was reverted to 0.9.4 and later on to 0.9.5 but the plugin version remained at 1.9.11. The fact that they bothered to change this version number in the first place is pretty curious in itself, because most plugins simply remain at version 1.0 despite several changes. Not that it matters, but just to indicate that this number has no real purpose. As to nekohtml jar, am not sure why it's still at this specific version, or why it is the preferred setting. Digging up the issues or mailing lists might give you some more info about this. It might be worth looking into tagsoup. I do find your AbstractMethodError curious though. Are you sure it's because of nekohtml and xerces? Can you provide a stracktrace? Incompatible neko and xerces versions - Key: NUTCH-1253 URL: https://issues.apache.org/jira/browse/NUTCH-1253 Project: Nutch Issue Type: Bug Affects Versions: 1.4 Environment: Ubuntu 10.04 Reporter: Dennis Spathis The Nutch 1.4 distribution includes - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib- nekohtml) - xercesImpl-2.9.1.jar (under .../runtime/local/lib) These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.) I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem. Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following: plugin id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.11 provider-name=org.cyberneko runtime library name=nekohtml-0.9.5.jar export name=*/ /library /runtime /plugin Note the conflicting version numbers (version tag is 1.9.11 but the specified library is nekohtml-0.9.5.jar). Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira