[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-03-11 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930556#comment-13930556 ]

Lewis John McGibbney commented on NUTCH-1253:
-

Talat's patch committed @revision 1576414 in 2.x HEAD.
Thanks guys for highlighting the work still to be done here.

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
Assignee: Lewis John McGibbney
 Fix For: 2.3, 1.8

 Attachments: NUTCH-1253-2.x-eclipse.patch, NUTCH-1253-2.x-v2.patch, 
 NUTCH-1253-nutchgora.patch, NUTCH-1253-trunk.patch, 
 NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, 
 TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, 
 TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, 
 nutch1253parsed.html, nutch1253test.html


 The Nutch 1.4 distribution includes
  - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
  - xercesImpl-2.9.1.jar (under .../runtime/local/lib)
 These two JARs appear to be incompatible versions. When the HtmlParser 
 (configured to use neko) is invoked during a local-mode crawl, the parse 
 fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, 
 rebuild the HtmlParser plugin and add a
 catch(Throwable) clause in the getParse method to log the stacktrace.)
 I found that substituting a later, compatible version of nekohtml (1.9.11)
 fixes the problem.
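A minimal sketch of the catch(Throwable) diagnostic described above, in plain Java with no Nutch or neko dependency. ParseDiagnostics and parseOrLog are hypothetical names, and the parse step is a stand-in Callable rather than the real HtmlParser.getParse. What it illustrates: AbstractMethodError is an Error, not an Exception, so a catch(Exception) handler lets it propagate past; a temporary catch(Throwable) makes the stack trace visible.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.Callable;

/** Hypothetical diagnostic helper; not part of Nutch. */
public final class ParseDiagnostics {

    /**
     * Runs a parse step. On any Throwable (including Errors such as
     * AbstractMethodError, which catch(Exception) would miss), appends the
     * full stack trace to the log buffer and returns null.
     */
    static <T> T parseOrLog(Callable<T> parseStep, StringBuilder log) {
        try {
            return parseStep.call();
        } catch (Throwable t) { // deliberately broad, for debugging only
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            log.append(trace.toString());
            return null;
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // Simulate the reported failure mode instead of calling a real parser.
        String result = parseOrLog(() -> {
            throw new AbstractMethodError("simulated neko/xerces mismatch");
        }, log);
        System.out.println(result == null);
        System.out.println(log);
    }
}
```

Catching Throwable is only reasonable as a short-lived debugging aid; in normal code it would also swallow errors such as OutOfMemoryError.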
 Curiously, and in support of the above, the nekohtml plugin.xml file in
 Nutch 1.4 contains the following:
 <plugin
    id="lib-nekohtml"
    name="CyberNeko HTML Parser"
    version="1.9.11"
    provider-name="org.cyberneko">
    <runtime>
       <library name="nekohtml-0.9.5.jar">
          <export name="*"/>
       </library>
    </runtime>
 </plugin>
 Note the conflicting version numbers (version tag is 1.9.11 but the
 specified library is nekohtml-0.9.5.jar).
 Was the 0.9.5 version included by mistake? Was the intention rather to
 include 1.9.11?
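The version mismatch in that descriptor can be detected mechanically. A rough sketch, assuming jar names follow the artifact-version.jar convention: it reads a plugin.xml with the JDK's DOM parser and reports every library whose embedded version differs from the plugin's version attribute. PluginVersionCheck and mismatchedLibraries are hypothetical names; nothing in this thread indicates Nutch ships such a check.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical consistency check for a Nutch-style plugin.xml; not part of Nutch. */
public final class PluginVersionCheck {

    // Captures the version embedded in a jar name, e.g. "0.9.5" from "nekohtml-0.9.5.jar".
    private static final Pattern JAR_VERSION = Pattern.compile("-(\\d+(?:\\.\\d+)*)\\.jar$");

    /** Returns the library jar names whose embedded version differs from the plugin's version attribute. */
    static List<String> mismatchedLibraries(String pluginXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(pluginXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse plugin.xml", e);
        }
        String declared = doc.getDocumentElement().getAttribute("version");
        List<String> mismatches = new ArrayList<>();
        NodeList libraries = doc.getElementsByTagName("library");
        for (int i = 0; i < libraries.getLength(); i++) {
            String jar = ((Element) libraries.item(i)).getAttribute("name");
            Matcher m = JAR_VERSION.matcher(jar);
            // Only jars with a recognisable "-<digits.dots>.jar" suffix are compared.
            if (m.find() && !m.group(1).equals(declared)) {
                mismatches.add(jar);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // The Nutch 1.4 lib-nekohtml descriptor as quoted in the report.
        String nutch14 = "<plugin id=\"lib-nekohtml\" name=\"CyberNeko HTML Parser\""
            + " version=\"1.9.11\" provider-name=\"org.cyberneko\">"
            + "<runtime><library name=\"nekohtml-0.9.5.jar\"><export name=\"*\"/></library></runtime>"
            + "</plugin>";
        System.out.println(mismatchedLibraries(nutch14)); // prints [nekohtml-0.9.5.jar]
    }
}
```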



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-03-11 Sebastian Nagel (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930573#comment-13930573 ]

Sebastian Nagel commented on NUTCH-1253:


Also committed patch to trunk r1576422. Thanks!


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-03-11 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930577#comment-13930577 ]

Lewis John McGibbney commented on NUTCH-1253:
-

Thanks [~wastl-nagel]. I forgot to add to trunk :|


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-03-11 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930638#comment-13930638 ]

Hudson commented on NUTCH-1253:
---

SUCCESS: Integrated in Nutch-nutchgora #948 (See 
[https://builds.apache.org/job/Nutch-nutchgora/948/])
NUTCH-1253 Incompatible neko and xerces versions (lewismc: 
http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1576414)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/build.xml



[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-03-11 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930648#comment-13930648 ]

Hudson commented on NUTCH-1253:
---

SUCCESS: Integrated in Nutch-trunk #2560 (See 
[https://builds.apache.org/job/Nutch-trunk/2560/])
NUTCH-1253 Incompatible neko and xerces versions (snagel: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1576422)
* /nutch/trunk/build.xml



[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-02-28 Yasin Kılınç (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915563#comment-13915563 ]

Yasin Kılınç commented on NUTCH-1253:
-

I applied and tested the patch on the 2.x branch. I ran the ant eclipse target
and then opened the project in the Eclipse IDE. The project compiles, but
Eclipse shows a warning because the referenced version of nekohtml is old. I'd
like to attach a patch file for this problem.

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
Assignee: Lewis John McGibbney
 Fix For: 2.3, 1.8

 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, 
 NUTCH-1253-trunk.patch, NUTCH-1253-trunk.v2.patch, NUTCH-1253.patch, 
 TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, 
 TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt, 
 nutch1253parsed.html, nutch1253test.html



[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-02-28 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915701#comment-13915701 ]

Lewis John McGibbney commented on NUTCH-1253:
-

The version of nekohtml we are using is

<dependency org="net.sourceforge.nekohtml" name="nekohtml" rev="1.9.19" conf="*->master"/>

AFAIK this is the most recent release.
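A sketch of a guard for this Ivy dependency, assuming a hypothetical minimum of 1.9.11 (the version the original reporter found to work): it extracts the rev attribute with a regular expression and compares dotted numeric versions component by component. IvyRevCheck, compareVersions and meetsMinimum are illustrative names only; a real build check would parse ivy.xml with an XML parser and would need to handle non-numeric qualifiers, which this sketch does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical build-time guard; not part of Nutch or Ivy. */
public final class IvyRevCheck {

    // Pulls rev="..." out of a single <dependency .../> element.
    private static final Pattern REV = Pattern.compile("\\brev=\"([^\"]+)\"");

    /** Compares dotted numeric versions: negative if a < b, 0 if equal, positive if a > b. */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            // Missing components count as 0, so "1.9" equals "1.9.0".
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True if the dependency's rev is at least the given minimum version. */
    static boolean meetsMinimum(String dependencyElement, String minimum) {
        Matcher m = REV.matcher(dependencyElement);
        if (!m.find()) {
            throw new IllegalArgumentException("no rev attribute: " + dependencyElement);
        }
        return compareVersions(m.group(1), minimum) >= 0;
    }

    public static void main(String[] args) {
        String dep = "<dependency org=\"net.sourceforge.nekohtml\" name=\"nekohtml\""
            + " rev=\"1.9.19\" conf=\"*->master\"/>";
        System.out.println(meetsMinimum(dep, "1.9.11"));          // true
        System.out.println(compareVersions("0.9.5", "1.9.11") < 0); // true
    }
}
```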


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-02-28 Yasin Kılınç (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915730#comment-13915730 ]

Yasin Kılınç commented on NUTCH-1253:
-

OK. But the eclipse target in NUTCH_HOME/build.xml contains this line:
{code}
<library path="${basedir}/build/plugins/lib-nekohtml/nekohtml-0.9.5.jar" exported="false" />
{code}
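Because the eclipse target hard-codes the jar file name, it can drift from the rev declared in ivy.xml. A hypothetical scan (StaleJarScan.staleVersions is an illustrative name, not part of the Nutch build) that finds every hard-coded artifact-version.jar reference in build.xml text and reports versions other than the expected rev:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for spotting stale hard-coded jar versions; not part of Nutch. */
public final class StaleJarScan {

    /** Returns every version of the artifact hard-coded as "<artifact>-<version>.jar" that is not expectedRev. */
    static List<String> staleVersions(String buildXml, String artifact, String expectedRev) {
        Pattern p = Pattern.compile(Pattern.quote(artifact) + "-(\\d+(?:\\.\\d+)*)\\.jar");
        Matcher m = p.matcher(buildXml);
        List<String> stale = new ArrayList<>();
        while (m.find()) {
            // "lib-nekohtml/" does not match: it is followed by "/", not "-<digits>".
            if (!m.group(1).equals(expectedRev)) {
                stale.add(m.group(1));
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        String line = "<library path=\"${basedir}/build/plugins/lib-nekohtml/nekohtml-0.9.5.jar\""
            + " exported=\"false\" />";
        System.out.println(staleVersions(line, "nekohtml", "1.9.19")); // prints [0.9.5]
    }
}
```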


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-01-29 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885396#comment-13885396 ]

Hudson commented on NUTCH-1253:
---

SUCCESS: Integrated in Nutch-trunk #2511 (See 
[https://builds.apache.org/job/Nutch-trunk/2511/])
NUTCH-1253 Incompatable versions of neko and xerces (lewismc: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1562448)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/lib-nekohtml/ivy.xml
* /nutch/trunk/src/plugin/lib-nekohtml/plugin.xml
* /nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
* /nutch/trunk/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
* /nutch/trunk/src/plugin/parse-tika/src/test/org/apache/nutch/tika/TestDOMContentUtils.java



[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-01-23 Sebastian Nagel (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879847#comment-13879847 ]

Sebastian Nagel commented on NUTCH-1253:


+1, tested with a collection of problematic documents: no regressions.
But why not upgrade to 1.9.19 right now?

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
Assignee: Lewis John McGibbney
 Fix For: 2.3, 1.8

 Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, 
 NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt



[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-01-23 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879853#comment-13879853 ]

Lewis John McGibbney commented on NUTCH-1253:
-

I'll post the patches today [~wastl-nagel]. Thanks


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-01-22 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879124#comment-13879124 ]

Lewis John McGibbney commented on NUTCH-1253:
-

Any objections to committing?
Further evidence that this patch is working for users who've reported this
issue:
http://s.apache.org/Sb3


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2013-12-19 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852886#comment-13852886 ]

Lewis John McGibbney commented on NUTCH-1253:
-

Some user input suggests that the patch for 2.x resolves the issue described
above:
http://www.mail-archive.com/user%40nutch.apache.org/msg11318.html


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2013-02-06 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572838#comment-13572838 ]

Lewis John McGibbney commented on NUTCH-1253:
-

It seems that progress (towards a solution) has been made [0] for this issue. I 
am going to add Dennis' suggestions to the patch and debug this locally. I'll 
write back here in due course.

[0] http://www.mail-archive.com/user@nutch.apache.org/msg08702.html

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
 Fix For: 1.7, 2.2

 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch



[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-11-27 Tomasz Struczynski (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504531#comment-13504531 ]

Tomasz Struczynski commented on NUTCH-1253:
---

I don't have time for much analysis, but the cause is probably this feature:

{noformat}
parser.setFeature("http://cyberneko.org/html/features/report-errors",
  LOG.isTraceEnabled());
{noformat}

as this is the only place that uses the trace setting and is not surrounded by
try...catch.

Anyway, the problem is still unresolved (using the gora branch).
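The pattern described here, an optional parser feature set without a guard, can be sketched with the JDK's built-in SAX XMLReader in place of the neko DOMParser (a substitution, since neko is not on a plain JDK classpath). OptionalFeature and trySetFeature are hypothetical names, and traceEnabled is a stand-in for LOG.isTraceEnabled(). Note the limit of this guard: it only handles the SAXNotRecognizedException and SAXNotSupportedException that setFeature itself may throw; an AbstractMethodError raised later during parsing is an Error and would pass straight through.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper; illustrates guarding an optional feature, not Nutch code. */
public final class OptionalFeature {

    /** Sets an optional parser feature; returns false instead of failing if the parser rejects it. */
    static boolean trySetFeature(XMLReader reader, String feature, boolean value) {
        try {
            reader.setFeature(feature, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Stand-in for LOG.warn(...): continue without the optional feature.
            System.err.println("Feature not available: " + feature + " (" + e.getMessage() + ")");
            return false;
        }
    }

    /** Creates the JDK's default SAX reader. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        boolean traceEnabled = true; // stand-in for LOG.isTraceEnabled()
        // The JDK's SAX parser does not recognise this neko-specific feature,
        // so the guard logs a warning and reports false instead of throwing.
        boolean applied = trySetFeature(
            reader, "http://cyberneko.org/html/features/report-errors", traceEnabled);
        System.out.println("report-errors applied: " + applied);
    }
}
```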

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch



[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-04-06 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248223#comment-13248223
 ] 

Ferdy Galema commented on NUTCH-1253:
-

Wow, this issue keeps getting more and more interesting. I just found out that
the exception is CAUSED BY enabling trace logging, which is why it is so
confusing. My previous statement that nutchgora is not affected seems to be
wrong: the problem affects both trunk and nutchgora. The following
instructions reproduce it:


ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker "http://www.iana.org/"
...
Version: 5
Status: success(1,0)
...


Now see what happens when I add the following line to log4j.properties (note
that the comment by Dennis has a typo in this line):
log4j.logger.org.apache.nutch.parse.html=TRACE,cmdstdout

ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker "http://www.iana.org/"
...
Version: 5
Status: failed(2,200): org.apache.nutch.parse.ParseException: Unable to 
successfully parse content
...

So this is very obscure. It seems that a trace logging statement triggers the
exception; I cannot see what else it could be.

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-03-06 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223504#comment-13223504
 ] 

Lewis John McGibbney commented on NUTCH-1253:
-

Hi Ferdy, the patches I attached were identical for the Nutchgora branch and
trunk. I would have assumed that if trunk was broken, Nutchgora would show the
same behaviour.

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-03-05 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222404#comment-13222404
 ] 

Ferdy Galema commented on NUTCH-1253:
-

It indeed seems broken for trunk. When running it with default options in local
mode, every parse simply fails. This is pretty surprising. With the help of
Dennis' instructions it becomes clearer what the error is about. Note that
nutchgora is not affected, though at first sight both seem to use the same
library versions.

I'm amazed that this error has not been noticed earlier. I cannot speak for
users/devs that are on 1.x, so I kindly ask if one of them is able to pick this
issue up (or at least provide some insight). My guess is that they either use
tagsoup (instead of neko) or parse-tika for HTML parsing. Then again, if that's
the case, I don't know why the defaults are the way they are. Because of this I
have not yet tested any of your patches, sorry Lewis.
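Whether a default local-mode crawl reaches the neko code path at all depends on which HTML parser implementation is configured. The sketch below models that selection. It uses `java.util.Properties` in place of Nutch's Hadoop `Configuration`. Two details come from our understanding of parse-html and are not confirmed in this thread, so treat them as assumptions: the property name `parser.html.impl`, and the rule that any value other than "tagsoup" falls back to neko. `HtmlImplChoice` is a hypothetical name. If the default really is neko, that would explain why every parse fails with default options.

```java
import java.util.Properties;

public class HtmlImplChoice {

    public enum HtmlImpl { NEKO, TAGSOUP }

    // Assumption: property name and "neko" default as we believe nutch-default.xml ships them.
    static final String IMPL_KEY = "parser.html.impl";

    /** Anything other than "tagsoup" (case-insensitive) falls back to neko. */
    public static HtmlImpl choose(Properties conf) {
        String value = conf.getProperty(IMPL_KEY, "neko").trim();
        return "tagsoup".equalsIgnoreCase(value) ? HtmlImpl.TAGSOUP : HtmlImpl.NEKO;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();      // nothing overridden
        Properties siteOverride = new Properties();  // e.g. a site-level override
        siteOverride.setProperty(IMPL_KEY, "tagsoup");
        System.out.println(choose(defaults));     // NEKO
        System.out.println(choose(siteOverride)); // TAGSOUP
    }
}
```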

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-03-02 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220984#comment-13220984
 ] 

Ferdy Galema commented on NUTCH-1253:
-

I'll give this one a go..

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-02-26 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216751#comment-13216751
 ] 

Lewis John McGibbney commented on NUTCH-1253:
-

Anyone had time to try this one out?

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
 Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch


[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2012-01-25 Thread Ferdy Galema (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193042#comment-13193042
 ] 

Ferdy Galema commented on NUTCH-1253:
-

Hi,

Looking at the revision history, it seems that 3 years ago the library actually
WAS updated to 1.9.11, after which, a few months later, it was reverted to 0.9.4
and later on to 0.9.5, but the plugin version remained at 1.9.11. The fact that
they bothered to change this version number in the first place is pretty
curious in itself, because most plugins simply remain at version 1.0 despite
several changes. Not that it matters; it just indicates that this number has
no real purpose. As for the nekohtml jar, I am not sure why it's still at this
specific version, or why neko is the preferred setting. Digging up the issues or
mailing lists might give you some more info about this. It might be worth
looking into tagsoup.

I do find your AbstractMethodError curious, though. Are you sure it's caused by
the nekohtml and xerces combination? Can you provide a stacktrace?
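To obtain the requested stacktrace, the reporter suggested adding a `catch(Throwable)` clause in `getParse`. The sketch below shows that diagnostic in isolation. It uses a simulated failure in place of the real HtmlParser, and `runAndDescribe` is a hypothetical helper. Catching `Throwable` instead of `Exception` is what lets an `Error` such as AbstractMethodError be captured and logged. It is meant for diagnosis, not as a permanent handler.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceCapture {

    /** Renders a Throwable's full stack trace, including any causes, as a String. */
    public static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            t.printStackTrace(out);
        }
        return buffer.toString();
    }

    /**
     * Runs a (hypothetical) parse step; returns "ok" on success, otherwise the
     * full stack trace. Catching Throwable means Errors are captured too.
     */
    public static String runAndDescribe(Runnable parseStep) {
        try {
            parseStep.run();
            return "ok";
        } catch (Throwable t) {
            return stackTraceOf(t);
        }
    }

    public static void main(String[] args) {
        String report = runAndDescribe(() -> { throw new AbstractMethodError("simulated"); });
        // Prints the first line: java.lang.AbstractMethodError: simulated
        System.out.println(report.split("\\R")[0]);
    }
}
```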

 Incompatible neko and xerces versions
 -

 Key: NUTCH-1253
 URL: https://issues.apache.org/jira/browse/NUTCH-1253
 Project: Nutch
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Ubuntu 10.04
Reporter: Dennis Spathis
