Build failed in Jenkins: Nutch-trunk #1590

2011-08-30 Thread Apache Jenkins Server
See -- [...truncated 986 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java A src/plugin/subcollection/src/java/org/apache/nutch/collection/pack

Re: InvocationTargetException with Nutch 2.0 Gora 0.2 and Cassandra 0.8.4

2011-08-30 Thread Alexis
Hi Tom, I'm having the same issue. The two jars missing from the nutch-2.0-dev.job, cassandra-all-0.8.0.jar and hector-core-0.8.0-1.jar, have been manually uploaded into the gora-cassandra/lib-ext SVN directory for the Gora build to work, because for some reason I did not get them downloaded through Mav

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093807#comment-13093807 ] Julien Nioche commented on NUTCH-937: - @Ferdy - good detective work! I like your sugges

[jira] [Commented] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html

2011-08-30 Thread Ferdy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093789#comment-13093789 ] Ferdy commented on NUTCH-1097: -- It seems the current solution is still not complete, because

[jira] [Commented] (NUTCH-1096) Empty (not null) ContentLength results in failure of fetch

2011-08-30 Thread Ferdy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093762#comment-13093762 ] Ferdy commented on NUTCH-1096: -- I do not have an example anymore (the corresponding url seems

[jira] [Updated] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-30 Thread Ferdy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy updated NUTCH-937: Attachment: NUTCH-937-v1.patch > When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins > because M

[jira] [Commented] (NUTCH-1052) Multiple deletes of the same URL using SolrClean

2011-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093746#comment-13093746 ] Markus Jelsma commented on NUTCH-1052: -- Updating the CrawlDB is a tedious process and

[jira] [Updated] (NUTCH-1073) Rename parameters 'fetcher.threads.per.host.by.ip' and 'fetcher.threads.per.host'

2011-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1073: - Patch Info: [Patch Available] > Rename parameters 'fetcher.threads.per.host.by.ip' and > 'fetche

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop > 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-30 Thread Ferdy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093705#comment-13093705 ] Ferdy commented on NUTCH-937: - I finally found out what the problem is with the above suggestio

[jira] [Created] (NUTCH-1100) SolrDedup broken

2011-08-30 Thread Markus Jelsma (JIRA)
SolrDedup broken Key: NUTCH-1100 URL: https://issues.apache.org/jira/browse/NUTCH-1100 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.4 Reporter: Markus Jelsma F

[jira] [Closed] (NUTCH-981) Add tests for solr* tasks

2011-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-981. --- Resolution: Duplicate Fix Version/s: (was: 2.0) > Add tests for solr* tasks > -

[jira] [Commented] (NUTCH-1095) remove i18n from Nutch site to archive and legacy section of wiki

2011-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093655#comment-13093655 ] Markus Jelsma commented on NUTCH-1095: -- thanks > remove i18n from Nutch site to arch

[jira] [Commented] (NUTCH-1096) Empty (not null) ContentLength results in failure of fetch

2011-08-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093654#comment-13093654 ] Markus Jelsma commented on NUTCH-1096: -- by the way: can you also provide a URL wit

Re: InvocationTargetException with Nutch 2.0 Gora 0.2 and Cassandra 0.8.4

2011-08-30 Thread lewis john mcgibbney
Hi Tom, Well, this is strange... There are no versions of hector in Nutch 2.0/runtime/deploy/nutch-2.0-dev.job or /local/lib; however, Gora 0.2 uses it as a dependency, as per /gora-cassandra/lib/hector-core0.8.0-1.jar. I'm going to take some time later and try various debug combinations within Eclipse to get to