[jira] [Created] (NUTCH-1207) ParserChecker to output signature

2011-11-21 Thread Markus Jelsma (Created) (JIRA)
ParserChecker to output signature - Key: NUTCH-1207 URL: https://issues.apache.org/jira/browse/NUTCH-1207 Project: Nutch Issue Type: Improvement Components: parser Reporter: Markus

[jira] [Resolved] (NUTCH-1207) ParserChecker to output signature

2011-11-21 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1207. Resolution: Fixed. Committed for 1.5 in rev. 1204492. ParserChecker to output

[jira] [Commented] (NUTCH-1206) tika parser of nutch 1.3 is failing to process pdfs

2011-11-21 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154180#comment-13154180 ] Markus Jelsma commented on NUTCH-1206: Have you tried the Nutch trunk or the most

[jira] [Updated] (NUTCH-1104) Port issues from trunk NutchGora branch

2011-11-21 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1104: Description: Umbrella issue for tracking issues that should be ported from 1.x trunk to the

[jira] [Commented] (NUTCH-1207) ParserChecker to output signature

2011-11-21 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154190#comment-13154190 ] Hudson commented on NUTCH-1207: Integrated in nutch-trunk-maven #33 (See

[Nutch Wiki] Trivial Update of NutchMavenSupport by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The NutchMavenSupport page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/NutchMavenSupport?action=diff&rev1=1&rev2=2 Starting with Nutch 1.3 and with Nutch 2.0, you

[Nutch Wiki] Trivial Update of InternalDocumentation by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The InternalDocumentation page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/InternalDocumentation?action=diff&rev1=8&rev2=9 * [[WebDB]] *

[Nutch Wiki] Trivial Update of CrawlDatumStates by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CrawlDatumStates page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/CrawlDatumStates?action=diff&rev1=2&rev2=3 Nutch 1.x maintains state of pages in CrawlDb,

[Nutch Wiki] Trivial Update of CrawlDatumStates by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CrawlDatumStates page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/CrawlDatumStates?action=diff&rev1=3&rev2=4 Nutch 1.x maintains state of pages in CrawlDb,

Enabling Nutch wiki override of ACLs for Attachments

2011-11-21 Thread Lewis John Mcgibbney
Hi Guys, There has been some discussion recently about broken links to attachments on the Nutch wiki. The reason for this can be seen here [1]. I am not aware of the Nutch wiki suffering from spam attacks, but that is not to say it could not happen. Therefore is it worth re-enabling

[Nutch Wiki] Trivial Update of CrawlDatumStates by MarkusJelsma

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CrawlDatumStates page has been changed by MarkusJelsma: http://wiki.apache.org/nutch/CrawlDatumStates?action=diff&rev1=4&rev2=5 Comment: added scoreupdater *Injector - to populate

Re: Enabling Nutch wiki override of ACLs for Attachments

2011-11-21 Thread Markus Jelsma
Spam happens once in a while. Can uploading of attachments be restricted to committers? On Monday 21 November 2011 16:40:11 Lewis John Mcgibbney wrote: Hi Guys, There has been some discussion recently about broken links to attachments on the Nutch wiki. The reason for this can be seen here

Re: Enabling Nutch wiki override of ACLs for Attachments

2011-11-21 Thread Lewis John Mcgibbney
I don't think this is possible. The setting can be configured either so that anyone can edit but not upload attachments, or so that ONLY an AdminGroup or ContributersGroup can add material. This requires someone to maintain the respective configuration files in our wiki instance... which is not a huge

[Nutch Wiki] Trivial Update of InternalDocumentation by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The InternalDocumentation page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/InternalDocumentation?action=diff&rev1=9&rev2=10 * NutchDistributedFileSystem *

[Nutch Wiki] Trivial Update of RedirectHandling by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The RedirectHandling page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/RedirectHandling New page: = Redirect handling in Nutch = This page is in construction but

[Nutch Wiki] Trivial Update of NutchHadoopTutorial by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The NutchHadoopTutorial page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/NutchHadoopTutorial?action=diff&rev1=35&rev2=36 This document does not go into the

[Nutch Wiki] Trivial Update of NutchResources by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The NutchResources page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/NutchResources?action=diff&rev1=1&rev2=2 *

[Nutch Wiki] Trivial Update of RedirectHandling by LewisJohnMcgibbney

2011-11-21 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The RedirectHandling page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/RedirectHandling?action=diff&rev1=1&rev2=2 = Redirect handling in Nutch = This page is in

Re: Signature == null ?

2011-11-21 Thread Markus Jelsma
I can't dump the DB right now since it's far too large for a single node, but from log output I can see that these records without signature were not parsable with Tika, such as RSS feeds, bad PDFs, or timed-out parses. On 15/11/2011 20:33, Markus Jelsma wrote: It's back again! Last try if

Dependency Injection

2011-11-21 Thread PJ Herring
Hey, So I am admittedly a noob with Nutch, but have spent some time digging through the source code. I am just curious if anyone has talked about, in future developments of Nutch, replacing the whole way we register plugins? I ask because I am using Nutch on a project with Maven. At the moment

Re: Dependency Injection

2011-11-21 Thread Mattmann, Chris A (388J)
Hey PJ, You aren't being an ass at all. You're asking an important question, and something I've been interested in for a while. Here are some relevant threads to take a look at: http://wiki.apache.org/nutch/Nutch2Architecture

Jenkins build is back to normal : Nutch-trunk #1671

2011-11-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1671/