[jira] [Created] (NUTCH-2923) Add Job Id in Job Failure messages

2021-12-29 Prakhar Chaube (Jira)
Prakhar Chaube created NUTCH-2923:
Summary: Add Job Id in Job Failure messages
Key: NUTCH-2923
URL: https://issues.apache.org/jira/browse/NUTCH-2923
Project: Nutch
Issue Type: Improvement

[jira] [Commented] (NUTCH-2856) protocol-smb plugin is outdated

2021-12-29 Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466704#comment-17466704 ] Lewis John McGibbney commented on NUTCH-2856: I'll take this one on. I intend to use

[jira] [Assigned] (NUTCH-2856) protocol-smb plugin is outdated

2021-12-29 Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2856: Assignee: Lewis John McGibbney

[jira] [Commented] (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implm

2021-12-29 Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466703#comment-17466703 ] Lewis John McGibbney commented on NUTCH-427: An old thread but I found an alternative SMB

[jira] [Commented] (NUTCH-2919) Upgrade to Tika 2.2.0

2021-12-29 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466696#comment-17466696 ] ASF GitHub Bot commented on NUTCH-2919: lewismc commented on pull request #717: URL:

[GitHub] [nutch] lewismc commented on pull request #717: NUTCH-2919 Upgrade to Tika 2.2.0

2021-12-29 GitBox
lewismc commented on pull request #717: URL: https://github.com/apache/nutch/pull/717#issuecomment-1002887610 To achieve a successful CI build I also had to disable the `ant javadoc` target. The problem here is definitely Any23. This can be fixed in the next Any23 release by setting the

[jira] [Commented] (NUTCH-2919) Upgrade to Tika 2.2.0

2021-12-29 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466692#comment-17466692 ] ASF GitHub Bot commented on NUTCH-2919: lewismc commented on pull request #717: URL:

[GitHub] [nutch] lewismc commented on pull request #717: NUTCH-2919 Upgrade to Tika 2.2.0

2021-12-29 GitBox
lewismc commented on pull request #717: URL: https://github.com/apache/nutch/pull/717#issuecomment-1002883562 CI was failing on the tests for the `any23` plugin. This makes some sense, as the Tika version included in Any23 may conflict with Tika 2.2.0. I've temporarily

[jira] [Commented] (NUTCH-2919) Upgrade to Tika 2.2.0

2021-12-29 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466691#comment-17466691 ] ASF GitHub Bot commented on NUTCH-2919: lewismc commented on pull request #717: URL:

[GitHub] [nutch] lewismc commented on pull request #717: NUTCH-2919 Upgrade to Tika 2.2.0

2021-12-29 GitBox
lewismc commented on pull request #717: URL: https://github.com/apache/nutch/pull/717#issuecomment-1002883095 I was getting a local failure on [parse-tika's

Nutch metrics documentation request for review/feedback

2021-12-29 lewis john mcgibbney
Hi dev@, *What?* I've been chipping away at some documentation that would provide a one-stop shop for understanding Nutch metrics. My first pass is available at https://cwiki.apache.org/confluence/display/NUTCH/Metrics This relates to the recent JIRA issue I filed about establishing a Nutch

!! Join the #nutch Slack channel !!

2021-12-29 lewis john mcgibbney
Hi user@, dev@, I took the liberty of setting up a #nutch channel so our community can communicate with lower latency. First join the the-asf.slack.com Slack workspace (see https://infra.apache.org/slack.html ), then join the #nutch channel. See you there :) Thanks lewismc --

Re: Break out individual functions from IndexerJob -deleteGone flag?

2021-12-29 Lewis John McGibbney
I should also note that the -deleteGone setting cannot be overridden via nutch-site.xml, whereas similar settings do have equivalent configuration properties in nutch-default.xml https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1361-L1373 On 2021/12/29 17:08:20 lewis john

Break out individual functions from IndexerJob -deleteGone flag?

2021-12-29 lewis john mcgibbney
Hi dev@, Reading the code for the IndexerJob -deleteGone flag [0], you can clearly see that we bundle deletion requests for 404s, redirects and duplicates into one option. This of course has pros and cons. Does anyone wish to share their opinion on how this is implemented? My opinion is that 1. The