[GitHub] [nutch] tballison commented on pull request #776: NUTCH-2959 -- upgrade Tika to 2.9.0

2023-09-19 Thread via GitHub
tballison commented on PR #776: URL: https://github.com/apache/nutch/pull/776#issuecomment-1726191372 :sob: Y, let's hold off until Hadoop 3.4.0 is released. Thank you, again!

[jira] [Commented] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2023-09-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766832#comment-17766832 ] Tim Allison commented on NUTCH-2937: As [~snagel] pointed out on the PR for NUTCH-2959 -- looks like

[GitHub] [nutch] tballison commented on pull request #776: NUTCH-2959 -- upgrade Tika to 2.9.0

2023-09-19 Thread via GitHub
tballison commented on PR #776: URL: https://github.com/apache/nutch/pull/776#issuecomment-1725801990 > Btw., I've just rediscovered that using Tika in (pseudo)distributed mode is broken since the upgrade to Tika 2.3.0, see [NUTCH-2937](https://issues.apache.org/jira/browse/NUTCH-2937).

[GitHub] [nutch] sebastian-nagel commented on pull request #776: NUTCH-2959 -- upgrade Tika to 2.9.0

2023-09-19 Thread via GitHub
sebastian-nagel commented on PR #776: URL: https://github.com/apache/nutch/pull/776#issuecomment-1725795918 > Can we exclude commons-io from hadoop and then add it as a dependency in the main ivy.xml? When running in distributed or pseudo-distributed mode, commons-io 2.8.0 is first
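When several versions of a library such as commons-io can end up on the classpath in (pseudo)distributed mode, one way to check which copy a task actually loads is to ask the JVM for the class's code source. The helper below is a hypothetical diagnostic sketch (the class name `WhichJar` and its output strings are invented for illustration, not part of Nutch or Hadoop). It uses only standard JDK calls and loads the class without initializing it.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not part of Nutch or Hadoop): reports which
 * jar or directory a class was loaded from, e.g. to see whether a job picked
 * up the cluster's commons-io or the copy bundled in the job jar.
 */
public class WhichJar {

    /** Returns the code-source location of the class, or a marker string. */
    public static String locate(String className) {
        try {
            // Load without running static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK's bootstrap loader have no CodeSource.
            if (src == null || src.getLocation() == null) {
                return "<bootstrap/unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        // Defaults to a commons-io class; pass other class names as arguments.
        String[] names = args.length > 0 ? args : new String[] {"org.apache.commons.io.IOUtils"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling `locate(...)` from inside a running task (for example, logging it once in a mapper's setup) would show the jar that wins under that task's classloader, which is the question raised in the thread.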

[GitHub] [nutch] tballison commented on pull request #776: NUTCH-2959 -- upgrade Tika to 2.9.0

2023-09-19 Thread via GitHub
tballison commented on PR #776: URL: https://github.com/apache/nutch/pull/776#issuecomment-1725746397 I'm getting a ConnectException when I try to run nutch-test-single-node-cluster. On hadoop startup, I see: `2023-09-19 10:25:15,186 INFO util.GSet: VM type = 64-bit`

[GitHub] [nutch] tballison commented on pull request #776: NUTCH-2959 -- upgrade Tika to 2.9.0

2023-09-19 Thread via GitHub
tballison commented on PR #776: URL: https://github.com/apache/nutch/pull/776#issuecomment-1725714860 I'm guessing that commit won't work if distributed hadoop is bringing its own jars (as you said!). Does hadoop do any custom classloading so that the job jars are isolated from the

[GitHub] [nutch] tballison commented on pull request #776: NUTCH-2959 -- upgrade Tika to 2.9.0

2023-09-19 Thread via GitHub
tballison commented on PR #776: URL: https://github.com/apache/nutch/pull/776#issuecomment-1725611372 I haven't worked with ant in a while. According to `ant dependencytree`, it looks like we don't have to exclude commons-io everywhere -- placing it in the main ivy.xml has the same effect

[GitHub] [nutch] tballison commented on pull request #776: NUTCH-2959 -- upgrade Tika to 2.9.0

2023-09-19 Thread via GitHub
tballison commented on PR #776: URL: https://github.com/apache/nutch/pull/776#issuecomment-1725604218 Weird, I just pushed a commit bumping commons-io on my NUTCH-2959 branch, and it isn't showing up in the PR... I'll wait a bit... Maybe github is out for coffee?

[jira] [Created] (NUTCH-3003) Consider integration testing in a Dockerized mini-hadoop cluster via testcontainers?

2023-09-19 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3003: -- Summary: Consider integration testing in a Dockerized mini-hadoop cluster via testcontainers? Key: NUTCH-3003 URL: https://issues.apache.org/jira/browse/NUTCH-3003

[GitHub] [nutch] sebastian-nagel opened a new pull request, #777: NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive

2023-09-19 Thread via GitHub
sebastian-nagel opened a new pull request, #777: URL: https://github.com/apache/nutch/pull/777 - implement class CaseInsensitiveMetadata providing case-insensitive metadata look-ups (but no spell-checking) - use CaseInsensitiveMetadata to hold HTTP header metadata in the class
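HTTP field names are case-insensitive, so a lookup for `content-type` should find a header sent as `Content-Type`. The sketch below shows one common way to get that behavior in Java: a `TreeMap` ordered by `String.CASE_INSENSITIVE_ORDER`. It is a hypothetical stand-alone class (`CaseInsensitiveHeaderMap` and its methods are invented for illustration), not the `CaseInsensitiveMetadata` implementation from PR #777. Like the PR description, it matches names ignoring case only, with no spell-checking or other fuzzy matching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not Nutch's CaseInsensitiveMetadata): a multi-valued
 * header store whose name lookups ignore case.
 */
public class CaseInsensitiveHeaderMap {
    // Keys compare via CASE_INSENSITIVE_ORDER, so "Content-Type" and
    // "content-type" map to the same entry. No fuzzy matching is done.
    private final Map<String, List<String>> values =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Appends a value; repeated headers (e.g. Set-Cookie) keep all values. */
    public void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value for the name, or null if absent. */
    public String get(String name) {
        List<String> v = values.get(name);
        return (v == null || v.isEmpty()) ? null : v.get(0);
    }

    /** Returns all values for the name (an empty list if absent). */
    public List<String> getValues(String name) {
        List<String> v = values.get(name);
        return v == null ? Collections.emptyList() : Collections.unmodifiableList(v);
    }

    public static void main(String[] args) {
        CaseInsensitiveHeaderMap headers = new CaseInsensitiveHeaderMap();
        headers.add("Content-Type", "text/html; charset=UTF-8");
        headers.add("set-cookie", "a=1");
        headers.add("Set-Cookie", "b=2");
        System.out.println(headers.get("content-type"));     // text/html; charset=UTF-8
        System.out.println(headers.getValues("SET-COOKIE")); // [a=1, b=2]
    }
}
```

A sorted map with a case-insensitive comparator avoids lower-casing every key on insert and lookup while still preserving the first spelling of each header name as it was received.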