[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255540#comment-16255540 ] Tim Allison commented on NUTCH-2457: I'm sure this is user error, and I need to put something else on

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1622#comment-1622 ] Tim Allison commented on NUTCH-2457: Before Tika 1.15 (I think...might have been 1.16?), you'd have to

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1620#comment-1620 ] Tim Allison commented on NUTCH-2457: So, in lieu of a PR...please, please, please use the

[jira] [Comment Edited] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1620#comment-1620 ] Tim Allison edited comment on NUTCH-2457 at 11/16/17 4:22 PM: -- So, in lieu of

[jira] [Created] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-06 Thread Tim Allison (JIRA)
Tim Allison created NUTCH-2457: -- Summary: Embedded documents likely not correctly parsed by Tika Key: NUTCH-2457 URL: https://issues.apache.org/jira/browse/NUTCH-2457 Project: Nutch Issue Type:

[jira] [Comment Edited] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content

2018-05-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482879#comment-16482879 ] Tim Allison edited comment on NUTCH-2578 at 5/21/18 6:37 PM: - Based on

[jira] [Commented] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content

2018-05-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482879#comment-16482879 ] Tim Allison commented on NUTCH-2578: Based on [~wastl-nagel]'s observation, I updated Apache Tika to

[jira] [Comment Edited] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content

2018-05-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482879#comment-16482879 ] Tim Allison edited comment on NUTCH-2578 at 5/21/18 6:38 PM: - Based on

[jira] [Commented] (NUTCH-2586) Add a fallback mechanism for missing meta tags

2018-07-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542898#comment-16542898 ] Tim Allison commented on NUTCH-2586: Is this better handled at the Tika level...or is this something

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939473#comment-16939473 ] Tim Allison commented on NUTCH-2457: Let me take a look at the code again...it has been a while... >

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939516#comment-16939516 ] Tim Allison commented on NUTCH-2457: W00t! Default is to parse embedded, right? :D > Embedded

[jira] [Comment Edited] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939516#comment-16939516 ] Tim Allison edited comment on NUTCH-2457 at 9/27/19 2:55 PM: - W00t! Default

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939478#comment-16939478 ] Tim Allison commented on NUTCH-2457: The issue is that the AutoDetectParser automatically/silently

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765306#comment-17765306 ] Tim Allison commented on NUTCH-2959: Currently working on this to bump to Tika 2.9.0. PR incoming

[jira] [Updated] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2959: --- Summary: Upgrade to Apache Tika 2.9.0 (was: Upgrade to Apache Tika 2.4.1) > Upgrade to Apache Tika

[jira] [Resolved] (NUTCH-2998) Remove the Any23 plugin

2023-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2998. Fix Version/s: 1.20 Resolution: Fixed > Remove the Any23 plugin > --- >

[jira] [Resolved] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-09-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2978. Fix Version/s: 1.20 Resolution: Fixed Many thanks [~markus17] for all of the work on this!

[jira] [Created] (NUTCH-2998) Remove the Any23 plugin

2023-08-28 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2998: -- Summary: Remove the Any23 plugin Key: NUTCH-2998 URL: https://issues.apache.org/jira/browse/NUTCH-2998 Project: Nutch Issue Type: Task Components:

[jira] [Resolved] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2999. Fix Version/s: 1.20 Resolution: Fixed Thank you [~markus17] for the review! > Update

[jira] [Resolved] (NUTCH-2961) Upgrade dependencies of parsefilter-naivebayes

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2961. Resolution: Fixed I confirmed we can simply remove those dependencies. I fixed this as part of

[jira] [Commented] (NUTCH-2961) Upgrade dependencies of parsefilter-naivebayes

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760508#comment-17760508 ] Tim Allison commented on NUTCH-2961: It looks like neither mahout nor lucene are actually used any

[jira] [Created] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2999: -- Summary: Update Lucene version to latest 8.x Key: NUTCH-2999 URL: https://issues.apache.org/jira/browse/NUTCH-2999 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-2998) Remove the Any23 plugin

2023-09-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764376#comment-17764376 ] Tim Allison commented on NUTCH-2998: I don't want to make such a drastic change without at least a

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be a content-type header.

[jira] [Created] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3000: -- Summary: protocol-selenium returns only the body,strips off the element Key: NUTCH-3000 URL: https://issues.apache.org/jira/browse/NUTCH-3000 Project: Nutch

[jira] [Created] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3001: -- Summary: protocol-selenium requires Content-Type header Key: NUTCH-3001 URL: https://issues.apache.org/jira/browse/NUTCH-3001 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be content-type. The logic

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Priority: Minor (was: Major) > protocol-selenium requires Content-Type header >

[jira] [Commented] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764698#comment-17764698 ] Tim Allison commented on NUTCH-3001: Or is the notion that if the selenium protocol doesn't pull any

[jira] [Commented] (NUTCH-2998) Remove the Any23 plugin

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764741#comment-17764741 ] Tim Allison commented on NUTCH-2998: Sorry, I botched the title in the PR:

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be a content-type header.

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764705#comment-17764705 ] Tim Allison commented on NUTCH-2978: I haven't tested in hadoop. I've just run it locally, and, for

[jira] [Resolved] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3001. Fix Version/s: 1.20 Resolution: Fixed > protocol-selenium requires Content-Type header >

[jira] [Resolved] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3000. Fix Version/s: 1.20 Resolution: Fixed > protocol-selenium returns only the body,strips off

[jira] [Commented] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760511#comment-17760511 ] Tim Allison commented on NUTCH-2999: https://github.com/apache/nutch/pull/770 > Update Lucene

[jira] [Commented] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760512#comment-17760512 ] Tim Allison commented on NUTCH-2999: This PR also takes care of NUTCH-2961 > Update Lucene version

[jira] [Resolved] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2999. Resolution: Fixed Updated PR should have fixed that issue. Would be nice to add testcontainers

[jira] [Reopened] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened NUTCH-2999: The applied PR breaks the lucene-based indexers. > Update Lucene version to latest 8.x >

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-08-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760926#comment-17760926 ] Tim Allison commented on NUTCH-2978: K, I think https://github.com/apache/nutch/pull/772 is better.

[jira] [Created] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

2023-11-01 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3020: -- Summary: ParseSegment should check for protocol's flags for truncation Key: NUTCH-3020 URL: https://issues.apache.org/jira/browse/NUTCH-3020 Project: Nutch

[jira] [Created] (NUTCH-3021) Improve http-protocol to identify truncated content

2023-11-01 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3021: -- Summary: Improve http-protocol to identify truncated content Key: NUTCH-3021 URL: https://issues.apache.org/jira/browse/NUTCH-3021 Project: Nutch Issue Type:

[jira] [Created] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3018: -- Summary: Consider pooling remote webdrivers for Selenium? Key: NUTCH-3018 URL: https://issues.apache.org/jira/browse/NUTCH-3018 Project: Nutch Issue Type: Task

[jira] [Resolved] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2959. Resolution: Fixed > Upgrade to Apache Tika 2.9.0 > > >

[jira] [Created] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-10-31 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3019: -- Summary: Upgrade to Apache Tika 2.9.1 Key: NUTCH-3019 URL: https://issues.apache.org/jira/browse/NUTCH-3019 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781482#comment-17781482 ] Tim Allison commented on NUTCH-3019: Separately, I noticed that logging from Tika was not working

[jira] [Updated] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3018: --- Description: It looks like it takes between 2x and 4x of the time to initialize the remote

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781483#comment-17781483 ] Tim Allison commented on NUTCH-3018: It looks like we cannot create more web drivers than the

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781483#comment-17781483 ] Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:46 PM: -- It looks like

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781485#comment-17781485 ] Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:55 PM: -- On further

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781485#comment-17781485 ] Tim Allison commented on NUTCH-3018: On further reflection, what the above means is that if each of

[jira] [Updated] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3018: --- Description: It looks like it takes between 2x and 4x of the time to initialize the remote

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783252#comment-17783252 ] Tim Allison commented on NUTCH-3019: ParserStatus failed=84 success=625 > Upgrade to

[jira] [Comment Edited] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783252#comment-17783252 ] Tim Allison edited comment on NUTCH-3019 at 11/6/23 3:32 PM: - I just got

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783352#comment-17783352 ] Tim Allison commented on NUTCH-3019: {noformat} [junit] Tests run: 7, Failures: 4, Errors: 0,

[jira] [Resolved] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3020. Fix Version/s: 1.20 Resolution: Fixed > ParseSegment should check for protocol's flags for

[jira] [Comment Edited] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783254#comment-17783254 ] Tim Allison edited comment on NUTCH-3019 at 11/6/23 3:46 PM: - tballison

[jira] [Resolved] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3019. Fix Version/s: 1.20 Resolution: Fixed > Upgrade to Apache Tika 2.9.1 >

[jira] [Created] (NUTCH-3004) Avoid NPE in HttpResponse

2023-09-25 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3004: -- Summary: Avoid NPE in HttpResponse Key: NUTCH-3004 URL: https://issues.apache.org/jira/browse/NUTCH-3004 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-3006) Downgrade Tika dependency to 2.2.1 (core and parse-tika)

2023-09-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770059#comment-17770059 ] Tim Allison commented on NUTCH-3006: An alternative approach would be for Tika to revert

[jira] [Resolved] (NUTCH-3004) Avoid NPE in HttpResponse

2023-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3004. Resolution: Fixed > Avoid NPE in HttpResponse > - > > Key:

[jira] [Created] (NUTCH-3005) Upgrade selenium as needed

2023-09-26 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3005: -- Summary: Upgrade selenium as needed Key: NUTCH-3005 URL: https://issues.apache.org/jira/browse/NUTCH-3005 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771476#comment-17771476 ] Tim Allison commented on NUTCH-2959: If you and the Nutch team are ok with the shim, I'll work

[jira] [Created] (NUTCH-3003) Consider integration testing in a Dockerized mini-hadoop cluster via testcontainers?

2023-09-19 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3003: -- Summary: Consider integration testing in a Dockerized mini-hadoop cluster via testcontainers? Key: NUTCH-3003 URL: https://issues.apache.org/jira/browse/NUTCH-3003

[jira] [Commented] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2023-09-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766832#comment-17766832 ] Tim Allison commented on NUTCH-2937: As [~snagel] pointed out on the PR for NUTCH-2959 -- looks like

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771170#comment-17771170 ] Tim Allison commented on NUTCH-2959: I've continued to stub my toes on this this morning. The best

[jira] [Comment Edited] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771170#comment-17771170 ] Tim Allison edited comment on NUTCH-2959 at 10/2/23 3:51 PM: - I've continued

[jira] [Assigned] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!

2023-08-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned NUTCH-2989: -- Assignee: Tim Allison > Can't have username/pw AND https on elastic-indexer?! >

[jira] [Resolved] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!

2023-08-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2989. Resolution: Fixed Fellow Nutch devs, please let me know if I botched any of our processes in

[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695096#comment-17695096 ] Tim Allison commented on NUTCH-2920: My initial PR was a simple copy+paste with a few modifications

[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695148#comment-17695148 ] Tim Allison commented on NUTCH-2920: Well, that was a funny notion... Turns out there is no

[jira] [Comment Edited] (NUTCH-2927) indexer-elastic: use Java API client

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695217#comment-17695217 ] Tim Allison edited comment on NUTCH-2927 at 3/1/23 5:26 PM: Over on

[jira] [Commented] (NUTCH-2927) indexer-elastic: use Java API client

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695217#comment-17695217 ] Tim Allison commented on NUTCH-2927: Over on NUTCH-2920 , I stumbled into the blocker that

[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695152#comment-17695152 ] Tim Allison commented on NUTCH-2920: Current proposal is to go with the high level rest client for

[jira] [Resolved] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2988. Resolution: Duplicate Duplicate. Sorry! > Elasticsearch 7.13.2 compatible with ASL 2.0? >

[jira] [Created] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!

2023-03-01 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2989: -- Summary: Can't have username/pw AND https on elastic-indexer?! Key: NUTCH-2989 URL: https://issues.apache.org/jira/browse/NUTCH-2989 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2988: --- Description: In the latest release of at least the 1.x branch of Nutch, the elasticsearch high

[jira] [Created] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2988: -- Summary: Elasticsearch 7.13.2 compatible with ASL 2.0? Key: NUTCH-2988 URL: https://issues.apache.org/jira/browse/NUTCH-2988 Project: Nutch Issue Type: Task

[jira] [Updated] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2988: --- Description: In the latest release of at least the 1.x branch of Nutch, the elasticsearch high

[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694744#comment-17694744 ] Tim Allison commented on NUTCH-2988: If you open the 7.13.2 jar file, there's just the two -- "Server

[jira] [Updated] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2988: --- Attachment: LICENSE.txt > Elasticsearch 7.13.2 compatible with ASL 2.0? >

[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694739#comment-17694739 ] Tim Allison commented on NUTCH-2988: Y, k.

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725805#comment-17725805 ] Tim Allison commented on NUTCH-2959: I just opened a PR to upgrade Tika to 2.8.0 on ANY23:

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725807#comment-17725807 ] Tim Allison commented on NUTCH-2959: Separately, I'm wondering if it would be useful to add an

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725842#comment-17725842 ] Tim Allison commented on NUTCH-2959: tika-server would be cleaner? Could have autoscaling pods of

[jira] [Created] (NUTCH-2994) Implement an indexer for OpenSearch 2.x

2023-06-08 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2994: -- Summary: Implement an indexer for OpenSearch 2.x Key: NUTCH-2994 URL: https://issues.apache.org/jira/browse/NUTCH-2994 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-2994) Implement an indexer for OpenSearch 2.x

2023-06-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2994: --- Description: Over on NUTCH-2920, we added an indexer for OpenSearch 1.x. We should do this for

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-12-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794972#comment-17794972 ] Tim Allison commented on NUTCH-3026: Anyone have any time for feedback, even if only at a high level?

[jira] [Updated] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3026: --- Issue Type: New Feature (was: Task) > Allow statusOnly option for IndexingJob >

[jira] [Created] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3026: -- Summary: Allow statusOnly option for IndexingJob Key: NUTCH-3026 URL: https://issues.apache.org/jira/browse/NUTCH-3026 Project: Nutch Issue Type: Task

[jira] [Updated] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3026: --- Description: This issue follows on from discussion here:

[jira] [Updated] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3026: --- Description: This issue follows on from discussion here:

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787372#comment-17787372 ] Tim Allison commented on NUTCH-3026: The above PR is a WIP for discussion. Let me know what you

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825440#comment-17825440 ] Tim Allison commented on NUTCH-3026: I should close out the PR and this issue. With change in

[jira] [Resolved] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3026. Resolution: Won't Fix Lost support for working on this issue. > Allow statusOnly option for

[jira] [Comment Edited] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827510#comment-17827510 ] Tim Allison edited comment on NUTCH-3026 at 3/15/24 2:18 PM: - Lost support

[jira] [Commented] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2024-04-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834532#comment-17834532 ] Tim Allison commented on NUTCH-2937: I really, really, really wish we didn't have to do this! :P

[jira] [Commented] (NUTCH-3040) Upgrade to Hadoop 3.4.0

2024-04-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836191#comment-17836191 ] Tim Allison commented on NUTCH-3040: :cry-sob: This is great news! > Upgrade to Hadoop 3.4.0 >