[jira] [Commented] (NUTCH-3043) Generator: count URLs rejected by URL filters

2024-04-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840892#comment-17840892 ] ASF GitHub Bot commented on NUTCH-3043: --- lewismc commented on code in PR #814: URL:

[jira] [Commented] (NUTCH-3044) Generator: NPE when extracting the host part of a URL fails

2024-04-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840854#comment-17840854 ] ASF GitHub Bot commented on NUTCH-3044: --- sebastian-nagel opened a new pull request, #815: URL:

[jira] [Commented] (NUTCH-3043) Generator: count URLs rejected by URL filters

2024-04-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840845#comment-17840845 ] ASF GitHub Bot commented on NUTCH-3043: --- sebastian-nagel opened a new pull request, #814: URL:

[jira] [Commented] (NUTCH-3041) Address confusing logging in o.a.n.net.URLExemptionFilters

2024-04-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839186#comment-17839186 ] ASF GitHub Bot commented on NUTCH-3041: --- lewismc commented on PR #813: URL:

[jira] [Commented] (NUTCH-3041) Address confusing logging in o.a.n.net.URLExemptionFilters

2024-04-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839181#comment-17839181 ] ASF GitHub Bot commented on NUTCH-3041: --- lewismc opened a new pull request, #813: URL:

[jira] [Commented] (NUTCH-3039) Failure to handle ftp:// URLs

2024-04-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836126#comment-17836126 ] ASF GitHub Bot commented on NUTCH-3039: --- sebastian-nagel opened a new pull request, #812: URL:

[jira] [Commented] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835078#comment-17835078 ] ASF GitHub Bot commented on NUTCH-3038: --- lewismc merged PR #811: URL:

[jira] [Commented] (NUTCH-3038) Address issues discovered during 1.20 release management dryrun

2024-04-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834481#comment-17834481 ] ASF GitHub Bot commented on NUTCH-3038: --- lewismc opened a new pull request, #811: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-04-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834002#comment-17834002 ] ASF GitHub Bot commented on NUTCH-3032: --- lewismc merged PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832510#comment-17832510 ] ASF GitHub Bot commented on NUTCH-3032: --- CatChullain commented on PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832479#comment-17832479 ] ASF GitHub Bot commented on NUTCH-3032: --- lewismc commented on PR #810: URL:

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832474#comment-17832474 ] ASF GitHub Bot commented on NUTCH-3036: --- lewismc merged PR #807: URL:

[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832473#comment-17832473 ] ASF GitHub Bot commented on NUTCH-3035: --- lewismc merged PR #808: URL:

[jira] [Commented] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832472#comment-17832472 ] ASF GitHub Bot commented on NUTCH-3037: --- lewismc merged PR #809: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832471#comment-17832471 ] ASF GitHub Bot commented on NUTCH-3032: --- lewismc commented on PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832463#comment-17832463 ] ASF GitHub Bot commented on NUTCH-3032: --- CatChullain commented on PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-29 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832294#comment-17832294 ] ASF GitHub Bot commented on NUTCH-3032: --- lewismc commented on code in PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831128#comment-17831128 ] ASF GitHub Bot commented on NUTCH-3032: --- CatChullain commented on PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830983#comment-17830983 ] ASF GitHub Bot commented on NUTCH-3032: --- lewismc commented on code in PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830964#comment-17830964 ] ASF GitHub Bot commented on NUTCH-3032: --- lewismc commented on code in PR #810: URL:

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830749#comment-17830749 ] ASF GitHub Bot commented on NUTCH-3032: --- CatChullain opened a new pull request, #810: URL:

[jira] [Commented] (NUTCH-3037) Upgrade org.apache.kafka:kafka_2.12: to v3.7.0

2024-03-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829648#comment-17829648 ] ASF GitHub Bot commented on NUTCH-3037: --- lewismc opened a new pull request, #809: URL:

[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827578#comment-17827578 ] ASF GitHub Bot commented on NUTCH-3035: --- sebastian-nagel commented on PR #808: URL:

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827509#comment-17827509 ] ASF GitHub Bot commented on NUTCH-3026: --- tballison closed pull request #799: NUTCH-3026 > Allow

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827307#comment-17827307 ] ASF GitHub Bot commented on NUTCH-3036: --- lewismc closed pull request #807: NUTCH-3036 Upgrade

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827308#comment-17827308 ] ASF GitHub Bot commented on NUTCH-3036: --- lewismc opened a new pull request, #807: URL:

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827306#comment-17827306 ] ASF GitHub Bot commented on NUTCH-3036: --- lewismc commented on PR #807: URL:

[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827305#comment-17827305 ] ASF GitHub Bot commented on NUTCH-3035: --- lewismc commented on PR #808: URL:

[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827304#comment-17827304 ] ASF GitHub Bot commented on NUTCH-3035: --- sebastian-nagel opened a new pull request, #808: URL:

[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827303#comment-17827303 ] ASF GitHub Bot commented on NUTCH-3035: --- lewismc closed pull request #808: NUTCH-3035 Update

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827302#comment-17827302 ] ASF GitHub Bot commented on NUTCH-3036: --- lewismc commented on PR #807: URL:

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827301#comment-17827301 ] ASF GitHub Bot commented on NUTCH-3036: --- lewismc commented on PR #807: URL:

[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827223#comment-17827223 ] ASF GitHub Bot commented on NUTCH-3035: --- sebastian-nagel opened a new pull request, #808: URL:

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827208#comment-17827208 ] ASF GitHub Bot commented on NUTCH-3036: --- lewismc opened a new pull request, #807: URL:

[jira] [Commented] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827079#comment-17827079 ] ASF GitHub Bot commented on NUTCH-3008: --- sebastian-nagel merged PR #806: URL:

[jira] [Commented] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826879#comment-17826879 ] ASF GitHub Bot commented on NUTCH-3008: --- lewismc commented on PR #806: URL:

[jira] [Commented] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826765#comment-17826765 ] ASF GitHub Bot commented on NUTCH-3008: --- sebastian-nagel opened a new pull request, #806: URL:

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826084#comment-17826084 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc merged PR #803: URL:

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826085#comment-17826085 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc commented on PR #803: URL:

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825903#comment-17825903 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc commented on PR #803: URL:

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825830#comment-17825830 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc opened a new pull request, #803: URL:

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825829#comment-17825829 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc closed pull request #803: NUTCH-3033 Upgrade Ivy

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825448#comment-17825448 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc commented on PR #803: URL:

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825441#comment-17825441 ] ASF GitHub Bot commented on NUTCH-3026: --- lewismc commented on PR #799: URL:

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825437#comment-17825437 ] ASF GitHub Bot commented on NUTCH-3026: --- lewismc commented on PR #799: URL:

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825435#comment-17825435 ] ASF GitHub Bot commented on NUTCH-3026: --- lewismc closed pull request #799: NUTCH-3026 > Allow

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825436#comment-17825436 ] ASF GitHub Bot commented on NUTCH-3026: --- tballison opened a new pull request, #799: URL:

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825432#comment-17825432 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc commented on PR #803: URL:

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825431#comment-17825431 ] ASF GitHub Bot commented on NUTCH-3033: --- lewismc opened a new pull request, #803: URL:

[jira] [Commented] (NUTCH-2834) Deduplication mode via command line in crawl script

2024-03-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825024#comment-17825024 ] ASF GitHub Bot commented on NUTCH-2834: --- sebastian-nagel commented on PR #800: URL:

[jira] [Commented] (NUTCH-2834) Deduplication mode via command line in crawl script

2024-03-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825023#comment-17825023 ] ASF GitHub Bot commented on NUTCH-2834: --- sebastian-nagel merged PR #800: URL:

[jira] [Commented] (NUTCH-3027) Trivial resource leak patch in DomainSuffixes.java

2024-03-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825019#comment-17825019 ] ASF GitHub Bot commented on NUTCH-3027: --- sebastian-nagel commented on PR #802: URL:

[jira] [Commented] (NUTCH-3027) Trivial resource leak patch in DomainSuffixes.java

2024-03-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825018#comment-17825018 ] ASF GitHub Bot commented on NUTCH-3027: --- sebastian-nagel closed pull request #802: fix for

[jira] [Commented] (NUTCH-3027) Trivial resource leak patch in DomainSuffixes.java

2024-01-18 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808396#comment-17808396 ] ASF GitHub Bot commented on NUTCH-3027: --- skehrli opened a new pull request, #802: URL:

[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV

2024-01-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803200#comment-17803200 ] ASF GitHub Bot commented on NUTCH-1541: --- lewismc commented on PR #294: URL:

[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV

2023-12-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799911#comment-17799911 ] ASF GitHub Bot commented on NUTCH-1541: --- grege117 commented on PR #294: URL:

[jira] [Commented] (NUTCH-2834) Deduplication mode via command line in crawl script

2023-12-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796794#comment-17796794 ] ASF GitHub Bot commented on NUTCH-2834: --- derhecht opened a new pull request, #800: URL:

[jira] [Commented] (NUTCH-3024) Remove flaky 'dependency check' target

2023-11-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789566#comment-17789566 ] ASF GitHub Bot commented on NUTCH-3024: --- lewismc merged PR #795: URL:

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787371#comment-17787371 ] ASF GitHub Bot commented on NUTCH-3026: --- tballison opened a new pull request, #799: URL:

[jira] [Commented] (NUTCH-2812) Methods returning array may expose internal representation

2023-11-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784299#comment-17784299 ] ASF GitHub Bot commented on NUTCH-2812: --- GabeHaegele opened a new pull request, #798: URL:

[jira] [Commented] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784186#comment-17784186 ] ASF GitHub Bot commented on NUTCH-3025: --- sebastian-nagel merged PR #796: URL:

[jira] [Commented] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784187#comment-17784187 ] ASF GitHub Bot commented on NUTCH-3025: --- sebastian-nagel commented on PR #796: URL:

[jira] [Commented] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784057#comment-17784057 ] ASF GitHub Bot commented on NUTCH-3025: --- jnioche commented on PR #796: URL:

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-11-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784029#comment-17784029 ] ASF GitHub Bot commented on NUTCH-3017: --- sebastian-nagel commented on PR #793: URL:

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-11-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784024#comment-17784024 ] ASF GitHub Bot commented on NUTCH-3017: --- sebastian-nagel closed pull request #793: [NUTCH-3017]

[jira] [Commented] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783563#comment-17783563 ] ASF GitHub Bot commented on NUTCH-3025: --- jnioche commented on PR #796: URL:

[jira] [Commented] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783544#comment-17783544 ] ASF GitHub Bot commented on NUTCH-3025: --- jnioche commented on code in PR #796: URL:

[jira] [Commented] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783533#comment-17783533 ] ASF GitHub Bot commented on NUTCH-3025: --- sebastian-nagel commented on code in PR #796: URL:

[jira] [Commented] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

2023-11-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783360#comment-17783360 ] ASF GitHub Bot commented on NUTCH-3020: --- tballison merged PR #794: URL:

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783290#comment-17783290 ] ASF GitHub Bot commented on NUTCH-3019: --- tballison merged PR #797: URL:

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783254#comment-17783254 ] ASF GitHub Bot commented on NUTCH-3019: --- tballison commented on PR #797: URL:

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783228#comment-17783228 ] ASF GitHub Bot commented on NUTCH-3019: --- tballison commented on PR #797: URL:

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783227#comment-17783227 ] ASF GitHub Bot commented on NUTCH-3019: --- tballison opened a new pull request, #797: URL:

[jira] [Commented] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

2023-11-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783226#comment-17783226 ] ASF GitHub Bot commented on NUTCH-3025: --- jnioche opened a new pull request, #796: URL:

[jira] [Commented] (NUTCH-3024) Remove flaky 'dependency check' target

2023-11-03 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782575#comment-17782575 ] ASF GitHub Bot commented on NUTCH-3024: --- lewismc opened a new pull request, #795: URL:

[jira] [Commented] (NUTCH-3014) Standardize Job names

2023-11-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782392#comment-17782392 ] ASF GitHub Bot commented on NUTCH-3014: --- lewismc merged PR #789: URL:

[jira] [Commented] (NUTCH-3014) Standardize Job names

2023-11-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782382#comment-17782382 ] ASF GitHub Bot commented on NUTCH-3014: --- lewismc commented on code in PR #789: URL:

[jira] [Commented] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

2023-11-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781925#comment-17781925 ] ASF GitHub Bot commented on NUTCH-3020: --- lewismc commented on PR #794: URL:

[jira] [Commented] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

2023-11-01 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781820#comment-17781820 ] ASF GitHub Bot commented on NUTCH-3020: --- tballison opened a new pull request, #794: URL:

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781302#comment-17781302 ] ASF GitHub Bot commented on NUTCH-3017: --- sebastian-nagel commented on code in PR #793: URL:

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781301#comment-17781301 ] ASF GitHub Bot commented on NUTCH-3017: --- sebastian-nagel commented on code in PR #793: URL:

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781088#comment-17781088 ] ASF GitHub Bot commented on NUTCH-3017: --- jnioche opened a new pull request, #793: URL:

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781080#comment-17781080 ] ASF GitHub Bot commented on NUTCH-3017: --- jnioche commented on PR #792: URL:

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781079#comment-17781079 ] ASF GitHub Bot commented on NUTCH-3017: --- jnioche closed pull request #792: Allow fast-urlfilter to

[jira] [Commented] (NUTCH-3017) Allow fast-urlfilter to load from HDFS/S3 and support gzipped input

2023-10-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781078#comment-17781078 ] ASF GitHub Bot commented on NUTCH-3017: --- jnioche opened a new pull request, #792: URL:

[jira] [Commented] (NUTCH-3014) Standardize Job names

2023-10-29 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780720#comment-17780720 ] ASF GitHub Bot commented on NUTCH-3014: --- sebastian-nagel commented on code in PR #789: URL:

[jira] [Commented] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780538#comment-17780538 ] ASF GitHub Bot commented on NUTCH-3015: --- lewismc merged PR #790: URL:

[jira] [Commented] (NUTCH-2887) Migrate to JUnit 5 Jupiter

2023-10-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779335#comment-17779335 ] ASF GitHub Bot commented on NUTCH-2887: --- lewismc commented on PR #791: URL:

[jira] [Commented] (NUTCH-2887) Migrate to JUnit 5 Jupiter

2023-10-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779170#comment-17779170 ] ASF GitHub Bot commented on NUTCH-2887: --- lewismc opened a new pull request, #791: URL:

[jira] [Commented] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778813#comment-17778813 ] ASF GitHub Bot commented on NUTCH-3015: --- lewismc commented on PR #790: URL:

[jira] [Commented] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778812#comment-17778812 ] ASF GitHub Bot commented on NUTCH-3015: --- lewismc commented on PR #790: URL:

[jira] [Commented] (NUTCH-3015) Add more CI steps to GitHub master-build.yml

2023-10-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778506#comment-17778506 ] ASF GitHub Bot commented on NUTCH-3015: --- lewismc opened a new pull request, #790: URL:

[jira] [Commented] (NUTCH-3014) Standardize Job names

2023-10-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778448#comment-17778448 ] ASF GitHub Bot commented on NUTCH-3014: --- lewismc opened a new pull request, #789: URL:

[jira] [Commented] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic

2023-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778181#comment-17778181 ] ASF GitHub Bot commented on NUTCH-3013: --- lewismc merged PR #788: URL:

[jira] [Commented] (NUTCH-3012) SegmentReader when dumping with option -recode: NPE on unparsed documents

2023-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778116#comment-17778116 ] ASF GitHub Bot commented on NUTCH-3012: --- sebastian-nagel merged PR #787: URL:

[jira] [Commented] (NUTCH-3011) HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx)

2023-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778115#comment-17778115 ] ASF GitHub Bot commented on NUTCH-3011: --- sebastian-nagel merged PR #786: URL:

[jira] [Commented] (NUTCH-2990) HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309

2023-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778108#comment-17778108 ] ASF GitHub Bot commented on NUTCH-2990: --- sebastian-nagel merged PR #779: URL:

[jira] [Commented] (NUTCH-3009) Upgrade to Hadoop 3.3.6

2023-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778107#comment-17778107 ] ASF GitHub Bot commented on NUTCH-3009: --- sebastian-nagel merged PR #782: URL:

[jira] [Commented] (NUTCH-3002) Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive

2023-10-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778105#comment-17778105 ] ASF GitHub Bot commented on NUTCH-3002: --- sebastian-nagel merged PR #777: URL:

[jira] [Commented] (NUTCH-3013) Employ commons-lang3's StopWatch to simplify timing logic

2023-10-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778022#comment-17778022 ] ASF GitHub Bot commented on NUTCH-3013: --- lewismc commented on PR #788: URL:
