[jira] [Commented] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826879#comment-17826879
 ] 

ASF GitHub Bot commented on NUTCH-3008:
---

lewismc commented on PR #806:
URL: https://github.com/apache/nutch/pull/806#issuecomment-1995922015

   Tested with a 6-node ES 7.10.2 cluster. +1 LGTM.




> indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
> ---
>
> Key: NUTCH-3008
> URL: https://issues.apache.org/jira/browse/NUTCH-3008
> Project: Nutch
> Issue Type: Bug
> Components: indexer, plugin
> Affects Versions: 1.19
> Reporter: Sebastian Nagel
> Priority: Major
> Fix For: 1.20
>
>
> Downgrade to ES 7.10.2 (licensed under the Apache License 2.0) as an
> alternative solution to address the licensing issues of the
> indexer-elastic plugin.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues [nutch]

2024-03-13 Thread via GitHub


lewismc commented on PR #806:
URL: https://github.com/apache/nutch/pull/806#issuecomment-1995922015

   Tested with a 6-node ES 7.10.2 cluster. +1 LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826846#comment-17826846
 ] 

Hudson commented on NUTCH-3029:
---

FAILURE: Integrated in Jenkins build Nutch » Nutch-trunk #152 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/152/])
NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler (markus: 
[https://github.com/apache/nutch/commit/a8ec17ca853b2488bf5d96538915a00a05064a31])
* (edit) src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java


> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 1.19, 1.20
> Reporter: Martin Djukanovic
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: adaptive-host-specific-intervals.txt.template,
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).
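
The idea described above can be illustrated with a minimal sketch. It is not the code from the attached patch. The class name HostIntervals, the whitespace-separated "host min max" line format, the '#' comment prefix and the use of seconds as the unit are all assumptions made for illustration; the actual template file may look different.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of host-specific refetch intervals (not the NUTCH-3029 patch). */
class HostIntervals {

  /** Minimum and maximum refetch interval for one host (assumed unit: seconds). */
  static final class Range {
    final float min;
    final float max;

    Range(float min, float max) {
      this.min = min;
      this.max = max;
    }
  }

  private final Map<String, Range> byHost = new HashMap<>();

  /**
   * Parses lines of the assumed form "host min max". Blank lines, lines
   * starting with '#', and malformed lines are skipped.
   */
  static HostIntervals parse(Iterable<String> lines) {
    HostIntervals result = new HostIntervals();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      String[] fields = line.split("\\s+");
      if (fields.length != 3) {
        continue; // not "host min max": ignore in this sketch
      }
      try {
        float min = Float.parseFloat(fields[1]);
        float max = Float.parseFloat(fields[2]);
        result.byHost.put(fields[0].toLowerCase(Locale.ROOT), new Range(min, max));
      } catch (NumberFormatException e) {
        // non-numeric interval: ignore the line
      }
    }
    return result;
  }

  /** Returns the host's max. interval, or the default if the host is not configured. */
  float maxInterval(String host, float defaultMax) {
    Range r = byHost.get(host.toLowerCase(Locale.ROOT));
    return r == null ? defaultMax : r.max;
  }

  /** Returns the host's min. interval, or the default if the host is not configured. */
  float minInterval(String host, float defaultMin) {
    Range r = byHost.get(host.toLowerCase(Locale.ROOT));
    return r == null ? defaultMin : r.min;
  }

  public static void main(String[] args) {
    HostIntervals intervals = parse(Arrays.asList(
        "# host min max", "example.org 3600 86400"));
    System.out.println(intervals.maxInterval("example.org", 604800f));
    System.out.println(intervals.minInterval("unknown.example", 60f));
  }
}
```

Hosts without an entry fall back to the scheduler-wide defaults passed by the caller, which mirrors the default parameters of getMaxInterval and getMinInterval seen in the Javadoc warnings elsewhere in this thread.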





Build failed in Jenkins: Nutch » Nutch-trunk #152

2024-03-13 Thread Apache Jenkins Server
See 


Changes:

[Markus Jelsma] NUTCH-3029 Host specific max. and min. intervals in adaptive 
scheduler


--
[...truncated 759.38 KB...]
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-querystring
[junit] Running 
org.apache.nutch.net.urlnormalizer.querystring.TestQuerystringURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.548 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-regex

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-regex
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.693 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-slash

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-slash
[junit] Running 
org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
[junit] Running 
org.apache.nutch.net.urlnormalizer.slash.TestSlashURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.917 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.099 sec

test:

jar:
 [copy] Copying 1 file to 

 [copy] Copying 1 file to 

  [jar] Building jar: 


runtime:
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Copying 1 file to 

 [copy] Copying 2 files to 

 [copy] Copying 1 file to 

 [copy] Copying 1 file to 

 [copy] Copying 37 files to 

 [copy] Copying 2 files to 

 [copy] Copying 208 files to 

 [copy] Copying 490 files to 

 [copy] Copying 379 files to 


javadoc:
[mkdir] Created dir: 

[mkdir] Created dir: 

  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.nutch.crawl...
  [javadoc] Loading source files for package org.apache.nutch.exchange...
  [javadoc] Loading source files for package org.apache.nutch.fetcher...
  [javadoc] Loading source files for package org.apache.nutch.hostdb...
  [javadoc] Loading source files for package org.apache.nutch.indexer...
  [javadoc] Loading source files for package org.apache.nutch.metadata...
  [javadoc] Loading source files for package org.apache.nutch.net...
  [javadoc] Loading source files for package org.apache.nutch.net.protocols...
  [javadoc] Loading source files for package org.apache.nutch.parse...
  

[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826823#comment-17826823
 ] 

Markus Jelsma commented on NUTCH-3029:
--

throws was missing too

   84cda2abd..a8ec17ca8  master -> master



[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826806#comment-17826806
 ] 

Hudson commented on NUTCH-3029:
---

FAILURE: Integrated in Jenkins build Nutch » Nutch-trunk #151 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/151/])
NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler (markus: 
[https://github.com/apache/nutch/commit/84cda2abd500667222fdb00e503780ee0bdaaab4])
* (edit) src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java




Build failed in Jenkins: Nutch » Nutch-trunk #151

2024-03-13 Thread Apache Jenkins Server
See 


Changes:

[Markus Jelsma] NUTCH-3029 Host specific max. and min. intervals in adaptive 
scheduler


--
[...truncated 775.35 KB...]
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-querystring
[junit] Running 
org.apache.nutch.net.urlnormalizer.querystring.TestQuerystringURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.002 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-regex

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-regex
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.385 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-slash

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-slash
[junit] Running 
org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
[junit] Running 
org.apache.nutch.net.urlnormalizer.slash.TestSlashURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.492 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.65 sec

test:

jar:
 [copy] Copying 1 file to 

 [copy] Copying 1 file to 

  [jar] Building jar: 


runtime:
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Copying 1 file to 

 [copy] Copying 2 files to 

 [copy] Copying 1 file to 

 [copy] Copying 1 file to 

 [copy] Copying 37 files to 

 [copy] Copying 2 files to 

 [copy] Copying 208 files to 

 [copy] Copying 490 files to 

 [copy] Copying 379 files to 


javadoc:
[mkdir] Created dir: 

[mkdir] Created dir: 

  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.nutch.crawl...
  [javadoc] Loading source files for package org.apache.nutch.exchange...
  [javadoc] Loading source files for package org.apache.nutch.fetcher...
  [javadoc] Loading source files for package org.apache.nutch.hostdb...
  [javadoc] Loading source files for package org.apache.nutch.indexer...
  [javadoc] Loading source files for package org.apache.nutch.metadata...
  [javadoc] Loading source files for package org.apache.nutch.net...
  [javadoc] Loading source files for package org.apache.nutch.net.protocols...
  [javadoc] Loading source files for package org.apache.nutch.parse...
  

[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826783#comment-17826783
 ] 

Markus Jelsma commented on NUTCH-3029:
--

Thanks Lewis!

   5ba50c0c6..84cda2abd  master -> master



 



[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826776#comment-17826776
 ] 

Lewis John McGibbney commented on NUTCH-3029:
-

Hi [~martin.dj] [~markus17] it looks like we are missing some Javadoc

 
{quote}
[javadoc] Standard Doclet version 11.0.22
[javadoc] Building tree for all the packages and classes...
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:193: warning: no @param for url
[javadoc] public static String getHostName(String url) throws URISyntaxException {
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:193: warning: no @return
[javadoc] public static String getHostName(String url) throws URISyntaxException {
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:193: warning: no @throws for java.net.URISyntaxException
[javadoc] public static String getHostName(String url) throws URISyntaxException {
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:205: warning: no @return
[javadoc] public float getMaxInterval(Text url, float defaultMaxInterval){
[javadoc] ^
[javadoc] /home/runner/work/nutch/nutch/src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:227: warning: no @return
[javadoc] public float getMinInterval(Text url, float defaultMinInterval){
[javadoc] ^
{quote}
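
For reference, the first three warnings go away once the method's doc comment carries @param, @return and @throws tags. The sketch below shows one way this could look. Only the method signature is taken from the warnings above; the class name HostNameExample, the method body and the wording of the comment are assumptions, not the code that was committed.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical holder class; in Nutch the method lives in AdaptiveFetchSchedule. */
class HostNameExample {

  /**
   * Returns the host name part of a URL.
   *
   * @param url the URL string to parse
   * @return the host component, or null if the URL has none
   * @throws URISyntaxException if the string is not a valid URI
   */
  public static String getHostName(String url) throws URISyntaxException {
    // Assumed body: delegate to java.net.URI for parsing.
    return new URI(url).getHost();
  }

  public static void main(String[] args) throws URISyntaxException {
    System.out.println(getHostName("https://nutch.apache.org/index.html"));
  }
}
```

The same pattern (a single @return tag) would cover the remaining warnings for getMaxInterval and getMinInterval.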
 



[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826773#comment-17826773
 ] 

Hudson commented on NUTCH-3033:
---

FAILURE: Integrated in Jenkins build Nutch » Nutch-trunk #150 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/150/])
NUTCH-3033 Upgrade Ivy to v2.5.2 (#803) (github: 
[https://github.com/apache/nutch/commit/4f62dec0f3001a8d41b236913346669ac7968133])
* (edit) src/plugin/publish-rabbitmq/ivy.xml
* (edit) src/plugin/index-static/ivy.xml
* (edit) src/plugin/parsefilter-debug/ivy.xml
* (edit) src/plugin/indexer-cloudsearch/ivy.xml
* (edit) src/plugin/urlnormalizer-slash/ivy.xml
* (edit) build.xml
* (edit) src/plugin/urlfilter-domaindenylist/ivy.xml
* (edit) src/plugin/parse-tika/ivy.xml
* (edit) src/plugin/microformats-reltag/ivy.xml
* (edit) src/plugin/lib-http/ivy.xml
* (edit) src/plugin/urlfilter-suffix/ivy.xml
* (edit) src/plugin/urlnormalizer-pass/ivy.xml
* (edit) src/plugin/lib-xml/ivy.xml
* (edit) src/plugin/urlfilter-ignoreexempt/ivy.xml
* (edit) src/plugin/lib-htmlunit/build-ivy.xml
* (edit) src/plugin/index-more/ivy.xml
* (edit) src/plugin/urlnormalizer-querystring/ivy.xml
* (edit) src/plugin/index-replace/ivy.xml
* (edit) src/plugin/creativecommons/ivy.xml
* (edit) src/plugin/indexer-kafka/ivy.xml
* (edit) .gitignore
* (edit) src/plugin/scoring-metadata/ivy.xml
* (edit) src/plugin/scoring-orphan/ivy.xml
* (edit) src/plugin/urlnormalizer-basic/ivy.xml
* (edit) src/plugin/index-basic/ivy.xml
* (edit) src/plugin/protocol-httpclient/ivy.xml
* (edit) src/plugin/parse-metatags/ivy.xml
* (edit) src/plugin/urlmeta/ivy.xml
* (edit) src/plugin/tld/ivy.xml
* (edit) src/plugin/lib-selenium/ivy.xml
* (edit) src/plugin/urlfilter-domain/ivy.xml
* (edit) src/plugin/urlnormalizer-host/ivy.xml
* (edit) src/plugin/feed/ivy.xml
* (edit) src/plugin/indexer-dummy/ivy.xml
* (edit) src/plugin/indexer-rabbit/ivy.xml
* (edit) src/plugin/parse-js/ivy.xml
* (edit) src/plugin/parse-html/ivy.xml
* (edit) src/plugin/exchange-jexl/ivy.xml
* (edit) src/plugin/parse-ext/ivy.xml
* (edit) src/plugin/scoring-depth/ivy.xml
* (edit) src/plugin/subcollection/ivy.xml
* (edit) src/plugin/urlfilter-regex/ivy.xml
* (edit) src/plugin/protocol-okhttp/ivy.xml
* (edit) src/plugin/urlfilter-fast/ivy.xml
* (edit) src/plugin/urlnormalizer-ajax/ivy.xml
* (edit) src/plugin/urlfilter-automaton/ivy.xml
* (edit) src/plugin/index-metadata/ivy.xml
* (edit) src/plugin/lib-nekohtml/ivy.xml
* (edit) src/plugin/protocol-http/ivy.xml
* (edit) src/plugin/headings/ivy.xml
* (edit) src/plugin/language-identifier/ivy.xml
* (edit) ivy/ivy.xml
* (edit) src/plugin/protocol-file/ivy.xml
* (edit) src/plugin/scoring-link/ivy.xml
* (edit) src/plugin/scoring-opic/ivy.xml
* (edit) src/plugin/urlnormalizer-protocol/ivy.xml
* (edit) default.properties
* (edit) src/plugin/protocol-interactiveselenium/ivy.xml
* (edit) src/plugin/protocol-foo/ivy.xml
* (edit) src/plugin/scoring-similarity/ivy.xml
* (edit) src/plugin/nutch-extensionpoints/ivy.xml
* (edit) src/plugin/lib-regex-filter/ivy.xml
* (edit) src/plugin/urlfilter-prefix/ivy.xml
* (edit) src/plugin/lib-rabbitmq/ivy.xml
* (edit) src/plugin/build-plugin.xml
* (edit) src/plugin/mimetype-filter/ivy.xml
* (edit) src/plugin/urlnormalizer-regex/ivy.xml
* (edit) src/plugin/indexer-opensearch-1x/ivy.xml
* (edit) src/plugin/index-links/ivy.xml
* (edit) src/plugin/parsefilter-regex/ivy.xml
* (edit) src/plugin/indexer-csv/ivy.xml
* (edit) src/plugin/indexer-solr/ivy.xml
* (edit) src/plugin/index-jexl-filter/ivy.xml
* (edit) src/plugin/index-anchor/ivy.xml
* (edit) src/plugin/parsefilter-naivebayes/ivy.xml
* (edit) src/plugin/protocol-ftp/ivy.xml
* (edit) src/plugin/indexer-elastic/ivy.xml
* (edit) src/plugin/parse-zip/ivy.xml
* (edit) src/plugin/protocol-htmlunit/ivy.xml
* (edit) src/plugin/lib-htmlunit/ivy.xml
* (edit) src/plugin/index-geoip/ivy.xml
* (edit) src/plugin/urlfilter-validator/ivy.xml
* (edit) ivy/ivysettings.xml
* (edit) src/plugin/protocol-selenium/ivy.xml


> Upgrade Ivy to v2.5.2
> ---
>
> Key: NUTCH-3033
> URL: https://issues.apache.org/jira/browse/NUTCH-3033
> Project: Nutch
> Issue Type: Task
> Components: ivy
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.20
>
>
> Ivy v2.5.2 was released August 20th 2023. Let’s upgrade.
> [https://ant.apache.org/ivy/history/2.5.2/release-notes.html]





[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826772#comment-17826772
 ] 

Hudson commented on NUTCH-3029:
---

FAILURE: Integrated in Jenkins build Nutch » Nutch-trunk #150 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/150/])
NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler (markus: 
[https://github.com/apache/nutch/commit/5ba50c0c6091a95818d3788f0d5b7c0ff49bec57])
* (edit) src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java




Build failed in Jenkins: Nutch » Nutch-trunk #150

2024-03-13 Thread Apache Jenkins Server
See 


Changes:

[github] NUTCH-3033 Upgrade Ivy to v2.5.2 (#803)

[Markus Jelsma] NUTCH-3029 Host specific max. and min. intervals in adaptive 
scheduler


--
[...truncated 771.39 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.076 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-regex

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 

[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.359 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-slash

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-regex

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-slash
[junit] Running 
org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
[junit] Running 
org.apache.nutch.net.urlnormalizer.slash.TestSlashURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.749 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.775 sec

test:

jar:
 [copy] Copying 1 file to 

 [copy] Copying 1 file to 

  [jar] Building jar: 


runtime:
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [copy] Copying 1 file to 

 [copy] Copying 2 files to 

 [copy] Copying 1 file to 

 [copy] Copying 1 file to 

 [copy] Copying 37 files to 

 [copy] Copying 2 files to 

 [copy] Copying 208 files to 

 [copy] Copying 490 files to 

 [copy] Copying 379 files to 


javadoc:
[mkdir] Created dir: 

[mkdir] Created dir: 

  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.nutch.crawl...
  [javadoc] Loading source files for package org.apache.nutch.exchange...
  [javadoc] Loading source files for package org.apache.nutch.fetcher...
  [javadoc] Loading source files for package org.apache.nutch.hostdb...
  [javadoc] Loading source files for package org.apache.nutch.indexer...
  [javadoc] Loading source files for package org.apache.nutch.metadata...
  [javadoc] Loading source files for package org.apache.nutch.net...
  [javadoc] Loading source files for package org.apache.nutch.net.protocols...
  [javadoc] Loading source files for package org.apache.nutch.parse...
  [javadoc] Loading source files for package org.apache.nutch.plugin...
  [javadoc] Loading source files for package org.apache.nutch.protocol...
  [javadoc] Loading source files for package org.apache.nutch.publisher...
  [javadoc] Loading source files for package org.apache.nutch.scoring...
  

[jira] [Commented] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826765#comment-17826765
 ] 

ASF GitHub Bot commented on NUTCH-3008:
---

sebastian-nagel opened a new pull request, #806:
URL: https://github.com/apache/nutch/pull/806

   This PR downgrades the ES client to version 7.10.2, which is licensed under 
the Apache License 2.0. It's a quick fix to stay compatible with ASF policies.
   
   Not yet tested: indexing into ES
   
   To be done: update the LICENSE and NOTICE files. I'll do this as part of a 
separate issue NUTCH-3035.






[PR] NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues [nutch]

2024-03-13 Thread via GitHub


sebastian-nagel opened a new pull request, #806:
URL: https://github.com/apache/nutch/pull/806

   This PR downgrades the ES client to version 7.10.2, which is licensed under 
the Apache License 2.0. It's a quick fix to stay compatible with ASF policies.
   
   Not yet tested: indexing into ES
   
   To be done: update the LICENSE and NOTICE files. I'll do this as part of a 
separate issue NUTCH-3035.





[jira] [Created] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-13 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-3035:
--

Summary: Update license and notice file for release of 1.20
Key: NUTCH-3035
URL: https://issues.apache.org/jira/browse/NUTCH-3035
Project: Nutch
Issue Type: Bug
Components: documentation
Affects Versions: 1.20
Reporter: Sebastian Nagel
Assignee: Sebastian Nagel
Fix For: 1.20


Close to the release of 1.20 the license and notice files should be updated to 
contain all (third-party) licenses of all dependencies. Cf. NUTCH-2290 and 
NUTCH-2981.





[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826759#comment-17826759
 ] 

Markus Jelsma commented on NUTCH-3029:
--

   4f62dec0f..5ba50c0c6  master -> master



The actual change was missing from the commit for some reason.



[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826760#comment-17826760
 ] 

Markus Jelsma commented on NUTCH-3033:
--

Ah, the new ivy works like a charm!

Thanks!



[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826084#comment-17826084
 ] 

ASF GitHub Bot commented on NUTCH-3033:
---

lewismc merged PR #803:
URL: https://github.com/apache/nutch/pull/803






[jira] [Closed] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed NUTCH-3033.
---



[jira] [Resolved] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-3033.
-
Resolution: Fixed



[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826085#comment-17826085
 ] 

ASF GitHub Bot commented on NUTCH-3033:
---

lewismc commented on PR #803:
URL: https://github.com/apache/nutch/pull/803#issuecomment-1994562354

   Thanks @sebastian-nagel  






Re: [PR] NUTCH-3033 Upgrade Ivy to v2.5.2 [nutch]

2024-03-13 Thread via GitHub


lewismc commented on PR #803:
URL: https://github.com/apache/nutch/pull/803#issuecomment-1994562354

   Thanks @sebastian-nagel  





Re: [PR] NUTCH-3033 Upgrade Ivy to v2.5.2 [nutch]

2024-03-13 Thread via GitHub


lewismc merged PR #803:
URL: https://github.com/apache/nutch/pull/803





[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826060#comment-17826060
 ] 

Hudson commented on NUTCH-3029:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #149 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/149/])
NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler (markus: 
[https://github.com/apache/nutch/commit/4642c30c2aeb2a1fa2436541bd4af877d0aad86a])
* (add) conf/adaptive-host-specific-intervals.txt.template


> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 1.19, 1.20
> Reporter: Martin Djukanovic
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).
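
The idea in the description above (per-host minimum and maximum refetch
intervals that bound the adaptively computed interval) can be sketched
roughly as follows. This is only an illustration, not the code from the
patch: the class name HostIntervalSketch, its methods, and the
"host min max" line format (intervals in seconds) are all assumptions made
for this example. The real file format is defined by
conf/adaptive-host-specific-intervals.txt.template.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: clamp a refetch interval to per-host bounds. */
final class HostIntervalSketch {

  /** Minimum and maximum interval, in seconds. */
  static final class Bounds {
    final float min;
    final float max;

    Bounds(float min, float max) {
      this.min = min;
      this.max = max;
    }
  }

  private final Map<String, Bounds> perHost = new HashMap<>();
  private final Bounds defaults;

  HostIntervalSketch(float defaultMin, float defaultMax) {
    this.defaults = new Bounds(defaultMin, defaultMax);
  }

  /**
   * Parses one line in the ASSUMED format "host min max".
   * Blank lines, comment lines starting with '#', and lines with an
   * unexpected number of fields are ignored.
   */
  void loadLine(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty() || trimmed.startsWith("#")) {
      return;
    }
    String[] parts = trimmed.split("\\s+");
    if (parts.length != 3) {
      return;
    }
    perHost.put(parts[0].toLowerCase(Locale.ROOT),
        new Bounds(Float.parseFloat(parts[1]), Float.parseFloat(parts[2])));
  }

  /** Clamps the interval to the host's bounds, or to the global defaults. */
  float clamp(String host, float interval) {
    Bounds b = perHost.getOrDefault(host.toLowerCase(Locale.ROOT), defaults);
    return Math.max(b.min, Math.min(b.max, interval));
  }

  public static void main(String[] args) {
    HostIntervalSketch sketch = new HostIntervalSketch(60f, 2592000f);
    sketch.loadLine("example.org 3600 86400");
    System.out.println(sketch.clamp("example.org", 10f));
    System.out.println(sketch.clamp("unlisted.example", 10f));
  }
}
```

Hosts without an entry fall back to the global bounds, so an empty
configuration file would leave the scheduler's behaviour unchanged under
this sketch.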





[jira] [Resolved] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma resolved NUTCH-3029.
--
Resolution: Fixed

Thanks Martin!

   551c50b1c..4642c30c2  master -> master

> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 1.19, 1.20
> Reporter: Martin Djukanovic
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).





[jira] [Commented] (NUTCH-3030) Use system default cipher suites instead of hard-coded set

2024-03-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826038#comment-17826038
 ] 

Hudson commented on NUTCH-3030:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #148 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/148/])
NUTCH-3030 Use system default cipher suites instead of hard-coded set (markus: 
[https://github.com/apache/nutch/commit/551c50b1caac27ae65f25517de5b202b314fef0e])
* (edit) 
src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java


> Use system default cipher suites instead of hard-coded set
> --
>
> Key: NUTCH-3030
> URL: https://issues.apache.org/jira/browse/NUTCH-3030
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.19
> Reporter: Martin Djukanovic
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: NUTCH-3030.patch, default_ciphers_and_protocols-2.patch
>
>
> If http.tls.supported.cipher.suites is not set in the configuration, it 
> defaults to a hard-coded list which is not exhaustive enough. I have 
> encountered websites that exclusively use ciphers which are not included, so 
> they could not be handled by protocol-http.
> I changed this list to the system default -- SSLSocketFactory's 
> .getDefaultCipherSuites() to be precise. One could also use 
> .getSupportedCipherSuites() here, I suppose.
> The original list should be moved to nutch-default.xml or omitted altogether. 
> The protocol list is still hard-coded, but it is now also added to 
> nutch-default.xml (so it can be easily changed manually if needed).
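
For readers unfamiliar with the two JSSE calls mentioned in the description,
here is a minimal sketch of the fallback it describes: use the configured
cipher suites if any are set, otherwise ask the JVM for its defaults. It is
not the code from the patch. The class CipherSuiteDefaults, the method
resolveCipherSuites, and the comma-separated format of the configured value
are assumptions made for this example; only SSLSocketFactory.getDefault(),
getDefaultCipherSuites() and getSupportedCipherSuites() are real JDK APIs.

```java
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical sketch: fall back to the JVM's default cipher suites. */
final class CipherSuiteDefaults {

  /**
   * Returns the configured suites (ASSUMED to be comma-separated), or the
   * JVM's default-enabled suites when nothing is configured.
   */
  static String[] resolveCipherSuites(String configured) {
    if (configured == null || configured.trim().isEmpty()) {
      SSLSocketFactory factory =
          (SSLSocketFactory) SSLSocketFactory.getDefault();
      // Suites enabled by default for this JVM and its security settings.
      return factory.getDefaultCipherSuites();
    }
    return configured.trim().split("\\s*,\\s*");
  }

  public static void main(String[] args) {
    SSLSocketFactory factory =
        (SSLSocketFactory) SSLSocketFactory.getDefault();
    // getSupportedCipherSuites() lists everything that could be enabled,
    // including suites that are not enabled by default.
    System.out.println("default:   " + resolveCipherSuites(null).length);
    System.out.println("supported: "
        + factory.getSupportedCipherSuites().length);
  }
}
```

The trade-off between the two calls is that the default list follows the
JVM's security policy, while the supported list may include suites the JVM
deliberately leaves disabled, so the default list is the more conservative
choice.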





[jira] [Resolved] (NUTCH-3030) Use system default cipher suites instead of hard-coded set

2024-03-13 Thread Markus Jelsma (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma resolved NUTCH-3030.
--
Resolution: Fixed

42b55f6a9..551c50b1c  master -> master

Thanks Martin!

> Use system default cipher suites instead of hard-coded set
> --
>
> Key: NUTCH-3030
> URL: https://issues.apache.org/jira/browse/NUTCH-3030
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.19
> Reporter: Martin Djukanovic
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: NUTCH-3030.patch, default_ciphers_and_protocols-2.patch
>
>
> If http.tls.supported.cipher.suites is not set in the configuration, it 
> defaults to a hard-coded list which is not exhaustive enough. I have 
> encountered websites that exclusively use ciphers which are not included, so 
> they could not be handled by protocol-http.
> I changed this list to the system default -- SSLSocketFactory's 
> .getDefaultCipherSuites() to be precise. One could also use 
> .getSupportedCipherSuites() here, I suppose.
> The original list should be moved to nutch-default.xml or omitted altogether. 
> The protocol list is still hard-coded, but it is now also added to 
> nutch-default.xml (so it can be easily changed manually if needed).





[jira] [Updated] (NUTCH-3030) Use system default cipher suites instead of hard-coded set

2024-03-13 Thread Markus Jelsma (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-3030:
-
Summary: Use system default cipher suites instead of hard-coded set  (was: 
Update default TLS cipher suites for http(s) protocol)

> Use system default cipher suites instead of hard-coded set
> --
>
> Key: NUTCH-3030
> URL: https://issues.apache.org/jira/browse/NUTCH-3030
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.19
> Reporter: Martin Djukanovic
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: NUTCH-3030.patch, default_ciphers_and_protocols-2.patch
>
>
> If http.tls.supported.cipher.suites is not set in the configuration, it 
> defaults to a hard-coded list which is not exhaustive enough. I have 
> encountered websites that exclusively use ciphers which are not included, so 
> they could not be handled by protocol-http.
> I changed this list to the system default -- SSLSocketFactory's 
> .getDefaultCipherSuites() to be precise. One could also use 
> .getSupportedCipherSuites() here, I suppose.
> The original list should be moved to nutch-default.xml or omitted altogether. 
> The protocol list is still hard-coded, but it is now also added to 
> nutch-default.xml (so it can be easily changed manually if needed).


