[jira] [Created] (NUTCH-2782) protocol-http / lib-http: support TLSv1.3

2020-04-23 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2782: -- Summary: protocol-http / lib-http: support TLSv1.3 Key: NUTCH-2782 URL: https://issues.apache.org/jira/browse/NUTCH-2782 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2379) crawl script dedup's crawldb update is slow

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090496#comment-17090496 ] Sebastian Nagel commented on NUTCH-2379: This is addressed in [PR

[jira] [Commented] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script

2020-04-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090475#comment-17090475 ] ASF GitHub Bot commented on NUTCH-2501: --- sebastian-nagel opened a new pull request #513: URL:

[jira] [Updated] (NUTCH-2501) allow to set Java heap size when using crawl script in distributed mode

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2501: --- Summary: allow to set Java heap size when using crawl script in distributed mode (was: Take

[jira] [Commented] (NUTCH-2342) Inlinks are not being indexed as part of index-links plugin

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090491#comment-17090491 ] Sebastian Nagel commented on NUTCH-2342: Sounds like a documentation problem: in order to index

[GitHub] [nutch] sebastian-nagel opened a new pull request #513: NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode

2020-04-23 Thread GitBox
sebastian-nagel opened a new pull request #513:
URL: https://github.com/apache/nutch/pull/513
- bin/crawl
  - add hint how to set map and reduce task memory via -D ... options
  - use -D options for all steps (Nutch tools)
  - fix quoting of -D options, e.g. -D
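The quoting problem this pull request mentions can be illustrated with a small sketch. It is not the code from bin/crawl: `count_args` and `demo` are hypothetical helpers, and the property values are placeholders chosen only to show an option value that contains a space.

```shell
# Hypothetical illustration, not taken from bin/crawl.
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

demo() {
  # Two -D options; the second value contains a space (placeholder values).
  set -- -D mapreduce.map.memory.mb=2048 \
         -D "mapreduce.map.java.opts=-Xmx1600m -XX:+UseG1GC"
  # "$@" keeps every option as one word; unquoted $* splits at the space.
  echo "quoted: $(count_args "$@")"
  echo "unquoted: $(count_args $*)"
}

demo
```

With quoting preserved the tool receives four arguments; without it the second option value is split and the tool receives five.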

[GitHub] [nutch] sebastian-nagel commented on a change in pull request #279: NUTCH-2501: Take NUTCH_HEAPSIZE into account when crawling using crawl script

2020-04-23 Thread GitBox
sebastian-nagel commented on a change in pull request #279:
URL: https://github.com/apache/nutch/pull/279#discussion_r413699673
## File path: src/bin/crawl ##
@@ -171,6 +175,8 @@ fi
 CRAWL_PATH="$1"
 LIMIT="$2"
+JAVA_CHILD_HEAP_MB=`expr "$NUTCH_HEAP_MB" / "$NUM_TASKS"`
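The line under review divides a total heap size by a task count. A minimal sketch of the same integer division using shell arithmetic expansion instead of `expr` follows; `child_heap_mb` is a hypothetical helper name, and the zero check is an added assumption, not part of the pull request.

```shell
# Hypothetical helper, not the code from the pull request.
# Prints the total heap (MB) divided by the number of tasks (integer division).
child_heap_mb() {
  total_mb=$1
  num_tasks=$2
  # Assumption: refuse a non-positive task count to avoid division by zero.
  if [ "$num_tasks" -le 0 ]; then
    echo "number of tasks must be positive" >&2
    return 1
  fi
  echo $(( total_mb / num_tasks ))
}

child_heap_mb 4096 4   # prints 1024
```

Integer division truncates, so 1000 MB spread over 3 tasks yields 333 MB per task.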

[jira] [Commented] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script

2020-04-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090480#comment-17090480 ] ASF GitHub Bot commented on NUTCH-2501: --- sebastian-nagel commented on a change in pull request

[jira] [Updated] (NUTCH-2780) Upgrade index-solr to use Solr 8.5.1

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2780: --- Labels: help-wanted (was: ) > Upgrade index-solr to use Solr 8.5.1 >

[jira] [Comment Edited] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090510#comment-17090510 ] Sebastian Nagel edited comment on NUTCH-2681 at 4/23/20, 11:01 AM: ---

[jira] [Updated] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2681: --- Fix Version/s: (was: 1.17) > ClassCastException - Apache Nutch 1.x, Selenium v2.48.2,

[jira] [Resolved] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2681. Resolution: Abandoned Well, Nutch now uses Selenium 3.141.5 (after NUTCH-2716) and Firefox

[jira] [Updated] (NUTCH-2342) Inlinks are not being indexed as part of index-links plugin

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2342: --- Fix Version/s: 1.17 > Inlinks are not being indexed as part of index-links plugin >

[jira] [Created] (NUTCH-2781) Increase default Java heap size

2020-04-23 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2781: -- Summary: Increase default Java heap size Key: NUTCH-2781 URL: https://issues.apache.org/jira/browse/NUTCH-2781 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2779) Upgrade to Tika 1.24.1

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090345#comment-17090345 ] Sebastian Nagel commented on NUTCH-2779: [Tika 1.24.1 is

[GitHub] [nutch] sebastian-nagel opened a new pull request #512: NUTCH-2781 Increase default Java heap size

2020-04-23 Thread GitBox
sebastian-nagel opened a new pull request #512:
URL: https://github.com/apache/nutch/pull/512
- increase default value for NUTCH_HEAPSIZE to 4096 MB (from 1000 MB)
- remove -Dmapred.child.java.opts=-Xmx1000m from default options in bin/crawl
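One way a default like this can be expressed in a shell script is with parameter expansion. The sketch below is only an assumption about the general technique; `heap_flag` is a hypothetical function, and the actual Nutch scripts may set the value differently.

```shell
# Hypothetical sketch; the real Nutch scripts may structure this differently.
heap_flag() {
  # Use NUTCH_HEAPSIZE (in MB) when set and non-empty, else fall back to 4096.
  echo "-Xmx${NUTCH_HEAPSIZE:-4096}m"
}

heap_flag
```

Setting `NUTCH_HEAPSIZE=2000` before the call would make the function print `-Xmx2000m` instead of the default.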

[jira] [Resolved] (NUTCH-2385) 1.x Elasticsearch Indexer - path.home is not configured

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2385. Fix Version/s: (was: 1.17) Resolution: Abandoned Nutch now uses the

[jira] [Resolved] (NUTCH-2274) InteractiveSelenium Plugin's DefaultHandler Returns Null

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2274. Fix Version/s: (was: 1.17) Resolution: Abandoned Nutch now uses Selenium

[jira] [Comment Edited] (NUTCH-1103) Port protocol-sftp to 1.4

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090321#comment-17090321 ] Sebastian Nagel edited comment on NUTCH-1103 at 4/23/20, 6:50 AM: -- Well,

[jira] [Commented] (NUTCH-1103) Port protocol-sftp to 1.4

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090321#comment-17090321 ] Sebastian Nagel commented on NUTCH-1103: Well, looking at the history obviously not :( - there is

[DISCUSS] Release 1.17 ?

2020-04-23 Thread Sebastian Nagel
Hi all,
30 issues are done now
https://issues.apache.org/jira/browse/NUTCH/fixforversion/12346090
including a number of important dependency upgrades:
- Hadoop 3.1 (NUTCH-2777)
- Elasticsearch 7.3.0 REST client (NUTCH-2739)
Thanks to Shashanka Balakuntala Srinivasa for both!
Dependency

[jira] [Commented] (NUTCH-2781) Increase default Java heap size

2020-04-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090384#comment-17090384 ] ASF GitHub Bot commented on NUTCH-2781: --- sebastian-nagel opened a new pull request #512: URL:

[jira] [Assigned] (NUTCH-1103) Port protocol-sftp to 1.4

2020-04-23 Thread Shashanka Balakuntala Srinivasa (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashanka Balakuntala Srinivasa reassigned NUTCH-1103: -- Assignee: Shashanka Balakuntala Srinivasa > Port

[jira] [Updated] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1194: --- Summary: Generator: CrawlDB lock should be released earlier (was: CrawlDB lock should be

[GitHub] [nutch] sebastian-nagel opened a new pull request #514: NUTCH-1194 Generator: CrawlDB lock should be released earlier

2020-04-23 Thread GitBox
sebastian-nagel opened a new pull request #514:
URL: https://github.com/apache/nutch/pull/514
- release CrawlDb lock after the select step if generated items are not marked in CrawlDb (generate.update.crawldb is false)

[jira] [Commented] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-04-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090639#comment-17090639 ] ASF GitHub Bot commented on NUTCH-1194: --- sebastian-nagel opened a new pull request #514: URL: