Sebastian Nagel created NUTCH-2782:
--
Summary: protocol-http / lib-http: support TLSv1.3
Key: NUTCH-2782
URL: https://issues.apache.org/jira/browse/NUTCH-2782
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090496#comment-17090496
]
Sebastian Nagel commented on NUTCH-2379:
This is addressed in [PR
[
https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090475#comment-17090475
]
ASF GitHub Bot commented on NUTCH-2501:
---
sebastian-nagel opened a new pull request #513:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2501:
---
Summary: allow to set Java heap size when using crawl script in distributed
mode (was: Take
[
https://issues.apache.org/jira/browse/NUTCH-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090491#comment-17090491
]
Sebastian Nagel commented on NUTCH-2342:
Sounds like a documentation problem: in order to index
sebastian-nagel opened a new pull request #513:
URL: https://github.com/apache/nutch/pull/513
- bin/crawl
- add hint how to set map and reduce task memory via -D ... options
- use -D options for all steps (Nutch tools)
- fix quoting of -D options, eg. -D
sebastian-nagel commented on a change in pull request #279:
URL: https://github.com/apache/nutch/pull/279#discussion_r413699673
##
File path: src/bin/crawl
##
@@ -171,6 +175,8 @@ fi
CRAWL_PATH="$1"
LIMIT="$2"
+JAVA_CHILD_HEAP_MB=`expr "$NUTCH_HEAP_MB" / "$NUM_TASKS"`
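The added line in the hunk above divides a total heap budget by the number of tasks using `expr`. A minimal sketch of the same integer division with a guard against a zero task count is shown below; the helper name `child_heap_mb` and the example values 4096 and 4 are assumptions for illustration and are not taken from the pull request.

```shell
# Hypothetical helper (not part of bin/crawl): split a heap budget in MB
# evenly across a number of tasks, using integer division.
child_heap_mb() {
  total_mb="$1"
  tasks="$2"
  # Refuse a zero or negative task count instead of dividing by zero.
  if [ "$tasks" -le 0 ]; then
    echo "number of tasks must be positive" >&2
    return 1
  fi
  # POSIX arithmetic expansion, equivalent to: expr "$total_mb" / "$tasks"
  echo $(( total_mb / tasks ))
}

# Example values are assumptions, chosen only to show the arithmetic.
JAVA_CHILD_HEAP_MB=$(child_heap_mb 4096 4)
echo "$JAVA_CHILD_HEAP_MB"   # prints 1024
```

Integer division truncates, so a budget that does not divide evenly leaves a few MB unused rather than overcommitting.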
[
https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090480#comment-17090480
]
ASF GitHub Bot commented on NUTCH-2501:
---
sebastian-nagel commented on a change in pull request
[
https://issues.apache.org/jira/browse/NUTCH-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2780:
---
Labels: help-wanted (was: )
> Upgrade index-solr to use Solr 8.5.1
>
[
https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090510#comment-17090510
]
Sebastian Nagel edited comment on NUTCH-2681 at 4/23/20, 11:01 AM:
---
[
https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2681:
---
Fix Version/s: (was: 1.17)
> ClassCastException - Apache Nutch 1.x, Selenium v2.48.2,
[
https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2681.
Resolution: Abandoned
Well, Nutch now uses Selenium 3.141.5 (after NUTCH-2716) and Firefox
[
https://issues.apache.org/jira/browse/NUTCH-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2342:
---
Fix Version/s: 1.17
> Inlinks are not being indexed as part of index-links plugin
>
Sebastian Nagel created NUTCH-2781:
--
Summary: Increase default Java heap size
Key: NUTCH-2781
URL: https://issues.apache.org/jira/browse/NUTCH-2781
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090345#comment-17090345
]
Sebastian Nagel commented on NUTCH-2779:
[Tika 1.24.1 is
sebastian-nagel opened a new pull request #512:
URL: https://github.com/apache/nutch/pull/512
- increase default value for NUTCH_HEAPSIZE to 4096 MB (from 1000 MB)
- remove -Dmapred.child.java.opts=-Xmx1000m from default options in bin/crawl
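As a rough illustration of the first item, a default heap size that an environment variable can override might be expressed as below. This is only a sketch: the helper `heap_opt` is hypothetical, and the actual logic in bin/nutch may differ; only the variable name NUTCH_HEAPSIZE and the 4096 MB default come from the description above.

```shell
# Hypothetical helper (not the actual bin/nutch code): build the JVM
# max-heap option from NUTCH_HEAPSIZE (in MB), defaulting to 4096.
heap_opt() {
  # ${VAR:-default} falls back when NUTCH_HEAPSIZE is unset or empty.
  heap_mb="${NUTCH_HEAPSIZE:-4096}"
  printf '%s\n' "-Xmx${heap_mb}m"
}

# Prints the option that would be passed to the JVM.
heap_opt
```

Setting `NUTCH_HEAPSIZE=2000` before the call would yield `-Xmx2000m` instead of the default.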
[
https://issues.apache.org/jira/browse/NUTCH-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2385.
Fix Version/s: (was: 1.17)
Resolution: Abandoned
Nutch now uses the
[
https://issues.apache.org/jira/browse/NUTCH-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2274.
Fix Version/s: (was: 1.17)
Resolution: Abandoned
Nutch now uses Selenium
[
https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090321#comment-17090321
]
Sebastian Nagel edited comment on NUTCH-1103 at 4/23/20, 6:50 AM:
--
Well,
[
https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090321#comment-17090321
]
Sebastian Nagel commented on NUTCH-1103:
Well, looking at the history obviously not :(
- there is
Hi all,
30 issues are done now
https://issues.apache.org/jira/browse/NUTCH/fixforversion/12346090
including a number of important dependency upgrades:
- Hadoop 3.1 (NUTCH-2777)
- Elasticsearch 7.3.0 REST client (NUTCH-2739)
Thanks to Shashanka Balakuntala Srinivasa for both!
Dependency
[
https://issues.apache.org/jira/browse/NUTCH-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090384#comment-17090384
]
ASF GitHub Bot commented on NUTCH-2781:
---
sebastian-nagel opened a new pull request #512:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shashanka Balakuntala Srinivasa reassigned NUTCH-1103:
--
Assignee: Shashanka Balakuntala Srinivasa
> Port
[
https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1194:
---
Summary: Generator: CrawlDB lock should be released earlier (was: CrawlDB
lock should be
sebastian-nagel opened a new pull request #514:
URL: https://github.com/apache/nutch/pull/514
- release the CrawlDb lock after the select step if generated items are not
marked in CrawlDb (generate.update.crawldb is false)
[
https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090639#comment-17090639
]
ASF GitHub Bot commented on NUTCH-1194:
---
sebastian-nagel opened a new pull request #514:
URL: