[jira] [Resolved] (NUTCH-2808) Document side effects of ignoring robots.txt

2021-12-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2808. Resolution: Implemented > Document side effects of ignoring robots.txt > --

[jira] [Resolved] (NUTCH-2918) Upgrade to log4j 2.16.0

2021-12-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2918. Resolution: Fixed > Upgrade to log4j 2.16.0 > --- > > K

[jira] [Assigned] (NUTCH-2918) Upgrade to log4j 2.16.0

2021-12-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2918: -- Assignee: Sebastian Nagel > Upgrade to log4j 2.16.0 > --- > >

[jira] [Resolved] (NUTCH-2916) Fix log file rotation / rename default log file

2021-12-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2916. Resolution: Fixed > Fix log file rotation / rename default log file > -

[jira] [Created] (NUTCH-2918) Upgrade to log4j 2.16.0

2021-12-14 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2918: -- Summary: Upgrade to log4j 2.16.0 Key: NUTCH-2918 URL: https://issues.apache.org/jira/browse/NUTCH-2918 Project: Nutch Issue Type: Improvement C

[jira] [Resolved] (NUTCH-2915) Upgrade to log4j 2.15.0

2021-12-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2915. Resolution: Fixed > Upgrade to log4j 2.15.0 > --- > > K

[jira] [Created] (NUTCH-2917) Remove transitive dependency to log4j 1.x

2021-12-13 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2917: -- Summary: Remove transitive dependency to log4j 1.x Key: NUTCH-2917 URL: https://issues.apache.org/jira/browse/NUTCH-2917 Project: Nutch Issue Type: Bug

[jira] [Assigned] (NUTCH-2916) Fix log file rotation / rename default log file

2021-12-13 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2916: -- Assignee: Sebastian Nagel > Fix log file rotation / rename default log file >

[jira] [Created] (NUTCH-2916) Fix log file rotation / rename default log file

2021-12-13 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2916: -- Summary: Fix log file rotation / rename default log file Key: NUTCH-2916 URL: https://issues.apache.org/jira/browse/NUTCH-2916 Project: Nutch Issue Type:

[jira] [Assigned] (NUTCH-2915) Upgrade to log4j 2.15.0

2021-12-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2915: -- Assignee: Sebastian Nagel > Upgrade to log4j 2.15.0 > --- > >

[jira] [Updated] (NUTCH-2915) Upgrade to log4j 2.15.0

2021-12-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2915: --- Description: See [Apache Log4j Security Vulnerabilities|https://logging.apache.org/log4j/2.x

[jira] [Created] (NUTCH-2915) Upgrade to log4j 2.15.0

2021-12-12 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2915: -- Summary: Upgrade to log4j 2.15.0 Key: NUTCH-2915 URL: https://issues.apache.org/jira/browse/NUTCH-2915 Project: Nutch Issue Type: Bug Component

[jira] [Commented] (NUTCH-2911) Add cleanup call in Fetcher.java

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453074#comment-17453074 ] Sebastian Nagel commented on NUTCH-2911: Hi [~prakharchaube], thanks. I've also s

[jira] [Updated] (NUTCH-2807) SitemapProcessor to warn that ignoring robots.txt affects detection of sitemaps

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2807: --- Summary: SitemapProcessor to warn that ignoring robots.txt affects detection of sitemaps (wa

[jira] [Commented] (NUTCH-2909) Establish a metrics naming convention

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453064#comment-17453064 ] Sebastian Nagel commented on NUTCH-2909: +1 bq. nutch.Injector.injector.urls_fil

[jira] [Assigned] (NUTCH-2807) SitemapProcessor to warn that ignoring robotst.xt affects detection of sitemaps

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2807: -- Assignee: Sebastian Nagel > SitemapProcessor to warn that ignoring robotst.xt affects

[jira] [Assigned] (NUTCH-2808) Document side effects of ignoring robots.txt

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2808: -- Assignee: Sebastian Nagel > Document side effects of ignoring robots.txt > ---

[jira] [Updated] (NUTCH-2914) nutch-default.xml: remove obsolete and unused properties

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2914: --- Summary: nutch-default.xml: remove obsolete and unused properties (was: nutch-default.xml: r

[jira] [Assigned] (NUTCH-2914) nutch-default.xml: remove obsolete and unused properties

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2914: -- Assignee: Sebastian Nagel > nutch-default.xml: remove obsolete and unused properties >

[jira] [Created] (NUTCH-2914) nutch-default.xml: remove obselete and unused properties

2021-12-03 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2914: -- Summary: nutch-default.xml: remove obselete and unused properties Key: NUTCH-2914 URL: https://issues.apache.org/jira/browse/NUTCH-2914 Project: Nutch Is

[jira] [Commented] (NUTCH-2910) FetchItemQueues overloaded constructor also interprets fetcher timeout as -1 e.g. no-timeout.

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452987#comment-17452987 ] Sebastian Nagel commented on NUTCH-2910: Hi [~lewismc], I've opened NUTCH-2913 to

[jira] [Commented] (NUTCH-2913) nutch-default.xml: document properties set programatically

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452983#comment-17452983 ] Sebastian Nagel commented on NUTCH-2913: Properties are not documented in nutch-d

[jira] [Created] (NUTCH-2913) nutch-default.xml: document properties set programatically

2021-12-03 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2913: -- Summary: nutch-default.xml: document properties set programatically Key: NUTCH-2913 URL: https://issues.apache.org/jira/browse/NUTCH-2913 Project: Nutch

[jira] [Resolved] (NUTCH-2911) Add cleanup call in Fetcher.java

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2911. Resolution: Fixed Thanks, [~prakharchaube]! > Add cleanup call in Fetcher.java > -

[jira] [Updated] (NUTCH-2911) Add cleanup call in Fetcher.java

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2911: --- Affects Version/s: 1.18 > Add cleanup call in Fetcher.java >

[jira] [Updated] (NUTCH-2911) Add cleanup call in Fetcher.java

2021-12-03 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2911: --- Fix Version/s: 1.19 > Add cleanup call in Fetcher.java > > >

[jira] [Resolved] (NUTCH-2860) Upgrade to Tika 1.26

2021-12-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2860. Resolution: Duplicate Superceded by NUTCH-2891. > Upgrade to Tika 1.26 > -

[jira] [Updated] (NUTCH-2860) Upgrade to Tika 1.26

2021-12-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2860: --- Fix Version/s: (was: 1.19) > Upgrade to Tika 1.26 > > >

[jira] [Commented] (NUTCH-2910) FetchItemQueues overloaded constructor also interprets fetcher timeout as -1 e.g. no-timeout.

2021-12-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451689#comment-17451689 ] Sebastian Nagel commented on NUTCH-2910: Hi [~lewismc], this is not a problem: -

[jira] [Resolved] (NUTCH-2905) Mask sensitive strings in log output of index writers

2021-12-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2905. Resolution: Fixed > Mask sensitive strings in log output of index writers > ---

[jira] [Resolved] (NUTCH-2908) Log mapreduce job messages and counters in local mode

2021-12-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2908. Resolution: Fixed Thanks everybody! > Log mapreduce job messages and counters in local mod

[jira] [Resolved] (NUTCH-2891) Upgrade to Tika 2.1

2021-12-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2891. Resolution: Implemented > Upgrade to Tika 2.1 > --- > > Key

[jira] [Commented] (NUTCH-2826) Migrate Nutch Site from Apache CMS to Hugo

2021-11-24 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448744#comment-17448744 ] Sebastian Nagel commented on NUTCH-2826: Great work, [~lewismc]! > Migrate Nutch

[jira] [Commented] (NUTCH-2908) Log mapreduce job messages and counters in local mode

2021-11-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447983#comment-17447983 ] Sebastian Nagel commented on NUTCH-2908: Yes, that's it. Only slightly longer tha

[jira] [Assigned] (NUTCH-2908) Log mapreduce job messages and counters in local mode

2021-11-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2908: -- Assignee: Sebastian Nagel > Log mapreduce job messages and counters in local mode > --

[jira] [Created] (NUTCH-2908) Log mapreduce job messages and counters in local mode

2021-11-22 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2908: -- Summary: Log mapreduce job messages and counters in local mode Key: NUTCH-2908 URL: https://issues.apache.org/jira/browse/NUTCH-2908 Project: Nutch Issue

[jira] [Resolved] (NUTCH-2867) Support for custom HostDb aggregators

2021-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2867. Resolution: Implemented > Support for custom HostDb aggregators > -

[jira] [Resolved] (NUTCH-2865) WARC exporter support for metadata and dropping empty responses

2021-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2865. Resolution: Implemented > WARC exporter support for metadata and dropping empty responses >

[jira] [Resolved] (NUTCH-2892) Upgrade to Any23 2.5

2021-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2892. Resolution: Implemented Merged PR. Thanks, [~lewismc]! > Upgrade to Any23 2.5 > --

[jira] [Commented] (NUTCH-2867) Support for custom HostDb aggregators

2021-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447396#comment-17447396 ] Sebastian Nagel commented on NUTCH-2867: Thanks, [~markus17]! Yep, good catch - u

[jira] [Commented] (NUTCH-2865) WARC exporter support for metadata and dropping empty responses

2021-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447395#comment-17447395 ] Sebastian Nagel commented on NUTCH-2865: Thanks, [~markus17]! I'll commit it shor

[jira] [Resolved] (NUTCH-2904) Upgrade to crawler-commons 1.2

2021-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2904. Resolution: Implemented > Upgrade to crawler-commons 1.2 > -- >

[jira] [Commented] (NUTCH-2865) WARC exporter support for metadata and dropping empty responses

2021-11-19 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446676#comment-17446676 ] Sebastian Nagel commented on NUTCH-2865: Hi [~markus17], looks good and tests wen

[jira] [Commented] (NUTCH-2867) Support for custom HostDb aggregators

2021-11-19 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446670#comment-17446670 ] Sebastian Nagel commented on NUTCH-2867: Hi [~markus17], great! Successfully test

[jira] [Commented] (NUTCH-2867) Support for custom HostDb aggregators

2021-11-19 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446538#comment-17446538 ] Sebastian Nagel commented on NUTCH-2867: Hi [~markus17], the patch adds 2 classes

[jira] [Closed] (NUTCH-1717) HostDB not to complain if filters/normalizers are disabled

2021-11-19 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1717. -- Resolution: Fixed I think this can be closed: fixed 2014. > HostDB not to complain if filters/

[jira] [Created] (NUTCH-2907) protocol-selenium: HTTPS proxy not working

2021-11-18 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2907: -- Summary: protocol-selenium: HTTPS proxy not working Key: NUTCH-2907 URL: https://issues.apache.org/jira/browse/NUTCH-2907 Project: Nutch Issue Type: Bug

[jira] [Resolved] (NUTCH-2902) Jexl parsing error on statements

2021-11-18 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2902. Resolution: Fixed Patch applied in [e837324|https://github.com/apache/nutch/commit/e837324

[jira] [Created] (NUTCH-2906) Allow to use credential provider for index writers configuration

2021-11-17 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2906: -- Summary: Allow to use credential provider for index writers configuration Key: NUTCH-2906 URL: https://issues.apache.org/jira/browse/NUTCH-2906 Project: Nutch

[jira] [Updated] (NUTCH-2906) Allow to use credential provider for index writers configuration

2021-11-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2906: --- Labels: help-wanted (was: ) > Allow to use credential provider for index writers configurati

[jira] [Updated] (NUTCH-2903) Unable to Connect to Elasticsearch over HTTPS

2021-11-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2903: --- Fix Version/s: 1.19 > Unable to Connect to Elasticsearch over HTTPS > ---

[jira] [Assigned] (NUTCH-2903) Unable to Connect to Elasticsearch over HTTPS

2021-11-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2903: -- Assignee: Sebastian Nagel > Unable to Connect to Elasticsearch over HTTPS > --

[jira] [Created] (NUTCH-2905) Mask sensitive strings in log output of index writers

2021-11-17 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2905: -- Summary: Mask sensitive strings in log output of index writers Key: NUTCH-2905 URL: https://issues.apache.org/jira/browse/NUTCH-2905 Project: Nutch Issue

[jira] [Created] (NUTCH-2904) Upgrade to crawler-commons 1.2

2021-11-17 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2904: -- Summary: Upgrade to crawler-commons 1.2 Key: NUTCH-2904 URL: https://issues.apache.org/jira/browse/NUTCH-2904 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-2902) Jexl parsing error on statements

2021-11-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2902: --- Fix Version/s: 1.19 > Jexl parsing error on statements > > >

[jira] [Updated] (NUTCH-2902) Jexl parsing error on statements

2021-11-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2902: --- Affects Version/s: 1.18 > Jexl parsing error on statements >

[jira] [Created] (NUTCH-2902) Jexl parsing error on statements

2021-11-11 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2902: -- Summary: Jexl parsing error on statements Key: NUTCH-2902 URL: https://issues.apache.org/jira/browse/NUTCH-2902 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2901) migrate to maven or gradle

2021-11-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442355#comment-17442355 ] Sebastian Nagel commented on NUTCH-2901: See NUTCH-2292 - nothing trivial because

[jira] [Resolved] (NUTCH-2899) Remove needless warning about missing o/a/rat/anttasks/antlib.xml

2021-11-11 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2899. Resolution: Fixed > Remove needless warning about missing o/a/rat/anttasks/antlib.xml > ---

[jira] [Assigned] (NUTCH-2899) Remove needless warning about missing o/a/rat/anttasks/antlib.xml

2021-10-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2899: -- Assignee: Sebastian Nagel > Remove needless warning about missing o/a/rat/anttasks/ant

[jira] [Created] (NUTCH-2899) Remove needless warning about missing o/a/rat/anttasks/antlib.xml

2021-10-22 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2899: -- Summary: Remove needless warning about missing o/a/rat/anttasks/antlib.xml Key: NUTCH-2899 URL: https://issues.apache.org/jira/browse/NUTCH-2899 Project: Nutch

[jira] [Assigned] (NUTCH-2862) Do not include Ivy jar in source release package

2021-10-20 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2862: -- Assignee: Sebastian Nagel > Do not include Ivy jar in source release package > ---

[jira] [Resolved] (NUTCH-2862) Do not include Ivy jar in source release package

2021-10-20 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2862. Resolution: Fixed > Do not include Ivy jar in source release package >

[jira] [Resolved] (NUTCH-2890) Protocol-okhttp: upgrade okhttp to 4.9.1 to address infinite connection retries

2021-09-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2890. Resolution: Implemented > Protocol-okhttp: upgrade okhttp to 4.9.1 to address infinite conn

[jira] [Resolved] (NUTCH-2894) Java plugin compilation classpath: priorize plugin dependencies

2021-09-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2894. Resolution: Fixed > Java plugin compilation classpath: priorize plugin dependencies > -

[jira] [Created] (NUTCH-2896) Protocol-okhttp: make connection pool configurable

2021-09-21 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2896: -- Summary: Protocol-okhttp: make connection pool configurable Key: NUTCH-2896 URL: https://issues.apache.org/jira/browse/NUTCH-2896 Project: Nutch Issue Ty

[jira] [Updated] (NUTCH-2890) Protocol-okhttp: upgrade okhttp to 4.9.1 to address infinite connection retries

2021-09-20 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2890: --- Summary: Protocol-okhttp: upgrade okhttp to 4.9.1 to address infinite connection retries (wa

[jira] [Created] (NUTCH-2895) Allow to add plugin dependency jars by wildcard

2021-09-20 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2895: -- Summary: Allow to add plugin dependency jars by wildcard Key: NUTCH-2895 URL: https://issues.apache.org/jira/browse/NUTCH-2895 Project: Nutch Issue Type:

[jira] [Assigned] (NUTCH-2894) Java plugin compilation classpath: priorize plugin dependencies

2021-09-19 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2894: -- Assignee: Sebastian Nagel > Java plugin compilation classpath: priorize plugin depende

[jira] [Updated] (NUTCH-2894) Java plugin compilation classpath: priorize plugin dependencies

2021-09-19 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2894: --- Summary: Java plugin compilation classpath: priorize plugin dependencies (was: Java compilat

[jira] [Created] (NUTCH-2894) Java compilation classpath: priorize plugin dependencies

2021-09-19 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2894: -- Summary: Java compilation classpath: priorize plugin dependencies Key: NUTCH-2894 URL: https://issues.apache.org/jira/browse/NUTCH-2894 Project: Nutch Is

[jira] [Created] (NUTCH-2891) Upgrade to Tika 2.1

2021-09-17 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2891: -- Summary: Upgrade to Tika 2.1 Key: NUTCH-2891 URL: https://issues.apache.org/jira/browse/NUTCH-2891 Project: Nutch Issue Type: Improvement Compo

[jira] [Updated] (NUTCH-2890) Protocok-okhttp: upgrade okhttp to 4.9.1 to address infinite connection retries

2021-09-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2890: --- Attachment: fetcher-protocol-okhttp-NUTCH-2890-async-profiler.png > Protocok-okhttp: upgrade

[jira] [Created] (NUTCH-2890) Protocok-okhttp: upgrade okhttp to 4.9.1 to address infinite connection retries

2021-09-17 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2890: -- Summary: Protocok-okhttp: upgrade okhttp to 4.9.1 to address infinite connection retries Key: NUTCH-2890 URL: https://issues.apache.org/jira/browse/NUTCH-2890 Pro

[jira] [Updated] (NUTCH-2889) nutch indexer-elasticsearch plugin, doesn't work with https protocol

2021-09-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2889: --- Component/s: indexer > nutch indexer-elasticsearch plugin, doesn't work with https protocol >

[jira] [Updated] (NUTCH-2889) nutch indexer-elasticsearch plugin, doesn't work with https protocol

2021-09-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2889: --- Fix Version/s: (was: 1.18) 1.19 > nutch indexer-elasticsearch plugin,

[jira] [Updated] (NUTCH-2889) nutch indexer-elasticsearch plugin, doesn't work with https protocol

2021-09-17 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2889: --- Labels: help-wanted indexer-elasticsearch plugin (was: indexer-elasticsearch plugin) > nutc

[jira] [Commented] (NUTCH-2826) Migrate Nutch Site from Apache CMS to Hugo

2021-09-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408357#comment-17408357 ] Sebastian Nagel commented on NUTCH-2826: Great! And thanks! Let me know when some

[jira] [Commented] (NUTCH-2886) Move Nutch WebApp to separate repository

2021-07-13 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379675#comment-17379675 ] Sebastian Nagel commented on NUTCH-2886: +1 ??reducing the number of dependencie

[jira] [Updated] (NUTCH-2880) parse-html/tika: update/complete HTML elements to extract outlinks from

2021-06-18 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2880: --- Labels: help-wanted (was: ) > parse-html/tika: update/complete HTML elements to extract outl

[jira] [Created] (NUTCH-2880) parse-html/tika: update/complete HTML elements to extract outlinks from

2021-06-18 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2880: -- Summary: parse-html/tika: update/complete HTML elements to extract outlinks from Key: NUTCH-2880 URL: https://issues.apache.org/jira/browse/NUTCH-2880 Project: Nu

[jira] [Resolved] (NUTCH-2868) urlnormalizer-protocol fails with StringIndexOutOfBoundsException when reading invalid line in configuration file

2021-06-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2868. Resolution: Fixed > urlnormalizer-protocol fails with StringIndexOutOfBoundsException when

[jira] [Assigned] (NUTCH-2868) urlnormalizer-protocol fails with StringIndexOutOfBoundsException when reading invalid line in configuration file

2021-06-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2868: -- Assignee: Sebastian Nagel > urlnormalizer-protocol fails with StringIndexOutOfBoundsEx

[jira] [Resolved] (NUTCH-2869) Add @Override annotations to Nutch plugins

2021-06-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2869. Resolution: Implemented Committed. Thanks, [~markus17] and [~lewismc] for the reviews! > A

[jira] [Assigned] (NUTCH-2869) Add @Override annotations to Nutch plugins

2021-06-12 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2869: -- Assignee: Sebastian Nagel > Add @Override annotations to Nutch plugins > -

[jira] [Created] (NUTCH-2869) Add @Override annotations to Nutch plugins

2021-06-10 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2869: -- Summary: Add @Override annotations to Nutch plugins Key: NUTCH-2869 URL: https://issues.apache.org/jira/browse/NUTCH-2869 Project: Nutch Issue Type: Impr

[jira] [Created] (NUTCH-2868) urlnormalizer-protocol fails with StringIndexOutOfBoundsException when reading invalid line in configuration file

2021-06-10 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2868: -- Summary: urlnormalizer-protocol fails with StringIndexOutOfBoundsException when reading invalid line in configuration file Key: NUTCH-2868 URL: https://issues.apache.org/jira/

[jira] [Commented] (NUTCH-2855) Update org.elasticsearch.client

2021-06-10 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360651#comment-17360651 ] Sebastian Nagel commented on NUTCH-2855: Hi [~markus17], I'm developing on Kubunt

[jira] [Assigned] (NUTCH-2866) MetaData.toString() should return "key=value ..."

2021-06-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2866: -- Assignee: Sebastian Nagel > MetaData.toString() should return "key=value ..." > --

[jira] [Resolved] (NUTCH-2866) MetaData.toString() should return "key=value ..."

2021-06-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2866. Resolution: Fixed > MetaData.toString() should return "key=value ..." > ---

[jira] [Commented] (NUTCH-2866) MetaData.toString() should return "key=value ..."

2021-06-01 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355140#comment-17355140 ] Sebastian Nagel commented on NUTCH-2866: [~markus17], you're joking! A trivial pa

[jira] [Comment Edited] (NUTCH-2865) WARC exporter support for metadata and dropping empty responses

2021-05-31 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354530#comment-17354530 ] Sebastian Nagel edited comment on NUTCH-2865 at 5/31/21, 3:57 PM: -

[jira] [Commented] (NUTCH-2865) WARC exporter support for metadata and dropping empty responses

2021-05-31 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354530#comment-17354530 ] Sebastian Nagel commented on NUTCH-2865: Hi [~markus17], there are still print st

[jira] [Created] (NUTCH-2866) MetaData.toString() should return "key=value ..."

2021-05-31 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2866: -- Summary: MetaData.toString() should return "key=value ..." Key: NUTCH-2866 URL: https://issues.apache.org/jira/browse/NUTCH-2866 Project: Nutch Issue Typ

[jira] [Commented] (NUTCH-2855) Update org.elasticsearch.client

2021-05-20 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348199#comment-17348199 ] Sebastian Nagel commented on NUTCH-2855: Another option is to try the Docker buil

[jira] [Created] (NUTCH-2864) Upgrade Dockerfile to use JDK 11

2021-05-20 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2864: -- Summary: Upgrade Dockerfile to use JDK 11 Key: NUTCH-2864 URL: https://issues.apache.org/jira/browse/NUTCH-2864 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2855) Update org.elasticsearch.client

2021-05-20 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348187#comment-17348187 ] Sebastian Nagel commented on NUTCH-2855: Ant/ivy should install the jar {{build/

[jira] [Created] (NUTCH-2863) Injector to parse command-line flags case-insensitive

2021-05-07 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2863: -- Summary: Injector to parse command-line flags case-insensitive Key: NUTCH-2863 URL: https://issues.apache.org/jira/browse/NUTCH-2863 Project: Nutch Issue

[jira] [Created] (NUTCH-2862) Do not include Ivy jar in source release package

2021-04-19 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2862: -- Summary: Do not include Ivy jar in source release package Key: NUTCH-2862 URL: https://issues.apache.org/jira/browse/NUTCH-2862 Project: Nutch Issue Type

[jira] [Created] (NUTCH-2861) Remove parse-swf

2021-04-19 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2861: -- Summary: Remove parse-swf Key: NUTCH-2861 URL: https://issues.apache.org/jira/browse/NUTCH-2861 Project: Nutch Issue Type: Improvement Componen
