[jira] [Updated] (NUTCH-3056) Injector to support resolving seed URLs

2024-05-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3056: - Description: We have a case where clients submit huge uncurated seed files, the host may not

[jira] [Updated] (NUTCH-3056) Injector to support resolving seed URLs

2024-05-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3056: - Description: We have a case where clients submit huge uncurated seed files, the host may not

[jira] [Created] (NUTCH-3056) Injector to support resolving seed URLs

2024-05-16 Thread Markus Jelsma (Jira)
Markus Jelsma created NUTCH-3056: Summary: Injector to support resolving seed URLs Key: NUTCH-3056 URL: https://issues.apache.org/jira/browse/NUTCH-3056 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-04-30 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842384#comment-17842384 ] Markus Jelsma commented on NUTCH-3028: -- Ok, the Content object is now also available in the

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-04-30 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3028: - Attachment: NUTCH-3028-2.patch > WARCExported to support filtering by JEXL >

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-04-30 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3028: - Description: Filtering segment data to WARC is now possible using JEXL expressions. In the next

[jira] [Commented] (NUTCH-3039) Failure to handle ftp:// URLs

2024-04-11 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836133#comment-17836133 ] Markus Jelsma commented on NUTCH-3039: -- Thanks for spotting that! > Failure to handle ftp:// URLs >

[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827048#comment-17827048 ] Markus Jelsma commented on NUTCH-3029: -- comment describing throws is also required these days.

[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826823#comment-17826823 ] Markus Jelsma commented on NUTCH-3029: -- throws was missing too 84cda2abd..a8ec17ca8 master ->

[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826783#comment-17826783 ] Markus Jelsma commented on NUTCH-3029: -- Thanks Lewis! 5ba50c0c6..84cda2abd master -> master

[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826759#comment-17826759 ] Markus Jelsma commented on NUTCH-3029: -- 4f62dec0f..5ba50c0c6 master -> master actual change

[jira] [Commented] (NUTCH-3033) Upgrade Ivy to v2.5.2

2024-03-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826760#comment-17826760 ] Markus Jelsma commented on NUTCH-3033: -- Ah, the new ivy works like a charm! Thanks! > Upgrade Ivy

[jira] [Resolved] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-3029. -- Resolution: Fixed Thanks Martin! 551c50b1c..4642c30c2 master -> master > Host specific

[jira] [Resolved] (NUTCH-3030) Use system default cipher suites instead of hard-coded set

2024-03-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-3030. -- Resolution: Fixed 42b55f6a9..551c50b1c master -> master Thanks Martin! > Use system

[jira] [Updated] (NUTCH-3030) Use system default cipher suites instead of hard-coded set

2024-03-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3030: - Summary: Use system default cipher suites instead of hard-coded set (was: Update default TLS

[jira] [Commented] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-12 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825863#comment-17825863 ] Markus Jelsma commented on NUTCH-3032: -- No idea what git fork is supposed to do, maybe it should be

[jira] [Resolved] (NUTCH-3031) ProtocolFactory host mapper to support domains

2024-03-12 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-3031. -- Resolution: Fixed 83acd501e..c390dfc8b master -> master > ProtocolFactory host mapper to

[jira] [Updated] (NUTCH-3031) ProtocolFactory host mapper to support domains

2024-03-05 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3031: - Attachment: NUTCH-3031.patch > ProtocolFactory host mapper to support domains >

[jira] [Created] (NUTCH-3031) ProtocolFactory host mapper to support domains

2024-03-05 Thread Markus Jelsma (Jira)
Markus Jelsma created NUTCH-3031: Summary: ProtocolFactory host mapper to support domains Key: NUTCH-3031 URL: https://issues.apache.org/jira/browse/NUTCH-3031 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-3030) Update default TLS cipher suites for http(s) protocol

2024-02-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818531#comment-17818531 ] Markus Jelsma commented on NUTCH-3030: -- For some reason the attached patch did not apply cleanly

[jira] [Updated] (NUTCH-3030) Update default TLS cipher suites for http(s) protocol

2024-02-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3030: - Attachment: NUTCH-3030.patch > Update default TLS cipher suites for http(s) protocol >

[jira] [Assigned] (NUTCH-3030) Update default TLS cipher suites for http(s) protocol

2024-02-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-3030: Assignee: Markus Jelsma > Update default TLS cipher suites for http(s) protocol >

[jira] [Assigned] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-02-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-3029: Assignee: Markus Jelsma > Host specific max. and min. intervals in adaptive scheduler >

[jira] [Commented] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815345#comment-17815345 ] Markus Jelsma commented on NUTCH-3028: -- New patch: when expression was not set, an exception was

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3028: - Attachment: NUTCH-3028-1.patch > WARCExported to support filtering by JEXL >

[jira] [Commented] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-06 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814731#comment-17814731 ] Markus Jelsma commented on NUTCH-3028: -- Any objections to this one before i get it in? >

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-01 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3028: - Description: Filtering segment data to WARC is now possible using JEXL expressions. In the next

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-01 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3028: - Attachment: NUTCH-3027.patch > WARCExported to support filtering by JEXL >

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-01 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3028: - Attachment: (was: NUTCH-3027.patch) > WARCExported to support filtering by JEXL >

[jira] [Updated] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-01 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-3028: - Attachment: NUTCH-3028.patch > WARCExported to support filtering by JEXL >

[jira] [Created] (NUTCH-3028) WARCExported to support filtering by JEXL

2024-02-01 Thread Markus Jelsma (Jira)
Markus Jelsma created NUTCH-3028: Summary: WARCExported to support filtering by JEXL Key: NUTCH-3028 URL: https://issues.apache.org/jira/browse/NUTCH-3028 Project: Nutch Issue Type:

[jira] [Work started] (NUTCH-3027) Trivial resource leak patch in DomainSuffixes.java

2024-01-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-3027 started by Markus Jelsma. > Trivial resource leak patch in DomainSuffixes.java >

[jira] [Commented] (NUTCH-3027) Trivial resource leak patch in DomainSuffixes.java

2024-01-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808614#comment-17808614 ] Markus Jelsma commented on NUTCH-3027: -- Thanks Sascha Kehrli! Committed 

[jira] [Resolved] (NUTCH-3027) Trivial resource leak patch in DomainSuffixes.java

2024-01-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-3027. -- Fix Version/s: 1.20 Resolution: Fixed > Trivial resource leak patch in

[jira] [Assigned] (NUTCH-3027) Trivial resource leak patch in DomainSuffixes.java

2024-01-19 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-3027: Assignee: Markus Jelsma > Trivial resource leak patch in DomainSuffixes.java >

[jira] [Commented] (NUTCH-1635) New crawldb sometimes ends up in current

2023-10-02 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771023#comment-17771023 ] Markus Jelsma commented on NUTCH-1635: -- Good point! No, we haven't seen this behaviour for the past

[jira] [Closed] (NUTCH-1635) New crawldb sometimes ends up in current

2023-10-02 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-1635. Resolution: Not A Problem > New crawldb sometimes ends up in current >

[jira] [Commented] (NUTCH-3007) Fix impossible casts

2023-09-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769989#comment-17769989 ] Markus Jelsma commented on NUTCH-3007: -- +1 yes! > Fix impossible casts > > >

[jira] [Commented] (NUTCH-2852) Method invokes System.exit(...) 9 bugs

2023-09-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769988#comment-17769988 ] Markus Jelsma commented on NUTCH-2852: -- Seems just fine for these files +1 > Method invokes

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-09-18 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766306#comment-17766306 ] Markus Jelsma commented on NUTCH-2978: -- Thanks for picking it up. I am very happy this one is

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-09-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764699#comment-17764699 ] Markus Jelsma commented on NUTCH-2978: -- You managed to get it up and running, as well when deployed

[jira] [Commented] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764697#comment-17764697 ] Markus Jelsma commented on NUTCH-3000: -- Yes, this is a bit odd indeed. +1 > protocol-selenium

[jira] [Commented] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760522#comment-17760522 ] Markus Jelsma commented on NUTCH-2999: -- Seems fine +1 > Update Lucene version to latest 8.x >

[jira] [Commented] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738191#comment-17738191 ] Markus Jelsma commented on NUTCH-2993: -- To be honest, i am not too happy with the implementation

[jira] [Commented] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738047#comment-17738047 ] Markus Jelsma commented on NUTCH-2993: -- Thanks Sebastian! # changed the checks again. # check for

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: NUTCH-2993-1.15-1.patch > ScoringDepth plugin to skip depth check based on URL

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: (was: NUTCH-2993.patch) > ScoringDepth plugin to skip depth check based on URL

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: NUTCH-2993.patch > ScoringDepth plugin to skip depth check based on URL Pattern >

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-28 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Description: We do not want some crawl to go deep and broad, but instead focus it on a narrow

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: NUTCH-2993-1.15.patch > ScoringDepth plugin to skip depth check based on URL Pattern

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: (was: NUTCH-2993-1.15-1.patch) > ScoringDepth plugin to skip depth check based

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: (was: NUTCH-2993-1.15.patch) > ScoringDepth plugin to skip depth check based on

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: NUTCH-2993-1.15-1.patch > ScoringDepth plugin to skip depth check based on URL

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Description: We do not want some crawl to go deep and broad, but instead focus it on a narrow

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to skip depth check based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Summary: ScoringDepth plugin to skip depth check based on URL Pattern (was: ScoringDepth plugin

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to override maxDepth based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: NUTCH-2993-1.15.patch > ScoringDepth plugin to override maxDepth based on URL

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to override maxDepth based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: (was: NUTCH-2993-1.15.patch) > ScoringDepth plugin to override maxDepth based on

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to override maxDepth based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: (was: NUTCH-2993-1.15.patch) > ScoringDepth plugin to override maxDepth based on

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to override maxDepth based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: NUTCH-2993-1.15.patch > ScoringDepth plugin to override maxDepth based on URL

[jira] [Commented] (NUTCH-2993) ScoringDepth plugin to override maxDepth based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730535#comment-17730535 ] Markus Jelsma commented on NUTCH-2993: -- Here's a simple patch against Nutch 1.15. Will patch for

[jira] [Updated] (NUTCH-2993) ScoringDepth plugin to override maxDepth based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2993: - Attachment: NUTCH-2993-1.15.patch > ScoringDepth plugin to override maxDepth based on URL

[jira] [Created] (NUTCH-2993) ScoringDepth plugin to override maxDepth based on URL Pattern

2023-06-08 Thread Markus Jelsma (Jira)
Markus Jelsma created NUTCH-2993: Summary: ScoringDepth plugin to override maxDepth based on URL Pattern Key: NUTCH-2993 URL: https://issues.apache.org/jira/browse/NUTCH-2993 Project: Nutch

[jira] [Commented] (NUTCH-2985) Disable plugin urlfilter-validator by default

2023-02-24 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693291#comment-17693291 ] Markus Jelsma commented on NUTCH-2985: -- +1 > Disable plugin urlfilter-validator by default >

[jira] [Commented] (NUTCH-2974) Ant build fails with "Unparseable date" on certain platforms

2023-01-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677435#comment-17677435 ] Markus Jelsma commented on NUTCH-2974: -- Sounds like a nice solution for this obscure bug +1 > Ant

[jira] [Commented] (NUTCH-2634) Some links marked as "nofollow" are followed anyway.

2023-01-06 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655383#comment-17655383 ] Markus Jelsma commented on NUTCH-2634: -- +1 > Some links marked as "nofollow" are followed anyway. >

[jira] [Comment Edited] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-22 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651243#comment-17651243 ] Markus Jelsma edited comment on NUTCH-2978 at 12/22/22 11:33 AM: - Ah

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-22 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651243#comment-17651243 ] Markus Jelsma commented on NUTCH-2978: -- Ah nope, this is not it. Parse-tika throws lots of errors

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648636#comment-17648636 ] Markus Jelsma commented on NUTCH-2978: -- New patch now makes sure there is a log4j 2.19 in tika and

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Attachment: NUTCH-2978-3.patch > Move to slf4j2 and remove log4j1 and reload4j >

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648633#comment-17648633 ] Markus Jelsma commented on NUTCH-2978: -- Ok, i also wanted to get rid of loose log4j libs. There was

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648625#comment-17648625 ] Markus Jelsma commented on NUTCH-2978: -- Patch now includes Sebastian's patch, and actually contains

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Attachment: NUTCH-2978-2.patch > Move to slf4j2 and remove log4j1 and reload4j >

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Attachment: (was: NUTCH-2978-1.patch) > Move to slf4j2 and remove log4j1 and reload4j >

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Attachment: NUTCH-2978-1.patch > Move to slf4j2 and remove log4j1 and reload4j >

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-16 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Attachment: NUTCH-2978-1.patch > Move to slf4j2 and remove log4j1 and reload4j >

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-15 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648060#comment-17648060 ] Markus Jelsma commented on NUTCH-2978: -- Ah yes, thanks! I am not sure if a 'solution' will come from

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-13 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646681#comment-17646681 ] Markus Jelsma commented on NUTCH-2978: -- About the slf issues, Somewhere another slf4j jar was

[jira] [Resolved] (NUTCH-2924) Generate maxCount expr evaluated only once

2022-12-12 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2924. -- Resolution: Fixed To https://gitbox.apache.org/repos/asf/nutch.git

[jira] [Comment Edited] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644838#comment-17644838 ] Markus Jelsma edited comment on NUTCH-2978 at 12/8/22 2:34 PM: --- Ah, well. I

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644838#comment-17644838 ] Markus Jelsma commented on NUTCH-2978: -- Ah, well. I also tried a Tika parsing fetcher of a vanilla

[jira] [Comment Edited] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644825#comment-17644825 ] Markus Jelsma edited comment on NUTCH-2978 at 12/8/22 2:12 PM: --- This

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644825#comment-17644825 ] Markus Jelsma commented on NUTCH-2978: -- This morning i saw one of our internal projects spewing the

[jira] [Commented] (NUTCH-2924) Generate maxCount expr evaluated only once

2022-12-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644808#comment-17644808 ] Markus Jelsma commented on NUTCH-2924: -- Here's the proper patch, finally. > Generate maxCount expr

[jira] [Updated] (NUTCH-2924) Generate maxCount expr evaluated only once

2022-12-08 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2924: - Attachment: NUTCH-2924-5.patch > Generate maxCount expr evaluated only once >

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644491#comment-17644491 ] Markus Jelsma commented on NUTCH-2978: -- Yes, i saw the slf4j present in the plugin, it troubled my

[jira] [Commented] (NUTCH-2924) Generate maxCount expr evaluated only once

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644489#comment-17644489 ] Markus Jelsma commented on NUTCH-2924: -- Yes, that is expected. This patch requires a hostdb to be

[jira] [Resolved] (NUTCH-2977) Support for showing dependency tree

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-2977. -- Fix Version/s: 1.20 Resolution: Fixed > Support for showing dependency tree >

[jira] [Commented] (NUTCH-2977) Support for showing dependency tree

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644436#comment-17644436 ] Markus Jelsma commented on NUTCH-2977: -- Committed: ed7b6615b..d806aa450

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Description: I got in trouble upgrading some dependencies and got a lot of LinkageErrors today,

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Description: I got in trouble upgrading some dependencies and got a lot of LinkageErrors today,

[jira] [Updated] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2978: - Attachment: NUTCH-2978.patch > Move to slf4j2 and remove log4j1 and reload4j >

[jira] [Created] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2022-12-07 Thread Markus Jelsma (Jira)
Markus Jelsma created NUTCH-2978: Summary: Move to slf4j2 and remove log4j1 and reload4j Key: NUTCH-2978 URL: https://issues.apache.org/jira/browse/NUTCH-2978 Project: Nutch Issue Type: Task

[jira] [Updated] (NUTCH-2977) Support for showing dependency tree

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2977: - Attachment: NUTCH-2977.patch > Support for showing dependency tree >

[jira] [Updated] (NUTCH-2977) Support for showing dependency tree

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2977: - Description: I am upgrading Nutch to slf4j 2 and need to get rid of old 1.7 stuff, and

[jira] [Created] (NUTCH-2977) Support for showing dependency tree

2022-12-07 Thread Markus Jelsma (Jira)
Markus Jelsma created NUTCH-2977: Summary: Support for showing dependency tree Key: NUTCH-2977 URL: https://issues.apache.org/jira/browse/NUTCH-2977 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-2924) Generate maxCount expr evaluated only once

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644298#comment-17644298 ] Markus Jelsma commented on NUTCH-2924: -- Updated patch for master. > Generate maxCount expr

[jira] [Updated] (NUTCH-2924) Generate maxCount expr evaluated only once

2022-12-07 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2924: - Attachment: NUTCH-2924-4.patch > Generate maxCount expr evaluated only once >

[jira] [Commented] (NUTCH-2973) Single domain names (eg https://localnet) can't be crawled - filtering fails

2022-10-20 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621023#comment-17621023 ] Markus Jelsma commented on NUTCH-2973: -- Hello David, By default urlfilter-validator is an active

[jira] [Commented] (NUTCH-2969) Javadoc: Javascript search is not working when built on JDK 11

2022-08-22 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582978#comment-17582978 ] Markus Jelsma commented on NUTCH-2969: -- Nice! > Javadoc: Javascript search is not working when

[jira] [Commented] (NUTCH-2960) indexer-elastic: remove plugin from binary package to address licensing issues

2022-08-22 Thread Markus Jelsma (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582859#comment-17582859 ] Markus Jelsma commented on NUTCH-2960: -- Yes, this would be much preferred over removing the binaries
