[jira] [Updated] (NUTCH-2197) Add solr5 solrcloud indexer support

2016-01-07 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2197: Attachment: NUTCH-2197.patch I've attached a patch with initial support for Solr5 +

[jira] [Updated] (NUTCH-2203) Suffix URL filter can't handle trailing/leading whitespaces

2016-01-19 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2203: Attachment: NUTCH-2203.patch Attached a patch to fix this. > Suffix URL filter can't

[jira] [Created] (NUTCH-2203) Suffix URL filter can't handle trailing/leading whitespaces

2016-01-19 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2203: --- Summary: Suffix URL filter can't handle trailing/leading whitespaces Key: NUTCH-2203 URL: https://issues.apache.org/jira/browse/NUTCH-2203 Project: Nutch

[jira] [Created] (NUTCH-2242) lastModified not always set

2016-03-23 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2242: --- Summary: lastModified not always set Key: NUTCH-2242 URL: https://issues.apache.org/jira/browse/NUTCH-2242 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-2242) lastModified not always set

2016-03-23 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2242: Flags: Patch Patch Info: Patch Available > lastModified not always set >

[jira] [Updated] (NUTCH-2242) lastModified not always set

2016-03-23 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2242: Attachment: NUTCH-2242.patch Initial version of patch. Please review > lastModified not

[jira] [Commented] (NUTCH-2242) lastModified not always set

2016-05-11 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279842#comment-15279842 ] Jurian Broertjes commented on NUTCH-2242: - Hi Sebastian, I've put this in the reduce() function

[jira] [Created] (NUTCH-2378) ChildFirst plugin classloader

2017-05-01 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2378: --- Summary: ChildFirst plugin classloader Key: NUTCH-2378 URL: https://issues.apache.org/jira/browse/NUTCH-2378 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-2378) ChildFirst plugin classloader

2017-05-01 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2378: Attachment: NUTCH-2378-childfirst-plugin-classloader.patch > ChildFirst plugin classloader

[jira] [Updated] (NUTCH-2380) indexer-elastic version bump

2017-05-02 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2380: Attachment: NUTCH-2380-indexer-elastic-p0.patch > indexer-elastic version bump >

[jira] [Created] (NUTCH-2380) indexer-elastic version bump

2017-05-02 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2380: --- Summary: indexer-elastic version bump Key: NUTCH-2380 URL: https://issues.apache.org/jira/browse/NUTCH-2380 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2373) Indexer for Hbase

2017-05-02 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992836#comment-15992836 ] Jurian Broertjes commented on NUTCH-2373: - Nutch 1.x version > Indexer for Hbase >

[jira] [Issue Comment Deleted] (NUTCH-2373) Indexer for Hbase

2017-05-02 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2373: Comment: was deleted (was: Nutch 1.x version) > Indexer for Hbase >

[jira] [Created] (NUTCH-2382) indexer-hbase Nutch 1.x branch

2017-05-02 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2382: --- Summary: indexer-hbase Nutch 1.x branch Key: NUTCH-2382 URL: https://issues.apache.org/jira/browse/NUTCH-2382 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-2382) indexer-hbase Nutch 1.x branch

2017-05-02 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2382: Attachment: NUTCH-2382-indexer-hbase-p1.patch > indexer-hbase Nutch 1.x branch >

[jira] [Created] (NUTCH-2431) Filterchecker to implement Tool-interface

2017-09-25 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2431: --- Summary: Filterchecker to implement Tool-interface Key: NUTCH-2431 URL: https://issues.apache.org/jira/browse/NUTCH-2431 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-2431) Filterchecker to implement Tool-interface

2017-09-25 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2431: Attachment: NUTCH-2431.patch > Filterchecker to implement Tool-interface >

[jira] [Commented] (NUTCH-2380) indexer-elastic version bump

2017-12-18 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294778#comment-16294778 ] Jurian Broertjes commented on NUTCH-2380: - I've tested it a while back, and it's currently also

[jira] [Commented] (NUTCH-2431) URLFilterchecker to implement Tool-interface

2017-12-18 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294783#comment-16294783 ] Jurian Broertjes commented on NUTCH-2431: - Yes, this is indeed resolved by NUTCH-2477 >

[jira] [Commented] (NUTCH-2382) indexer-hbase Nutch 1.x branch

2017-12-18 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295034#comment-16295034 ] Jurian Broertjes commented on NUTCH-2382: - Yeah +1 for that. > indexer-hbase Nutch 1.x branch >

[jira] [Commented] (NUTCH-2431) Filterchecker to implement Tool-interface

2017-11-07 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241941#comment-16241941 ] Jurian Broertjes commented on NUTCH-2431: - Will have a look at your feedback the coming week >

[jira] [Created] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-12 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2477: --- Summary: Refactor *Checker classes to use base class for common code Key: NUTCH-2477 URL: https://issues.apache.org/jira/browse/NUTCH-2477 Project: Nutch

[jira] [Updated] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-12 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2477: External issue URL: https://github.com/apache/nutch/pull/256 > Refactor *Checker classes to

[jira] [Commented] (NUTCH-2477) Refactor *Checker classes to use base class for common code

2017-12-12 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287823#comment-16287823 ] Jurian Broertjes commented on NUTCH-2477: - Feedback is welcome > Refactor *Checker classes to use

[jira] [Commented] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums

2018-06-12 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509573#comment-16509573 ] Jurian Broertjes commented on NUTCH-2565: - One solution would be to sum the retries of both

[jira] [Commented] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums

2018-06-12 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509801#comment-16509801 ] Jurian Broertjes commented on NUTCH-2565: - Maybe it would be sufficient to only test on

[jira] [Created] (NUTCH-2597) NPE in updatehostdb

2018-06-13 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2597: --- Summary: NPE in updatehostdb Key: NUTCH-2597 URL: https://issues.apache.org/jira/browse/NUTCH-2597 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2597) NPE in updatehostdb

2018-06-13 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511082#comment-16511082 ] Jurian Broertjes commented on NUTCH-2597: - PR: [https://github.com/apache/nutch/pull/349] Fixes

[jira] [Commented] (NUTCH-2012) Merge parsechecker and indexchecker

2018-06-12 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509686#comment-16509686 ] Jurian Broertjes commented on NUTCH-2012: - It looks like the process() function still uses

[jira] [Commented] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums

2018-06-14 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512589#comment-16512589 ] Jurian Broertjes commented on NUTCH-2565: - Updated PR with the proposed solution > MergeDB

[jira] [Created] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums

2018-04-10 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2565: --- Summary: MergeDB incorrectly handles unfetched CrawlDatums Key: NUTCH-2565 URL: https://issues.apache.org/jira/browse/NUTCH-2565 Project: Nutch Issue

[jira] [Commented] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums

2018-04-10 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432083#comment-16432083 ] Jurian Broertjes commented on NUTCH-2565: - PR: https://github.com/apache/nutch/pull/311 > MergeDB

[jira] [Commented] (NUTCH-2543) readdb & readlinkdb to implement AbstractChecker

2018-03-22 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409584#comment-16409584 ] Jurian Broertjes commented on NUTCH-2543: - PR: [https://github.com/apache/nutch/pull/303] PR also

[jira] [Created] (NUTCH-2543) readdb & readlinkdb to implement AbstractChecker

2018-03-22 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2543: --- Summary: readdb & readlinkdb to implement AbstractChecker Key: NUTCH-2543 URL: https://issues.apache.org/jira/browse/NUTCH-2543 Project: Nutch Issue

[jira] [Created] (NUTCH-2717) Generator cannot open hostDB

2019-05-16 Thread Jurian Broertjes (JIRA)
Jurian Broertjes created NUTCH-2717: --- Summary: Generator cannot open hostDB Key: NUTCH-2717 URL: https://issues.apache.org/jira/browse/NUTCH-2717 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-2717) Generator cannot open hostDB

2019-05-16 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2717: Description: During generate, the hostDB cannot be opened anymore, see: {quote}2019-05-16

[jira] [Commented] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2019-05-07 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834822#comment-16834822 ] Jurian Broertjes commented on NUTCH-2525: - Updated patch so it applies against master > Metadata

[jira] [Updated] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2019-05-07 Thread Jurian Broertjes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2525: Attachment: NUTCH-2525-p1.patch > Metadata indexer cannot handle uppercase parse metadata

[jira] [Created] (NUTCH-2750) improve CrawlDbReader & LinkDbReader reader handling

2019-10-24 Thread Jurian Broertjes (Jira)
Jurian Broertjes created NUTCH-2750: --- Summary: improve CrawlDbReader & LinkDbReader reader handling Key: NUTCH-2750 URL: https://issues.apache.org/jira/browse/NUTCH-2750 Project: Nutch

[jira] [Updated] (NUTCH-2750) improve CrawlDbReader & LinkDbReader reader handling

2019-10-24 Thread Jurian Broertjes (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jurian Broertjes updated NUTCH-2750: Description: The current implementation in the CrawlDbReader re-opens readers for every