[ https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2197:
Attachment: NUTCH-2197.patch
I've attached a patch with initial support for Solr5 +
[ https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2203:
Attachment: NUTCH-2203.patch
Attached a patch to fix this.
> Suffix URL filter can't
Jurian Broertjes created NUTCH-2203:
---
Summary: Suffix URL filter can't handle trailing/leading
whitespaces
Key: NUTCH-2203
URL: https://issues.apache.org/jira/browse/NUTCH-2203
Project: Nutch
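The NUTCH-2203 summary says the suffix URL filter can't handle leading or trailing whitespace in its rules. As a minimal sketch of the general idea (not the contents of the attached NUTCH-2203.patch), the Java below trims each rule line before using it for matching. The class name `SuffixRuleParser` and the rule format (one suffix per line, `#` for comments) are hypothetical assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: parse suffix rules one per line, trimming
 * leading/trailing whitespace so "  .jpg  " behaves like ".jpg".
 * The rule format (one suffix per line, '#' comments) is an assumption.
 */
public class SuffixRuleParser {

  /** Returns the cleaned suffixes, skipping blank lines and '#' comments. */
  public static List<String> parse(String rules) {
    List<String> suffixes = new ArrayList<>();
    for (String line : rules.split("\\R")) {   // split on any line break
      String trimmed = line.trim();            // strip leading/trailing whitespace
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;                              // ignore blank and comment lines
      }
      suffixes.add(trimmed);
    }
    return suffixes;
  }

  /** True if the (trimmed) URL ends with any of the given suffixes. */
  public static boolean matches(String url, List<String> suffixes) {
    String u = url.trim();
    for (String s : suffixes) {
      if (u.endsWith(s)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> rules = parse("# images\n  .jpg  \n.gif\t\n\n");
    System.out.println(rules);                                        // [.jpg, .gif]
    System.out.println(matches("http://example.com/a.jpg ", rules));  // true
  }
}
```

Without the `trim()` call, a rule written as `"  .jpg  "` would never match, because no URL ends with trailing spaces.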
Jurian Broertjes created NUTCH-2242:
---
Summary: lastModified not always set
Key: NUTCH-2242
URL: https://issues.apache.org/jira/browse/NUTCH-2242
Project: Nutch
Issue Type: Bug
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2242:
Flags: Patch
Patch Info: Patch Available
> lastModified not always set
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2242:
Attachment: NUTCH-2242.patch
Initial version of patch. Please review
> lastModified not
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279842#comment-15279842 ]
Jurian Broertjes commented on NUTCH-2242:
-
Hi Sebastian, I've put this in the reduce() function
Jurian Broertjes created NUTCH-2378:
---
Summary: ChildFirst plugin classloader
Key: NUTCH-2378
URL: https://issues.apache.org/jira/browse/NUTCH-2378
Project: Nutch
Issue Type: Improvement
[ https://issues.apache.org/jira/browse/NUTCH-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2378:
Attachment: NUTCH-2378-childfirst-plugin-classloader.patch
> ChildFirst plugin classloader
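A child-first (parent-last) class loader looks up classes in its own URLs before delegating to its parent, which lets a plugin bundle a library version that differs from the one on the main classpath. The Java below is a generic sketch of that technique built on `java.net.URLClassLoader`, not the contents of the attached patch; the class name `ChildFirstClassLoader` and the rule of always delegating `java.*` classes to the parent are illustrative assumptions.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Sketch of a child-first (parent-last) class loader: classes are looked up
 * in this loader's own URLs before delegating to the parent. java.* classes
 * are always delegated, since the JVM forbids redefining them.
 */
public class ChildFirstClassLoader extends URLClassLoader {

  public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      // 1. Reuse a class this loader has already defined.
      Class<?> c = findLoadedClass(name);
      if (c == null && !name.startsWith("java.")) {
        try {
          // 2. Child first: search this loader's own URLs.
          c = findClass(name);
        } catch (ClassNotFoundException e) {
          // Not found locally; fall back to the parent below.
        }
      }
      if (c == null) {
        // 3. Standard delegation (parent, then this loader's findClass).
        c = super.loadClass(name, false);
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }

  public static void main(String[] args) throws Exception {
    ClassLoader parent = ChildFirstClassLoader.class.getClassLoader();
    try (ChildFirstClassLoader cl = new ChildFirstClassLoader(new URL[0], parent)) {
      // With no URLs of its own, core classes come from the parent.
      System.out.println(cl.loadClass("java.lang.String") == String.class); // true
    }
  }
}
```

The trade-off is isolation versus sharing: a plugin sees its own copy of a dependency, but any type passed between the plugin and the host must still be loaded by a common parent, or the two sides will see incompatible `Class` objects.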
[ https://issues.apache.org/jira/browse/NUTCH-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2380:
Attachment: NUTCH-2380-indexer-elastic-p0.patch
> indexer-elastic version bump
Jurian Broertjes created NUTCH-2380:
---
Summary: indexer-elastic version bump
Key: NUTCH-2380
URL: https://issues.apache.org/jira/browse/NUTCH-2380
Project: Nutch
Issue Type: Improvement
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992836#comment-15992836 ]
Jurian Broertjes commented on NUTCH-2373:
-
Nutch 1.x version
> Indexer for Hbase
[ https://issues.apache.org/jira/browse/NUTCH-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2373:
Comment: was deleted
(was: Nutch 1.x version)
> Indexer for Hbase
Jurian Broertjes created NUTCH-2382:
---
Summary: indexer-hbase Nutch 1.x branch
Key: NUTCH-2382
URL: https://issues.apache.org/jira/browse/NUTCH-2382
Project: Nutch
Issue Type: Improvement
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2382:
Attachment: NUTCH-2382-indexer-hbase-p1.patch
> indexer-hbase Nutch 1.x branch
Jurian Broertjes created NUTCH-2431:
---
Summary: Filterchecker to implement Tool-interface
Key: NUTCH-2431
URL: https://issues.apache.org/jira/browse/NUTCH-2431
Project: Nutch
Issue Type:
[ https://issues.apache.org/jira/browse/NUTCH-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2431:
Attachment: NUTCH-2431.patch
> Filterchecker to implement Tool-interface
[ https://issues.apache.org/jira/browse/NUTCH-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294778#comment-16294778 ]
Jurian Broertjes commented on NUTCH-2380:
-
I've tested it a while back, and it's currently also
[ https://issues.apache.org/jira/browse/NUTCH-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294783#comment-16294783 ]
Jurian Broertjes commented on NUTCH-2431:
-
Yes, this is indeed resolved by NUTCH-2477
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295034#comment-16295034 ]
Jurian Broertjes commented on NUTCH-2382:
-
Yeah +1 for that.
> indexer-hbase Nutch 1.x branch
[ https://issues.apache.org/jira/browse/NUTCH-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241941#comment-16241941 ]
Jurian Broertjes commented on NUTCH-2431:
-
Will have a look at your feedback in the coming week
Jurian Broertjes created NUTCH-2477:
---
Summary: Refactor *Checker classes to use base class for common
code
Key: NUTCH-2477
URL: https://issues.apache.org/jira/browse/NUTCH-2477
Project: Nutch
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2477:
External issue URL: https://github.com/apache/nutch/pull/256
> Refactor *Checker classes to
[ https://issues.apache.org/jira/browse/NUTCH-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287823#comment-16287823 ]
Jurian Broertjes commented on NUTCH-2477:
-
Feedback is welcome
> Refactor *Checker classes to use
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509573#comment-16509573 ]
Jurian Broertjes commented on NUTCH-2565:
-
One solution would be to sum the retries of both
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509801#comment-16509801 ]
Jurian Broertjes commented on NUTCH-2565:
-
Maybe it would be sufficient to only test on
Jurian Broertjes created NUTCH-2597:
---
Summary: NPE in updatehostdb
Key: NUTCH-2597
URL: https://issues.apache.org/jira/browse/NUTCH-2597
Project: Nutch
Issue Type: Bug
[ https://issues.apache.org/jira/browse/NUTCH-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511082#comment-16511082 ]
Jurian Broertjes commented on NUTCH-2597:
-
PR: [https://github.com/apache/nutch/pull/349]
Fixes
[ https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509686#comment-16509686 ]
Jurian Broertjes commented on NUTCH-2012:
-
It looks like the process() function still uses
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512589#comment-16512589 ]
Jurian Broertjes commented on NUTCH-2565:
-
Updated PR with the proposed solution
> MergeDB
Jurian Broertjes created NUTCH-2565:
---
Summary: MergeDB incorrectly handles unfetched CrawlDatums
Key: NUTCH-2565
URL: https://issues.apache.org/jira/browse/NUTCH-2565
Project: Nutch
Issue
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432083#comment-16432083 ]
Jurian Broertjes commented on NUTCH-2565:
-
PR: https://github.com/apache/nutch/pull/311
> MergeDB
[ https://issues.apache.org/jira/browse/NUTCH-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409584#comment-16409584 ]
Jurian Broertjes commented on NUTCH-2543:
-
PR: [https://github.com/apache/nutch/pull/303]
PR also
Jurian Broertjes created NUTCH-2543:
---
Summary: readdb & readlinkdb to implement AbstractChecker
Key: NUTCH-2543
URL: https://issues.apache.org/jira/browse/NUTCH-2543
Project: Nutch
Issue
Jurian Broertjes created NUTCH-2717:
---
Summary: Generator cannot open hostDB
Key: NUTCH-2717
URL: https://issues.apache.org/jira/browse/NUTCH-2717
Project: Nutch
Issue Type: Bug
[ https://issues.apache.org/jira/browse/NUTCH-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2717:
Description:
During generate, the hostDB cannot be opened anymore, see:
{quote}2019-05-16
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834822#comment-16834822 ]
Jurian Broertjes commented on NUTCH-2525:
-
Updated patch so it applies against master
> Metadata
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2525:
Attachment: NUTCH-2525-p1.patch
> Metadata indexer cannot handle uppercase parse metadata
Jurian Broertjes created NUTCH-2750:
---
Summary: improve CrawlDbReader & LinkDbReader reader handling
Key: NUTCH-2750
URL: https://issues.apache.org/jira/browse/NUTCH-2750
Project: Nutch
[ https://issues.apache.org/jira/browse/NUTCH-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jurian Broertjes updated NUTCH-2750:
Description:
The current implementation in the CrawlDbReader re-opens readers for every