[
https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron van der Vegt updated NUTCH-2237:
Attachment: NUTCH-2237.patch
> DeduplicationJob: Add extra order criteria based on slug
>
[
https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183112#comment-15183112
]
Ron van der Vegt commented on NUTCH-2237:
-
Thanks for the feedback!
- maybe URLUtil.java would be
[
https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron van der Vegt updated NUTCH-2237:
Attachment: NUTCH-2237.patch
Extra option compare criteria added:
[-compareOrder ,,,]
>
Ron van der Vegt created NUTCH-2237:
---
Summary: DeduplicationJob: Add extra order criteria based on slug
Key: NUTCH-2237
URL: https://issues.apache.org/jira/browse/NUTCH-2237
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron van der Vegt updated NUTCH-2232:
Attachment: NUTCH-2232.patch
> DeduplicationJob: Url is not decoded before the url length
Ron van der Vegt created NUTCH-2232:
---
Summary: DeduplicationJob: Url is not decoded before the url
length is compared.
Key: NUTCH-2232
URL: https://issues.apache.org/jira/browse/NUTCH-2232
Project:
[
https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron van der Vegt updated NUTCH-2219:
Attachment: NUTCH-2219.patch
> Dedup script, allow users to change the order in which main
Ron van der Vegt created NUTCH-2219:
---
Summary: Dedup script, allow users to change the order in which
main documents are selected.
Key: NUTCH-2219
URL: https://issues.apache.org/jira/browse/NUTCH-2219
[
https://issues.apache.org/jira/browse/NUTCH-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron van der Vegt updated NUTCH-1815:
Attachment: NUTCH-1815-1.9.patch
A small patch for 1.9 which will not add to the prefixed
[
https://issues.apache.org/jira/browse/NUTCH-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron van der Vegt updated NUTCH-1815:
Attachment: NUTCH-1815-1.9.patch
Metadata Parsed with parse-tika is Duplicated