[jira] [Updated] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug

2016-03-07 Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron van der Vegt updated NUTCH-2237: Attachment: NUTCH-2237.patch > DeduplicationJob: Add extra order criteria based on slug

[jira] [Commented] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug

2016-03-07 Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183112#comment-15183112 ] Ron van der Vegt commented on NUTCH-2237: - Thanks for the feedback! - maybe URLUtil.java would be

[jira] [Updated] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug

2016-03-02 Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron van der Vegt updated NUTCH-2237: Attachment: NUTCH-2237.patch Extra option compare criteria added: [-compareOrder ,,,]
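The -compareOrder option above chains several criteria to decide which duplicate is kept as the main document. A minimal Java sketch of that idea, assuming hypothetical criterion names (score, fetchTime, urlLength) and a hypothetical Candidate record; it is not Nutch's DeduplicationJob code or the attached patch. Each criterion becomes a comparator, and a later criterion only breaks ties left by the earlier ones.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CompareOrderSketch {

  // Hypothetical stand-in for a duplicate candidate; not Nutch's CrawlDatum.
  record Candidate(String url, float score, long fetchTime) {}

  // Builds one comparator from a comma-separated criteria list such as
  // "score,fetchTime,urlLength" (criterion names are assumptions).
  // The resulting order puts the preferred (main) document first.
  static Comparator<Candidate> fromOrder(String order) {
    Comparator<Candidate> result = null;
    for (String name : order.split(",")) {
      Comparator<Candidate> c = switch (name.trim()) {
        // Higher score wins.
        case "score" -> Comparator.comparingDouble((Candidate d) -> d.score()).reversed();
        // More recently fetched wins.
        case "fetchTime" -> Comparator.comparingLong((Candidate d) -> d.fetchTime()).reversed();
        // Shorter URL wins.
        case "urlLength" -> Comparator.comparingInt((Candidate d) -> d.url().length());
        default -> throw new IllegalArgumentException("Unknown criterion: " + name);
      };
      // Later criteria only apply when all earlier ones tie.
      result = (result == null) ? c : result.thenComparing(c);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Candidate> dups = new ArrayList<>(List.of(
        new Candidate("http://example.com/a/long/path", 1.0f, 100L),
        new Candidate("http://example.com/a", 1.0f, 100L),
        new Candidate("http://example.com/b", 0.5f, 200L)));
    dups.sort(fromOrder("score,fetchTime,urlLength"));
    // prints "Main document: http://example.com/a"
    System.out.println("Main document: " + dups.get(0).url());
  }
}
```

Reordering the list to "fetchTime,score,urlLength" would instead keep http://example.com/b, which is the kind of user-controlled choice NUTCH-2219 asks for.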

[jira] [Created] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug

2016-03-02 Ron van der Vegt (JIRA)
Ron van der Vegt created NUTCH-2237: --- Summary: DeduplicationJob: Add extra order criteria based on slug Key: NUTCH-2237 URL: https://issues.apache.org/jira/browse/NUTCH-2237 Project: Nutch

[jira] [Updated] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron van der Vegt updated NUTCH-2232: Attachment: NUTCH-2232.patch > DeduplicationJob: Url is not decoded before the url length

[jira] [Created] (NUTCH-2232) DeduplicationJob: Url is not decoded before the url length is compared.

2016-02-24 Ron van der Vegt (JIRA)
Ron van der Vegt created NUTCH-2232: --- Summary: DeduplicationJob: Url is not decoded before the url length is compared. Key: NUTCH-2232 URL: https://issues.apache.org/jira/browse/NUTCH-2232 Project:
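NUTCH-2232 concerns comparing URL lengths before the URL is percent-decoded, so an encoded URL counts as longer than it reads. A minimal sketch of decoding first, assuming java.net.URLDecoder (the Charset overload, Java 10+) as the decoder and hypothetical decodedLength and preferShorter helpers; this is not the attached patch. URLDecoder performs form decoding, so it also turns "+" into a space.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodedUrlLength {

  // Hypothetical helper: length of the URL after percent-decoding as UTF-8.
  // Falls back to the raw length if the URL contains a malformed escape,
  // for which URLDecoder throws IllegalArgumentException.
  static int decodedLength(String url) {
    try {
      return URLDecoder.decode(url, StandardCharsets.UTF_8).length();
    } catch (IllegalArgumentException e) {
      return url.length();
    }
  }

  // Hypothetical helper: keeps the URL that is shorter once decoded;
  // on a tie the first argument is kept.
  static String preferShorter(String a, String b) {
    return decodedLength(a) <= decodedLength(b) ? a : b;
  }

  public static void main(String[] args) {
    String encoded = "http://example.com/caf%C3%A9";
    String plain = "http://example.com/cafe-x";
    System.out.println(encoded.length() + " vs " + plain.length());               // raw lengths
    System.out.println(decodedLength(encoded) + " vs " + decodedLength(plain));   // decoded lengths
    System.out.println("Kept: " + preferShorter(encoded, plain));
  }
}
```

Compared raw, http://example.com/caf%C3%A9 (28 characters) loses to http://example.com/cafe-x (25); decoded to http://example.com/café (23), it is the shorter one.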

[jira] [Updated] (NUTCH-2219) Dedup script, allow users to change the order in which main documents are selected.

2016-02-15 Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron van der Vegt updated NUTCH-2219: Attachment: NUTCH-2219.patch > Dedup script, allow users to change the order in which main

[jira] [Created] (NUTCH-2219) Dedup script, allow users to change the order in which main documents are selected.

2016-02-15 Ron van der Vegt (JIRA)
Ron van der Vegt created NUTCH-2219: --- Summary: Dedup script, allow users to change the order in which main documents are selected. Key: NUTCH-2219 URL: https://issues.apache.org/jira/browse/NUTCH-2219

[jira] [Updated] (NUTCH-1815) Metadata Parsed with parse-tika is Duplicated

2015-01-09 Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron van der Vegt updated NUTCH-1815: Attachment: NUTCH-1815-1.9.patch A small patch for 1.9 which will not add to the prefixed

[jira] [Updated] (NUTCH-1815) Metadata Parsed with parse-tika is Duplicated

2015-01-09 Ron van der Vegt (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron van der Vegt updated NUTCH-1815: Attachment: NUTCH-1815-1.9.patch Metadata Parsed with parse-tika is Duplicated