DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment
-
Key: NUTCH-525
URL:
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vishal Shah updated NUTCH-525:
--
Attachment: deleteDups.patch
Patch for the bug attached here.
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514910 ]
Vishal Shah commented on NUTCH-525:
---
Hi,
I'll add a unit test.
For the undelete thing, the need could arise
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vishal Shah updated NUTCH-525:
--
Attachment: RededupUnitTest.patch
I have modified the existing JUnit test for DeleteDuplicates to test
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507144 ]
Vishal Shah commented on NUTCH-503:
---
Hi Emmanuel,
Can you please dump the contents of your crawldb after
Generator exits incorrectly for small fetchlists
-
Key: NUTCH-503
URL: https://issues.apache.org/jira/browse/NUTCH-503
Project: Nutch
Issue Type: Bug
Components: generator
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vishal Shah updated NUTCH-503:
--
Attachment: emptyfetchlist.patch
Hi,
The previous patch is missing a header line. I've reattached