[jira] Resolved: (NUTCH-534) SegmentMerger: add -normalize option

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-534. - Resolution: Fixed Assignee: Andrzej Bialecki (was: Emmanuel Joke) Patch applied

[jira] Closed: (NUTCH-534) SegmentMerger: add -normalize option

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-534. --- SegmentMerger: add -normalize option

[jira] Commented: (NUTCH-368) Message queueing system

2008-01-15 Thread Chris Chiappone (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559234#action_12559234 ] Chris Chiappone commented on NUTCH-368: --- I tried to run this patch but I'm not sure

[jira] Resolved: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-528. - Resolution: Fixed Assignee: Andrzej Bialecki (was: Emmanuel Joke) Patch applied

[jira] Closed: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-528. --- CrawlDbReader: add some new stats + dump into a csv format

[jira] Commented: (NUTCH-596) ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559255#action_12559255 ] Andrzej Bialecki commented on NUTCH-596: - I'm voting for simplicity ;) i.e. the

[jira] Resolved: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish.

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-597. - Resolution: Fixed Fix Version/s: 1.0.0 Assignee: Andrzej Bialecki Patch

[jira] Closed: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish.

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-597. --- Fetcher2 - java.lang.NullPointerException when host does not exist and

[jira] Commented: (NUTCH-594) Serve Nutch search results in XML and JSON

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559265#action_12559265 ] Andrzej Bialecki commented on NUTCH-594: - I like the concept of this patch that

[jira] Commented: (NUTCH-592) Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559272#action_12559272 ] Andrzej Bialecki commented on NUTCH-592: - This seems to be a duplicate of

[jira] Commented: (NUTCH-590) Index multiple docs per call using IndexingFilter extension point

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559280#action_12559280 ] Andrzej Bialecki commented on NUTCH-590: - Nutch has a provision to return multiple

[jira] Commented: (NUTCH-584) urls missing from fetchlist

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559319#action_12559319 ] Andrzej Bialecki commented on NUTCH-584: - Thank you for the simple test case! I

[jira] Updated: (NUTCH-584) urls missing from fetchlist

2008-01-15 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-584: Attachment: generator.patch Patch to address this problem - your test case executes fine

Serious bug in Generator / FreeGenerator

2008-01-15 Thread Andrzej Bialecki
Hi all, I believe I found the main reason for the elusive issue of missing urls in Generator output. Please see https://issues.apache.org/jira/browse/NUTCH-584 for details. If my analysis is correct (I'd appreciate a review) then I'll commit the patch shortly. -- Best regards, Andrzej

[jira] Commented: (NUTCH-363) Fetcher normalizes everything at least twice

2008-01-15 Thread iwan cornelius (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559369#action_12559369 ] iwan cornelius commented on NUTCH-363: -- Has this been resolved or a workaround found?

[jira] Commented: (NUTCH-363) Fetcher normalizes everything at least twice

2008-01-15 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559378#action_12559378 ] Emmanuel Joke commented on NUTCH-363: - FYI, The operation to normalize link within the