[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-28 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508812 ] Doğacan Güney commented on NUTCH-392: OK, I have done a bit of testing on compression but I'm stuck. Here it is:

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508816 ] Andrzej Bialecki commented on NUTCH-392: Re: Content versioning - we can use negative int values as version

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-28 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508818 ] Doğacan Güney commented on NUTCH-392: Re: Content versioning - we can use negative int values as version

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-28 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508820 ] Sami Siren commented on NUTCH-392: But why is parse_text_block's size so close to parse_text data of parse_text

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-28 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508823 ] Doğacan Güney commented on NUTCH-392: data of parse_text is already compressed so recompressing it does not

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-28 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508861 ] Doğacan Güney commented on NUTCH-392: After changing ParseText to not do any internal compression, segment

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-28 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508900 ] Andrzej Bialecki commented on NUTCH-392: Excellent work, Doğacan - thank you. The numbers for RECORD

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-02 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500935 ] Doğacan Güney commented on NUTCH-392: Perhaps we can allow a user to configure this on a per-structure basis by

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500951 ] Andrzej Bialecki commented on NUTCH-392: I don't think it's a good idea, it's creating too many cryptic

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-01 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500603 ] Doğacan Güney commented on NUTCH-392: From what I understand of MapFile.Writer code in hadoop, if you give

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500635 ] Andrzej Bialecki commented on NUTCH-392: Good point. We can change it to use the following pattern (as

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500728 ] Andrzej Bialecki commented on NUTCH-392: I think it is okay to allow BLOCK compression for linkdb,

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2007-06-01 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500822 ] Doug Cutting commented on NUTCH-392: Anchors, explain, and the cache are used relatively infrequently,

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

2006-10-25 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-392?page=comments#action_12444719 ] Doug Cutting commented on NUTCH-392: This should not be applied until Nutch uses Hadoop 0.8. It also contains a patch required to make Nutch work correctly