[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508812
]
Doğacan Güney commented on NUTCH-392:
-
OK, I have done a bit of testing on compression but I'm stuck. Here it is:
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508816
]
Andrzej Bialecki commented on NUTCH-392:
-
Re: Content versioning - we can use negative int values as version
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508818
]
Doğacan Güney commented on NUTCH-392:
-
Re: Content versioning - we can use negative int values as version
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508820
]
Sami Siren commented on NUTCH-392:
--
But why is parse_text_block's size so close to parse_text
data of parse_text
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508823
]
Doğacan Güney commented on NUTCH-392:
-
data of parse_text is already compressed so recompressing it does not
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508861
]
Doğacan Güney commented on NUTCH-392:
-
After changing ParseText to not do any internal compression, segment
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508900
]
Andrzej Bialecki commented on NUTCH-392:
-
Excellent work, Doğacan - thank you. The numbers for RECORD
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500935
]
Doğacan Güney commented on NUTCH-392:
-
Perhaps we can allow a user to configure this on a per-structure basis by
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500951
]
Andrzej Bialecki commented on NUTCH-392:
-
I don't think it's a good idea, it's creating too many cryptic
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500603
]
Doğacan Güney commented on NUTCH-392:
-
From what I understand of MapFile.Writer code in hadoop, if you give
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500635
]
Andrzej Bialecki commented on NUTCH-392:
-
Good point. We can change it to use the following pattern (as
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500728
]
Andrzej Bialecki commented on NUTCH-392:
-
I think it is okay to allow BLOCK compression for linkdb,
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500822
]
Doug Cutting commented on NUTCH-392:
Anchors, explain, and the cache are used relatively infrequently,
[
http://issues.apache.org/jira/browse/NUTCH-392?page=comments#action_12444719 ]
Doug Cutting commented on NUTCH-392:
This should not be applied until Nutch uses Hadoop 0.8. It also contains a
patch required to make Nutch work correctly