[jira] Commented: (HADOOP-1381) The distance between sync blocks in SequenceFiles should be configurable rather than hard coded to 2000 bytes

2007-05-18 Thread Owen O'Malley (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496977 ] Owen O'Malley commented on HADOOP-1381: As long as it is configurable, 100k as the default would be fine.

[jira] Commented: (HADOOP-1381) The distance between sync blocks in SequenceFiles should be configurable rather than hard coded to 2000 bytes

2007-05-18 Thread Owen O'Malley (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496968 ] Owen O'Malley commented on HADOOP-1381: The current setup forces all non-block compressed sequence files to

[jira] Commented: (HADOOP-1381) The distance between sync blocks in SequenceFiles should be configurable rather than hard coded to 2000 bytes

2007-05-18 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496973 ] Doug Cutting commented on HADOOP-1381: This sounds like a time/space tradeoff. We currently have a 1% space

[jira] Commented: (HADOOP-1381) The distance between sync blocks in SequenceFiles should be configurable rather than hard coded to 2000 bytes

2007-05-16 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496435 ] Doug Cutting commented on HADOOP-1381: "reduce the overhead by a factor of 500" But if the overhead is

[jira] Commented: (HADOOP-1381) The distance between sync blocks in SequenceFiles should be configurable rather than hard coded to 2000 bytes

2007-05-16 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496429 ] Doug Cutting commented on HADOOP-1381: Why would this be better? The current design is to add them as

[jira] Commented: (HADOOP-1381) The distance between sync blocks in SequenceFiles should be configurable rather than hard coded to 2000 bytes

2007-05-16 Thread Owen O'Malley (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496432 ] Owen O'Malley commented on HADOOP-1381: If your input splits are roughly 128MB or so, putting in a sync
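
The time/space tradeoff raised in this thread can be put into rough numbers. The sketch below is only an approximation and rests on assumptions not stated in the comments above: it takes a sync marker to cost 20 bytes (a 4-byte escape plus a 16-byte hash, which is how the SequenceFile implementation is generally described) and treats markers as spaced exactly one interval apart, ignoring the fact that a record may overshoot the interval before the next marker is written. The function names are hypothetical and are not part of Hadoop.

```python
# Rough arithmetic for sync-marker overhead in a SequenceFile.
# ASSUMPTION: one sync marker costs 20 bytes (4-byte escape + 16-byte hash).
SYNC_ESCAPE_SIZE = 4
SYNC_HASH_SIZE = 16
SYNC_SIZE = SYNC_ESCAPE_SIZE + SYNC_HASH_SIZE


def sync_overhead(sync_interval_bytes: int) -> float:
    """Approximate fraction of extra space spent on sync markers."""
    if sync_interval_bytes <= 0:
        raise ValueError("sync interval must be positive")
    return SYNC_SIZE / sync_interval_bytes


def markers_per_split(split_bytes: int, sync_interval_bytes: int) -> int:
    """Approximate number of sync markers inside one input split."""
    if sync_interval_bytes <= 0:
        raise ValueError("sync interval must be positive")
    return split_bytes // sync_interval_bytes


if __name__ == "__main__":
    split = 128 * 1024 * 1024  # the ~128MB split size mentioned in the thread
    for interval in (2_000, 100_000):
        print(
            f"interval={interval} bytes: "
            f"overhead={sync_overhead(interval):.4%}, "
            f"markers per split={markers_per_split(split, interval)}"
        )
```

Under these assumptions the hard-coded 2000-byte interval matches the 1% space figure quoted in the thread, and a 100k interval cuts both the overhead and the marker count by a factor of 50. The cost is that a reader seeking into the middle of a file may have to scan up to one interval further before it finds a record boundary.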