[ https://issues.apache.org/jira/browse/PARQUET-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue resolved PARQUET-306.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 1.8.0

Merged #211. Thanks for reviewing, Alex!

> Improve alignment between row groups and HDFS blocks
> ----------------------------------------------------
>
>          Key: PARQUET-306
>          URL: https://issues.apache.org/jira/browse/PARQUET-306
>      Project: Parquet
>   Issue Type: Improvement
>   Components: parquet-mr
>     Reporter: Ryan Blue
>     Assignee: Ryan Blue
>      Fix For: 1.8.0
>
>
> Row groups should not span HDFS blocks to avoid remote reads. There are 3
> things we can use to avoid this:
> 1. Set the next row group's size to the remaining bytes in the current HDFS
>    block
> 2. Use HDFS-3689, variable-length HDFS blocks, when available
> 3. Pad after row groups close to the block boundary to start the next row
>    group at the start of the next block
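
For readers following the issue, options 1 and 3 from the description can be sketched roughly as below. This is only an illustration of the arithmetic, not the code merged in #211: the function name `plan_next_row_group` and the parameters `pos`, `block_size`, `row_group_size` and `max_padding` are hypothetical, and option 2 (HDFS-3689) is not covered because it depends on filesystem support.

```python
MIB = 1024 * 1024


def plan_next_row_group(pos, block_size, row_group_size, max_padding):
    """Return (padding_bytes, next_row_group_size) for a writer at offset `pos`.

    All names here are hypothetical and only illustrate the idea in the issue.
    """
    offset_in_block = pos % block_size
    if offset_in_block == 0:
        # Already aligned to a block boundary: no padding needed.
        return 0, min(row_group_size, block_size)

    remaining = block_size - offset_in_block
    if remaining <= max_padding:
        # Option 3: close to the boundary, so pad and start the next
        # row group at the start of the next block.
        return remaining, min(row_group_size, block_size)

    # Option 1: shrink the next row group to the bytes left in this block.
    return 0, min(row_group_size, remaining)


if __name__ == "__main__":
    block = 128 * MIB
    for pos in (0, 100 * MIB, 125 * MIB):
        padding, size = plan_next_row_group(pos, block, 128 * MIB, 8 * MIB)
        print(pos // MIB, padding // MIB, size // MIB)
```

The `max_padding` threshold is the trade-off in this sketch: a larger value wastes more bytes per block, while a smaller one allows tiny row groups at the end of a block.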