[jira] [Resolved] (PARQUET-316) Run.sh is broken in parquet-benchmarks
[ https://issues.apache.org/jira/browse/PARQUET-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue resolved PARQUET-316.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.8.0

Merged Nezih's PR. Thanks for fixing this!

Run.sh is broken in parquet-benchmarks
--------------------------------------
Key: PARQUET-316
URL: https://issues.apache.org/jira/browse/PARQUET-316
Project: Parquet
Issue Type: Bug
Reporter: Nezih Yigitbasi
Assignee: Nezih Yigitbasi
Fix For: 1.8.0

With the package renaming (to org.apache.parquet) the run.sh script is now broken.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-146) make Parquet compile with java 7 instead of java 6
[ https://issues.apache.org/jira/browse/PARQUET-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608848#comment-14608848 ]

Ryan Blue commented on PARQUET-146:
-----------------------------------
We should discuss this on the mailing list. We've had recent contributions fixing support for Java 6, so we definitely want to build consensus before deprecating support.

make Parquet compile with java 7 instead of java 6
--------------------------------------------------
Key: PARQUET-146
URL: https://issues.apache.org/jira/browse/PARQUET-146
Project: Parquet
Issue Type: Improvement
Reporter: Julien Le Dem
Labels: beginner, noob, pick-me-up

Currently Parquet is compatible with Java 6. We should remove this constraint.
[jira] [Commented] (PARQUET-146) make Parquet compile with java 7 instead of java 6
[ https://issues.apache.org/jira/browse/PARQUET-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608834#comment-14608834 ]

Nezih Yigitbasi commented on PARQUET-146:
-----------------------------------------
[~singhashish] From the title of this issue it seems like the pom file should also be updated. I just created a PR for that.

make Parquet compile with java 7 instead of java 6
--------------------------------------------------
Key: PARQUET-146
URL: https://issues.apache.org/jira/browse/PARQUET-146
Project: Parquet
Issue Type: Improvement
Reporter: Julien Le Dem
Labels: beginner, noob, pick-me-up

Currently Parquet is compatible with Java 6. We should remove this constraint.
[jira] [Updated] (PARQUET-321) Set the HDFS padding default to 8MB
[ https://issues.apache.org/jira/browse/PARQUET-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue updated PARQUET-321:
------------------------------
Summary: Set the HDFS padding default to 8MB (was: Set the HDFS padding default to 16MB)

Set the HDFS padding default to 8MB
-----------------------------------
Key: PARQUET-321
URL: https://issues.apache.org/jira/browse/PARQUET-321
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Reporter: Ryan Blue
Assignee: Ryan Blue
Fix For: 1.8.0

PARQUET-306 added the ability to pad row groups so that they align with HDFS blocks to avoid remote reads. The ParquetFileWriter will now either pad the remaining space in the block or target a row group for the remaining size.

The padding maximum controls the threshold of the amount of padding that will be used. If the space left is under this threshold, it is padded. If it is greater than this threshold, then the next row group is fit into the remaining space. The current padding maximum is 0.

I think we should change the padding maximum to 8MB. My reasoning is this: we want this number to be small enough that it won't prevent the library from writing reasonable row groups, but larger than the minimum size row group we would want to write. 8MB is 1/16th of the row group default, so I think it is reasonable: we don't want a row group to be smaller than 8MB.

We also want this to be large enough that a few row groups in a block don't cause a tiny row group to be written in the excess space. 8MB accounts for 4 row groups that are 2MB under-size. In addition, it is reasonable to not allow row groups under 8MB.
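The threshold rule and the 8MB arithmetic in the description above can be sketched in a few lines. This is only an illustration, not the ParquetFileWriter code: the function name `plan_block_remainder`, the 512MB block size, and the choice to pad when the remaining space exactly equals the threshold are all assumptions made for the example. The 128MB row group default is inferred from "8MB is 1/16th of the row group default".

```python
MB = 1024 * 1024

ROW_GROUP_DEFAULT = 128 * MB  # inferred from the text: 8MB is 1/16th of the default
BLOCK_SIZE = 512 * MB         # assumption for this example only


def plan_block_remainder(remaining_bytes, max_padding_bytes):
    """Decide what to do with the space left in the current HDFS block.

    Hypothetical helper, not a parquet-mr API. Space at or under the
    padding maximum is padded (the "equal" case is an assumption);
    anything larger becomes the target size of the next row group.
    """
    if remaining_bytes <= max_padding_bytes:
        return ("pad", remaining_bytes)
    return ("fit_row_group", remaining_bytes)


# Four row groups, each 2MB under the 128MB target, fill 504MB of the block.
written = 4 * (ROW_GROUP_DEFAULT - 2 * MB)
remaining = BLOCK_SIZE - written  # 8MB left over

# Padding maximum of 0 (the current default): an 8MB row group is targeted.
print(plan_block_remainder(remaining, 0))

# Proposed padding maximum of 8MB: the leftover space is padded instead.
print(plan_block_remainder(remaining, 8 * MB))
```

Under these assumptions, a padding maximum of 0 leads to a tiny 8MB row group in the leftover space, while a maximum of 8MB pads it, which is the behavior the proposal argues for.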
[jira] [Created] (PARQUET-321) Set the HDFS padding default to 16MB
Ryan Blue created PARQUET-321:
------------------------------
Summary: Set the HDFS padding default to 16MB
Key: PARQUET-321
URL: https://issues.apache.org/jira/browse/PARQUET-321
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Reporter: Ryan Blue
Assignee: Ryan Blue
Fix For: 1.8.0

PARQUET-306 added the ability to pad row groups so that they align with HDFS blocks to avoid remote reads. The ParquetFileWriter will now either pad the remaining space in the block or target a row group for the remaining size.

The padding maximum controls the threshold of the amount of padding that will be used. If the space left is under this threshold, it is padded. If it is greater than this threshold, then the next row group is fit into the remaining space. The current padding maximum is 0.

I think we should change the padding maximum to 8MB. My reasoning is this: we want this number to be small enough that it won't prevent the library from writing reasonable row groups, but larger than the minimum size row group we would want to write. 8MB is 1/16th of the row group default, so I think it is reasonable: we don't want a row group to be smaller than 8MB.

We also want this to be large enough that a few row groups in a block don't cause a tiny row group to be written in the excess space. 8MB accounts for 4 row groups that are 2MB under-size. In addition, it is reasonable to not allow row groups under 8MB.