[jira] [Resolved] (PARQUET-316) Run.sh is broken in parquet-benchmarks

2015-06-30 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved PARQUET-316.
---
   Resolution: Fixed
Fix Version/s: 1.8.0

Merged Nezih's PR. Thanks for fixing this!

 Run.sh is broken in parquet-benchmarks
 --

 Key: PARQUET-316
 URL: https://issues.apache.org/jira/browse/PARQUET-316
 Project: Parquet
  Issue Type: Bug
Reporter: Nezih Yigitbasi
Assignee: Nezih Yigitbasi
 Fix For: 1.8.0


 With the package renaming (to org.apache.parquet), the run.sh script is now
 broken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PARQUET-146) make Parquet compile with java 7 instead of java 6

2015-06-30 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608848#comment-14608848
 ] 

Ryan Blue commented on PARQUET-146:
---

We should discuss this on the mailing list. We've had recent contributions
fixing support for Java 6, so we definitely want to build consensus before
deprecating support.

 make Parquet compile with java 7 instead of java 6
 --

 Key: PARQUET-146
 URL: https://issues.apache.org/jira/browse/PARQUET-146
 Project: Parquet
  Issue Type: Improvement
Reporter: Julien Le Dem
  Labels: beginner, noob, pick-me-up

 Currently Parquet is compatible with Java 6. We should remove this constraint.



[jira] [Commented] (PARQUET-146) make Parquet compile with java 7 instead of java 6

2015-06-30 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608834#comment-14608834
 ] 

Nezih Yigitbasi commented on PARQUET-146:
-

[~singhashish] From the title of this issue it seems like the pom file should 
also be updated. I just created a PR for that.

 make Parquet compile with java 7 instead of java 6
 --

 Key: PARQUET-146
 URL: https://issues.apache.org/jira/browse/PARQUET-146
 Project: Parquet
  Issue Type: Improvement
Reporter: Julien Le Dem
  Labels: beginner, noob, pick-me-up

 Currently Parquet is compatible with Java 6. We should remove this constraint.



[jira] [Updated] (PARQUET-321) Set the HDFS padding default to 8MB

2015-06-30 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated PARQUET-321:
--
Summary: Set the HDFS padding default to 8MB  (was: Set the HDFS padding 
default to 16MB)

 Set the HDFS padding default to 8MB
 ---

 Key: PARQUET-321
 URL: https://issues.apache.org/jira/browse/PARQUET-321
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Reporter: Ryan Blue
Assignee: Ryan Blue
 Fix For: 1.8.0


 PARQUET-306 added the ability to pad row groups so that they align with HDFS
 blocks to avoid remote reads. The ParquetFileWriter will now either pad the
 remaining space in the block or target a row group for the remaining size.

 The padding maximum is the largest amount of padding that will be used. If
 the space left in the block is under this threshold, it is padded. If it is
 greater than this threshold, the next row group is sized to fit the
 remaining space. The current padding maximum is 0.
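
The pad-or-fit rule described above can be sketched as follows. This is a hypothetical illustration, not the ParquetFileWriter source: the class, method, and field names are invented, and it assumes that a gap exactly equal to the padding maximum is padded, which the description does not specify.

```java
// Hypothetical sketch of the pad-or-fit rule from PARQUET-306; the names are
// invented and this is not the ParquetFileWriter source. Assumes a gap exactly
// equal to the padding maximum is padded.
class PaddingSketch {
    static final long MB = 1024L * 1024L;

    /** How many bytes to pad, and the target size for the next row group. */
    static final class Decision {
        final long paddingBytes;
        final long nextRowGroupTarget;

        Decision(long paddingBytes, long nextRowGroupTarget) {
            this.paddingBytes = paddingBytes;
            this.nextRowGroupTarget = nextRowGroupTarget;
        }
    }

    /**
     * @param pos          current write position in the file, in bytes
     * @param blockSize    HDFS block size, in bytes
     * @param rowGroupSize configured row group size, in bytes
     * @param maxPadding   padding maximum, in bytes (0 means never pad)
     */
    static Decision align(long pos, long blockSize, long rowGroupSize, long maxPadding) {
        // Bytes left before the next block boundary: always in 1..blockSize.
        long remaining = blockSize - (pos % blockSize);
        if (remaining <= maxPadding) {
            // Small gap: pad to the boundary; the next row group starts a new block.
            return new Decision(remaining, Math.min(rowGroupSize, blockSize));
        }
        // Large gap: no padding; the next row group is sized to fit the gap.
        return new Decision(0, Math.min(rowGroupSize, remaining));
    }

    public static void main(String[] args) {
        long block = 128 * MB;
        long rowGroup = 128 * MB;

        Decision pad = align(123 * MB, block, rowGroup, 8 * MB);  // 5MB left
        System.out.println(pad.paddingBytes / MB + " " + pad.nextRowGroupTarget / MB);   // 5 128

        Decision fit = align(108 * MB, block, rowGroup, 8 * MB);  // 20MB left
        System.out.println(fit.paddingBytes / MB + " " + fit.nextRowGroupTarget / MB);   // 0 20

        Decision none = align(123 * MB, block, rowGroup, 0);      // padding maximum of 0
        System.out.println(none.paddingBytes / MB + " " + none.nextRowGroupTarget / MB); // 0 5
    }
}
```

The last call shows the behavior the issue wants to change: with a padding maximum of 0, the 5MB left in the block becomes the target for a tiny 5MB row group instead of being padded.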

 I think we should change the padding maximum to 8MB. My reasoning is this: we
 want this number to be small enough that it won't prevent the library from
 writing reasonable row groups, but larger than the minimum size row group we
 would want to write. 8MB is 1/16th of the 128MB row group default, so I think
 it is reasonable: we don't want a row group to be smaller than 8MB.

 We also want this to be large enough that a few row groups in a block don't
 cause a tiny row group to be written in the excess space. 8MB accounts for 4
 row groups that are each 2MB undersized. In addition, it is reasonable not to
 allow row groups under 8MB.
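
The two sizing arguments above reduce to simple arithmetic, checked below in Java. The 128MB row group default is implied by "1/16th"; the 512MB block holding four row groups is a hypothetical size chosen only to make the "4 row groups that are 2MB under-size" case concrete, and the final comparison again assumes a gap equal to the maximum is padded.

```java
// Worked arithmetic for the 8MB proposal. ROW_GROUP_DEFAULT is 128MB;
// the 512MB block in main() is a hypothetical size chosen so that four
// row groups share one block.
class PaddingArithmetic {
    static final long MB = 1024L * 1024L;
    static final long ROW_GROUP_DEFAULT = 128 * MB;
    static final long PROPOSED_MAX_PADDING = 8 * MB;

    /** Space left at the end of a block after writing n row groups of the given size. */
    static long excess(long blockSize, int n, long actualRowGroupSize) {
        return blockSize - n * actualRowGroupSize;
    }

    public static void main(String[] args) {
        // "8MB is 1/16th of the row group default"
        System.out.println(ROW_GROUP_DEFAULT / PROPOSED_MAX_PADDING); // 16

        // "8MB accounts for 4 row groups that are 2MB under-size"
        long blockSize = 512 * MB;                 // hypothetical block size
        long actual = ROW_GROUP_DEFAULT - 2 * MB;  // each row group is 126MB
        long left = excess(blockSize, 4, actual);
        System.out.println(left / MB);             // 8

        // The 8MB gap is within the maximum, so it is padded rather than
        // filled with a tiny row group (assuming a gap equal to the maximum is padded).
        System.out.println(left <= PROPOSED_MAX_PADDING); // true
    }
}
```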



[jira] [Created] (PARQUET-321) Set the HDFS padding default to 16MB

2015-06-30 Thread Ryan Blue (JIRA)
Ryan Blue created PARQUET-321:
-

 Summary: Set the HDFS padding default to 16MB
 Key: PARQUET-321
 URL: https://issues.apache.org/jira/browse/PARQUET-321
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Reporter: Ryan Blue
Assignee: Ryan Blue
 Fix For: 1.8.0


PARQUET-306 added the ability to pad row groups so that they align with HDFS 
blocks to avoid remote reads. The ParquetFileWriter will now either pad the 
remaining space in the block or target a row group for the remaining size.

The padding maximum is the largest amount of padding that will be used. If
the space left in the block is under this threshold, it is padded. If it is
greater than this threshold, the next row group is sized to fit the remaining
space. The current padding maximum is 0.

I think we should change the padding maximum to 8MB. My reasoning is this: we
want this number to be small enough that it won't prevent the library from
writing reasonable row groups, but larger than the minimum size row group we
would want to write. 8MB is 1/16th of the 128MB row group default, so I think
it is reasonable: we don't want a row group to be smaller than 8MB.

We also want this to be large enough that a few row groups in a block don't
cause a tiny row group to be written in the excess space. 8MB accounts for 4
row groups that are each 2MB undersized. In addition, it is reasonable not to
allow row groups under 8MB.


