[jira] [Commented] (PARQUET-299) [Vectorized Reader] ColumnVector length should be in terms of rows, not DataPages
[ https://issues.apache.org/jira/browse/PARQUET-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588122#comment-14588122 ]

Dong Chen commented on PARQUET-299:
-----------------------------------

Thanks [~nezihyigitbasi], I will try to handle this on the Hive side.

If the number of rows in a vector depends on the data pages and is not constant, I have a question about the array {{values}} in {{ColumnVector}}. For example, the {{int[] values}} in {{IntColumnVector}} is initialized with a default size of 1K. I guess this array is designed to store decoded values when decoding eagerly. Since the actual number of rows is always a little bigger, how will we handle this? Resize it to {{ColumnVector.numValues}}, or are there other thoughts?

[Vectorized Reader] ColumnVector length should be in terms of rows, not DataPages
---------------------------------------------------------------------------------

Key: PARQUET-299
URL: https://issues.apache.org/jira/browse/PARQUET-299
Project: Parquet
Issue Type: Sub-task
Components: parquet-mr
Reporter: Zhenxiao Luo

In https://github.com/zhenxiao/incubator-parquet-mr/tree/vector ColumnVector length is in terms of DataPages; it needs to be in terms of rows.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-299) [Vectorized Reader] ColumnVector length should be in terms of rows, not DataPages
[ https://issues.apache.org/jira/browse/PARQUET-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588332#comment-14588332 ]

Nezih Yigitbasi commented on PARQUET-299:
-----------------------------------------

[~dongc] you are right that the array size is fixed, and that is an issue. We are still working on sorting these issues out. Will keep you posted.
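One way to picture the resizing option raised in this thread is a vector that grows its backing array before decoded values are copied in. The sketch below is illustrative only: {{GrowableIntVector}}, {{ensureCapacity}}, and {{append}} are hypothetical names, not classes or methods from the parquet-mr vector branch, and the doubling growth policy is an assumption rather than anything the thread settled on.

```java
import java.util.Arrays;

// Hypothetical stand-in for the IntColumnVector discussed in the thread;
// this is NOT the parquet-mr class, only a sketch of the resizing idea.
final class GrowableIntVector {
    // The "1K" default size mentioned in the thread.
    static final int DEFAULT_CAPACITY = 1024;

    int[] values = new int[DEFAULT_CAPACITY];
    int numValues; // number of rows currently held in `values`

    // Grow the backing array so it can hold at least `required` values,
    // preserving everything decoded so far. Doubling is an assumed policy.
    void ensureCapacity(int required) {
        if (required > values.length) {
            int newLength = Math.max(required, values.length * 2);
            values = Arrays.copyOf(values, newLength);
        }
    }

    // Append `count` decoded values, for example the values of one data page.
    void append(int[] decoded, int count) {
        ensureCapacity(numValues + count);
        System.arraycopy(decoded, 0, values, numValues, count);
        numValues += count;
    }
}
```

With this shape, a caller that decodes more rows than the default capacity never has to know the final row count up front; the trade-off is an occasional array copy when the vector grows.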
[jira] [Resolved] (PARQUET-178) META-INF for slf4j should not be in parquet-format jar
[ https://issues.apache.org/jira/browse/PARQUET-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue resolved PARQUET-178.
------------------------------

Resolution: Fixed
Assignee: Ryan Blue

Merged. Thanks for letting us know about this [~koert]!

META-INF for slf4j should not be in parquet-format jar
------------------------------------------------------

Key: PARQUET-178
URL: https://issues.apache.org/jira/browse/PARQUET-178
Project: Parquet
Issue Type: Bug
Components: parquet-format
Affects Versions: 1.6.0
Reporter: koert kuipers
Assignee: Ryan Blue
Priority: Minor

{noformat}
$ jar tf parquet-format-2.2.0-rc1.jar | grep org\\.slf
META-INF/maven/org.slf4j/
META-INF/maven/org.slf4j/slf4j-api/
META-INF/maven/org.slf4j/slf4j-api/pom.xml
META-INF/maven/org.slf4j/slf4j-api/pom.properties
{noformat}

It is not clear to me why these are here. I suspect they should not be.
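The check in the report can also be done programmatically, for example as a build-time guard. The sketch below is a minimal illustration under assumptions: {{Slf4jMetadataCheck}} and its methods are hypothetical names, and the prefix is taken from the listing in the report. It uses only {{java.util.jar.JarFile}} from the JDK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper; the class and method names are illustrative only.
final class Slf4jMetadataCheck {
    // Prefix of the unexpected entries shown in the report.
    static final String PREFIX = "META-INF/maven/org.slf4j/";

    // Keep only the entry names that live under the slf4j Maven metadata path.
    static List<String> filter(List<String> entryNames) {
        List<String> hits = new ArrayList<>();
        for (String name : entryNames) {
            if (name.startsWith(PREFIX)) {
                hits.add(name);
            }
        }
        return hits;
    }

    // Read all entry names from a jar on disk (the equivalent of `jar tf`).
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

Calling {{filter(entryNames(path))}} on a jar and requiring an empty result would flag the problem described in this issue.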
[jira] [Created] (PARQUET-308) Add accessor to ParquetWriter to get current data size
Ryan Blue created PARQUET-308:
------------------------------

Summary: Add accessor to ParquetWriter to get current data size
Key: PARQUET-308
URL: https://issues.apache.org/jira/browse/PARQUET-308
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.7.0
Reporter: Ryan Blue
Assignee: Ryan Blue
Fix For: 1.8.0
[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics
[ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589347#comment-14589347 ]

Ferdinand Xu commented on PARQUET-41:
-------------------------------------

Hi guys,

The pull request for parquet-mr is located at https://github.com/apache/parquet-mr/pull/215 and the one for parquet-format is at https://github.com/apache/parquet-format/pull/28. Currently, I have only added support for integers; the unit tests pass, and so do the tests on the Hive side. The second PR defines the data structure. Please help me review these two PRs. Thank you!

Add bloom filters to parquet statistics
---------------------------------------

Key: PARQUET-41
URL: https://issues.apache.org/jira/browse/PARQUET-41
Project: Parquet
Issue Type: New Feature
Components: parquet-format, parquet-mr
Reporter: Alex Levenson
Assignee: ferdinand xu
Labels: filter2

For row groups with no dictionary, we could still produce a bloom filter. This could be very useful in filtering entire row groups.
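To illustrate why a bloom filter helps skip whole row groups, here is a minimal integer bloom filter sketch. It is not the data structure from either pull request: {{IntBloomFilter}}, its sizing parameters, and the hash mixing are all assumptions chosen for brevity. The property it demonstrates is the one that matters for filtering: a negative answer is definite, so a row group whose filter reports "not present" for the queried value can be skipped safely.

```java
import java.util.BitSet;

// Hypothetical, simplified bloom filter for int values; NOT the structure
// defined in the parquet-format pull request.
final class IntBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    IntBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Integer mixing step (the 32-bit finalizer used by MurmurHash3).
    private static int mix(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int index(int value, int i) {
        int h1 = mix(value);
        int h2 = mix(h1 ^ 0x9e3779b9) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Record a value written to the row group.
    void add(int value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    // false: the value is definitely absent, so the row group can be skipped.
    // true: the value may be present (false positives are possible).
    boolean mightContain(int value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A reader evaluating an equality predicate would consult {{mightContain}} once per row group and read only the groups that answer true.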
[jira] [Created] (PARQUET-309) Remove unnecessary compile dependency on parquet-generator
Konstantin Shaposhnikov created PARQUET-309:
--------------------------------------------

Summary: Remove unnecessary compile dependency on parquet-generator
Key: PARQUET-309
URL: https://issues.apache.org/jira/browse/PARQUET-309
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.7.0
Reporter: Konstantin Shaposhnikov

parquet-generator is used at build time only. Other Parquet jars (e.g. parquet-encoding) should not depend on it.