[jira] [Commented] (PARQUET-299) [Vectorized Reader] ColumnVector length should be in terms of rows, not DataPages

2015-06-16 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588122#comment-14588122
 ] 

Dong Chen commented on PARQUET-299:
---

Thanks [~nezihyigitbasi], I will try to handle this on the Hive side.

If the number of rows in a vector depends on the data pages and is not 
constant, I have a question about the {{values}} array in {{ColumnVector}}. 
For example, the {{int[] values}} array in {{IntColumnVector}} is initialized 
with a default size of 1K. I guess this array is designed to store the decoded 
values when decoding eagerly. Since the actual number of rows is always a 
little bigger, how will we handle this? Resize it to 
{{ColumnVector.numValues}}, or do you have other thoughts?
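
To make the resize option concrete, here is a minimal sketch of a growable int vector. It is not the {{IntColumnVector}} from the vector branch: the class {{IntVectorSketch}} and its methods are hypothetical names used for illustration, and only the 1K default and the idea of growing to the actual value count come from the comment above.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an int column vector; not the parquet-mr class. */
final class IntVectorSketch {
    // Mirrors the "1K" default size mentioned in the comment (assumed to mean 1024).
    static final int DEFAULT_CAPACITY = 1024;

    private int[] values = new int[DEFAULT_CAPACITY];
    private int numValues = 0;

    /** Grows the backing array so it can hold at least minCapacity values. */
    void ensureCapacity(int minCapacity) {
        if (minCapacity > values.length) {
            // Double the size to amortize repeated growth, but never go below the request.
            int newCapacity = Math.max(minCapacity, values.length * 2);
            values = Arrays.copyOf(values, newCapacity);
        }
    }

    /** Appends the first count decoded values of a page, growing the array if needed. */
    void append(int[] decoded, int count) {
        ensureCapacity(numValues + count);
        System.arraycopy(decoded, 0, values, numValues, count);
        numValues += count;
    }

    /** Number of values currently stored. */
    int size() {
        return numValues;
    }

    /** Current length of the backing array. */
    int capacity() {
        return values.length;
    }

    /** Returns the value at index i, which must be below size(). */
    int get(int i) {
        if (i < 0 || i >= numValues) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + numValues);
        }
        return values[i];
    }
}
```

With this shape, a page that decodes to more than 1024 values triggers a single reallocation on {{append}}, and later pages of similar size reuse the enlarged array. Whether growing on demand or sizing once up front is the better fit depends on how the reader batches rows, which this thread leaves open.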



 [Vectorized Reader] ColumnVector length should be in terms of rows, not 
 DataPages
 -

 Key: PARQUET-299
 URL: https://issues.apache.org/jira/browse/PARQUET-299
 Project: Parquet
  Issue Type: Sub-task
  Components: parquet-mr
Reporter: Zhenxiao Luo

 In https://github.com/zhenxiao/incubator-parquet-mr/tree/vector
 ColumnVector length is in terms of DataPages; it needs to be in terms of rows



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PARQUET-299) [Vectorized Reader] ColumnVector length should be in terms of rows, not DataPages

2015-06-16 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588332#comment-14588332
 ] 

Nezih Yigitbasi commented on PARQUET-299:
-

[~dongc] you are right: the array size is fixed, and that is a problem. We are 
still working on sorting out these issues and will keep you posted.






[jira] [Resolved] (PARQUET-178) META-INF for slf4j should not be in parquet-format jar

2015-06-16 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved PARQUET-178.
---
Resolution: Fixed
  Assignee: Ryan Blue

Merged. Thanks for letting us know about this [~koert]!

 META-INF for slf4j should not be in parquet-format jar
 --

 Key: PARQUET-178
 URL: https://issues.apache.org/jira/browse/PARQUET-178
 Project: Parquet
  Issue Type: Bug
  Components: parquet-format
Affects Versions: 1.6.0
Reporter: koert kuipers
Assignee: Ryan Blue
Priority: Minor

 {noformat}
 $ jar tf parquet-format-2.2.0-rc1.jar  | grep org\\.slf
 META-INF/maven/org.slf4j/
 META-INF/maven/org.slf4j/slf4j-api/
 META-INF/maven/org.slf4j/slf4j-api/pom.xml
 META-INF/maven/org.slf4j/slf4j-api/pom.properties
 {noformat}
 It is not clear to me why these are here. I suspect they should not be.
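
The same check can be scripted, for example to guard a build against the stray entries coming back. The sketch below uses the standard {{java.util.jar.JarFile}} API; the class name {{JarEntryCheck}} and its two methods are hypothetical helpers, not part of parquet-format.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical helper that looks for unwanted entries in a jar. */
final class JarEntryCheck {
    /** Keeps only the entry names that start with the given prefix. */
    static List<String> withPrefix(List<String> entryNames, String prefix) {
        return entryNames.stream()
                .filter(name -> name.startsWith(prefix))
                .collect(Collectors.toList());
    }

    /** Lists the entries of the jar at jarPath whose names start with the prefix. */
    static List<String> scan(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            List<String> names = jar.stream()
                    .map(entry -> entry.getName())
                    .collect(Collectors.toList());
            return withPrefix(names, prefix);
        }
    }
}
```

Calling {{JarEntryCheck.scan("parquet-format-2.2.0-rc1.jar", "META-INF/maven/org.slf4j/")}} on the jar from the listing above should return the same four entries that {{jar tf}} printed, and an empty list once they are no longer packaged.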





[jira] [Created] (PARQUET-308) Add accessor to ParquetWriter to get current data size

2015-06-16 Thread Ryan Blue (JIRA)
Ryan Blue created PARQUET-308:
-

 Summary: Add accessor to ParquetWriter to get current data size
 Key: PARQUET-308
 URL: https://issues.apache.org/jira/browse/PARQUET-308
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.7.0
Reporter: Ryan Blue
Assignee: Ryan Blue
 Fix For: 1.8.0








[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics

2015-06-16 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589347#comment-14589347
 ] 

Ferdinand Xu commented on PARQUET-41:
-

Hi guys,
The pull request for parquet-mr is at 
https://github.com/apache/parquet-mr/pull/215 and the one for parquet-format is 
at https://github.com/apache/parquet-format/pull/28. The parquet-format PR 
defines the data structure. Currently only integer columns are supported; the 
unit tests pass, and so do the tests on the Hive side. Please help me review 
these two PRs. Thank you!

 Add bloom filters to parquet statistics
 ---

 Key: PARQUET-41
 URL: https://issues.apache.org/jira/browse/PARQUET-41
 Project: Parquet
  Issue Type: New Feature
  Components: parquet-format, parquet-mr
Reporter: Alex Levenson
Assignee: ferdinand xu
  Labels: filter2

 For row groups with no dictionary, we could still produce a bloom filter. 
 This could be very useful in filtering entire row groups.
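
As an illustration of the idea, here is a minimal Bloom filter over int values. It is a sketch under stated assumptions, not the structure defined in the pull requests: the class {{IntBloomFilterSketch}}, its hash mixing, and the double-hashing scheme are all choices made here for the example.

```java
import java.util.BitSet;

/** Minimal Bloom filter for int values; a sketch, not the format from the PRs. */
final class IntBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    IntBloomFilterSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Integer bit mixer (the MurmurHash3 32-bit finalizer). */
    private static int mix(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int index(int value, int i) {
        int h1 = mix(value);
        int h2 = mix(value ^ 0x9e3779b9) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Records a value written to the row group. */
    void add(int value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    /** Returns false only if the value was definitely never added. */
    boolean mightContain(int value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A writer would call {{add}} for every value in a column chunk. A reader evaluating an equality predicate could then skip the whole row group whenever {{mightContain}} returns false, since a Bloom filter can produce false positives but never false negatives.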





[jira] [Created] (PARQUET-309) Remove unnecessary compile dependency on parquet-generator

2015-06-16 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created PARQUET-309:
---

 Summary: Remove unnecessary compile dependency on parquet-generator
 Key: PARQUET-309
 URL: https://issues.apache.org/jira/browse/PARQUET-309
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.7.0
Reporter: Konstantin Shaposhnikov


parquet-generator is used at build time only. Other Parquet jars (e.g. 
parquet-encoding) should not depend on it.


