[jira] [Updated] (PARQUET-1189) Release Parquet Java 1.10

2018-01-17 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated PARQUET-1189: Issue Type: Task (was: Bug)

[jira] [Commented] (PARQUET-1169) Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp

2018-01-17 Thread Jian Fang (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329326#comment-16329326 ] Jian Fang commented on PARQUET-1169: [~xhochy] I updated our parquet-cpp and arrow with master

[jira] [Commented] (PARQUET-1193) [CPP] Implement ColumnOrder to support min_value and max_value

2018-01-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329261#comment-16329261 ] ASF GitHub Bot commented on PARQUET-1193: xhochy commented on a change in pull request #430:

Re: [PARQUET-CPP] Writing hierarchical schema to a parquet

2018-01-17 Thread Wes McKinney
This work would only involve the Arrow interface in src/parquet/arrow (converting from Arrow representation to repetition/definition level encoding, and back), so you wouldn't need to master the whole Parquet codebase, at least. I'd like to help with this work, but realistically I won't have

[jira] [Commented] (PARQUET-1084) Parquet-C++ doesn't selectively read columns with mmap'ed files

2018-01-17 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328893#comment-16328893 ] Wes McKinney commented on PARQUET-1084: I see. I would say in that file we should indicate that

Re: Recommended page size controversy

2018-01-17 Thread Jim Pivarski
Optimizing compression ratios is one issue, optimizing page granularity for column indexes is another, and a third issue is that there is per-page metadata in the Parquet footer in Thrift format that has to be interpreted before anything in the file can be accessed. Too many pages could slow down

Re: [PARQUET-CPP] Writing hierarchical schema to a parquet

2018-01-17 Thread Jim Pivarski
I also have a use-case that requires lists-of-structs and encountered that limitation in pyarrow. Just one level deep would enable a lot of HEP data. I've worked out the logic of converting Parquet definition and repetition levels into Arrow-style arrays:
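For readers unfamiliar with the conversion mentioned in this message, here is a minimal sketch of the general idea for a list nested one level deep. It is not the logic the message refers to (that part is truncated in this listing), and it does not use the pyarrow or parquet-cpp APIs. The function name `levels_to_arrow` and the level meanings are assumptions based on the standard three-level Parquet LIST layout for an optional list of optional values: definition level 0 = null list, 1 = empty list, 2 = null element, 3 = present element; repetition level 0 = start of a new row, 1 = continuation of the current list.

```python
def levels_to_arrow(def_levels, rep_levels, values):
    """Hypothetical helper: turn Parquet levels for a one-level-deep
    optional list of optional values into Arrow-style pieces
    (list offsets, list validity flags, flat child array)."""
    offsets = [0]      # Arrow list offsets, one more entry than rows
    list_valid = []    # True where the list itself is non-null
    child = []         # flattened elements; None marks a null element
    vals = iter(values)  # only present (def level 3) values are stored

    for d, r in zip(def_levels, rep_levels):
        if r == 0:
            # Repetition level 0 starts a new row (a new list slot).
            list_valid.append(d >= 1)
            offsets.append(offsets[-1])
        if d >= 2:
            # An element exists at this position; it is null when d == 2.
            child.append(next(vals) if d == 3 else None)
            offsets[-1] += 1

    return offsets, list_valid, child


# Rows: [1, 2], null, [], [null, 3]
offsets, list_valid, child = levels_to_arrow(
    def_levels=[3, 3, 0, 1, 2, 3],
    rep_levels=[0, 1, 0, 0, 0, 1],
    values=[1, 2, 3],
)
print(offsets, list_valid, child)
```

Deeper nesting, such as lists of structs, would need one offsets array per repeated level and a validity array per optional level; this sketch only covers the single-level case.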

[jira] [Created] (PARQUET-1196) [C++] Provide a parquet_arrow example project incl. CMake setup

2018-01-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1196: Summary: [C++] Provide a parquet_arrow example project incl. CMake setup Key: PARQUET-1196 URL: https://issues.apache.org/jira/browse/PARQUET-1196 Project: Parquet

feature request for Hive/Athena

2018-01-17 Thread michael belostoky
Hello dear Parquet developers, I am using your file format and enjoy it a lot! One feature I find missing is a 'column.mappings' option in the SERDEPROPERTIES clause, similar to the one in org.openx.data.jsonserde.JsonSerDe. I think this is a very desirable feature among many Analysts/Developers

[jira] [Commented] (PARQUET-1193) [CPP] Implement ColumnOrder to support min_value and max_value

2018-01-17 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328631#comment-16328631 ] ASF GitHub Bot commented on PARQUET-1193: majetideepak opened a new pull request #430:

[jira] [Commented] (PARQUET-1084) Parquet-C++ doesn't selectively read columns with mmap'ed files

2018-01-17 Thread Jakob Blomer (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328489#comment-16328489 ] Jakob Blomer commented on PARQUET-1084: That's very interesting, many thanks to all of you for