Re: Parquet Sync - 10/17/2019 - Meeting Notes

2019-10-17 Thread Julien Le Dem
Thanks for the notes. Sorry I missed the sync because of a conflict. On Thu, Oct 17, 2019 at 10:00 AM Gidon Gershinsky wrote: > A slight correction re C++. I said the following: > C++ work is near completion/merge. Deepak has reviewed it and made > additional changes / refactoring. > > On Thu,

Re: custom CompressionCodec support

2019-10-17 Thread Radev, Martin
Hi Falak, I was one of the people who recently exposed this in Arrow, but this is not part of the Parquet specification. In particular, any implementation for writing Parquet files can decide whether to expose this or select a reasonable value internally. If you're using Arrow, you would

[jira] [Commented] (PARQUET-1678) [C++] Provide classes for reading/writing using input/output operators

2019-10-17 Thread Gawain BOLTON (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953988#comment-16953988 ] Gawain BOLTON commented on PARQUET-1678: [~wesm]: Thank you for your feedback. Yes I

Re: custom CompressionCodec support

2019-10-17 Thread Falak Kansal
Hi Fokko, Thanks for replying, yes sure. The problem we are facing is that with Parquet zstd we are not able to control the compression level. We tried setting different compression levels, but it doesn't make any difference in the size. We tested/have made sure that we are getting the same
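When output size stays identical across every compression level, one plausible cause is that the level setting never reaches the codec. A minimal way to test that hypothesis is to compress the same sample at several levels and compare sizes. The sketch below makes two loud assumptions: Python's standard-library `zlib` stands in for zstd (zstd is not in the standard library), and the helpers `sample_payload` and `compressed_sizes` are hypothetical illustrations, not part of Parquet or Arrow.

```python
import random
import zlib


def sample_payload(n_words: int = 20000, seed: int = 0) -> bytes:
    """Hypothetical helper: deterministic, moderately compressible bytes (stand-in for column data)."""
    rng = random.Random(seed)
    vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]
    words = [rng.choice(vocab) + str(rng.randrange(100)) for _ in range(n_words)]
    return " ".join(words).encode("ascii")


def compressed_sizes(data: bytes, levels=(1, 6, 9)) -> dict:
    """Hypothetical helper: compress the same bytes at each level (zlib as a zstd stand-in)."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    payload = sample_payload()
    sizes = compressed_sizes(payload)
    for level, size in sizes.items():
        print(f"level {level}: {size} bytes (raw {len(payload)})")
    # Identical sizes at every level suggest the level is not being applied.
    if len(set(sizes.values())) == 1:
        print("warning: compression level appears to have no effect")
```

The same comparison applied to the Parquet files themselves (write once per level, compare file sizes) is the check that matters for the zstd case; the zlib version only shows the shape of the test.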

Re: Parquet Sync - 10/17/2019 - Meeting Notes

2019-10-17 Thread Gidon Gershinsky
A slight correction re C++. I said the following: C++ work is near completion/merge. Deepak has reviewed it and made additional changes / refactoring. On Thu, Oct 17, 2019 at 7:33 PM wrote: > 10/17/2019 > > Attendee: > Gidon > Gabor > Ryan > Karfiol > Xinli > > Topics: > > Column Encryption >

Parquet Sync - 10/17/2019 - Meeting Notes

2019-10-17 Thread shangx
10/17/2019 Attendees: Gidon, Gabor, Ryan, Karfiol, Xinli. Topics: Column Encryption. For the C++ version, Gidon worked with Deepak to get reviews going. For Java, we are blocked on the Parquet-11 release. Gabor proposed branching Parquet-11 and merging later. But we would need to be in

Re: multi threading support

2019-10-17 Thread Driesprong, Fokko
Thank you for your question, Manik. First of all, I think most of the people working on this project are guys, but I would not exclude any other gender. Secondly, Parquet is widely used in different open source projects such as Hive, Presto and Spark. These frameworks scale out by design. For

Re: custom CompressionCodec support

2019-10-17 Thread Driesprong, Fokko
Hi Manik, The supported compression codecs that ship with Parquet are tested and validated in the CI pipeline. Sometimes there are issues with compressors; therefore, they are not easily pluggable. Feel free to open up a PR to the project if you believe there are compressors missing, then we

[jira] [Updated] (PARQUET-1679) Invalid SchemaException for UUID while using AvroParquetWriter

2019-10-17 Thread Felix Kizhakkel Jose (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Kizhakkel Jose updated PARQUET-1679: -- Description: Hi, I am getting 

[jira] [Updated] (PARQUET-1680) Parquet Java Serialization is very slow

2019-10-17 Thread Felix Kizhakkel Jose (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Kizhakkel Jose updated PARQUET-1680: -- Description: Hi, I am doing a POC to compare different data formats and its