[jira] [Commented] (PARQUET-1826) Document hadoop configuration options

2020-03-25 Walid Gara (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067229#comment-17067229 ] Walid Gara commented on PARQUET-1826: - Thanks [~gszadovszky] > Document hadoop configuration

[jira] [Updated] (PARQUET-1828) Add a SSE2 path for the ByteStreamSplit encoder implementation

2020-03-25 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-1828: Labels: pull-request-available (was: ) > Add a SSE2 path for the ByteStreamSplit

[jira] [Created] (PARQUET-1828) Add a SSE2 path for the ByteStreamSplit encoder implementation

2020-03-25 Martin Radev (Jira)
Martin Radev created PARQUET-1828: - Summary: Add a SSE2 path for the ByteStreamSplit encoder implementation Key: PARQUET-1828 URL: https://issues.apache.org/jira/browse/PARQUET-1828 Project: Parquet
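For context on what an SSE2 path would accelerate: BYTE_STREAM_SPLIT writes byte b of every fixed-width value into stream b and then concatenates the streams, which is essentially a byte transpose. The sketch below is a minimal scalar version, assuming 4-byte FLOAT values in little-endian layout. The helper names are hypothetical and this is not the parquet-cpp implementation.

```python
import struct

WIDTH = 4  # FLOAT is 4 bytes wide, so the encoding produces 4 byte streams

def byte_stream_split_encode(values):
    """Hypothetical scalar BYTE_STREAM_SPLIT encoder for 32-bit floats."""
    n = len(values)
    raw = struct.pack(f"<{n}f", *values)  # plain little-endian layout
    out = bytearray(WIDTH * n)
    for i in range(n):
        for b in range(WIDTH):
            # byte b of value i lands at position i of stream b
            out[b * n + i] = raw[i * WIDTH + b]
    return bytes(out)

def byte_stream_split_decode(data, n):
    """Inverse transpose: gather byte b of value i back from stream b."""
    raw = bytearray(WIDTH * n)
    for i in range(n):
        for b in range(WIDTH):
            raw[i * WIDTH + b] = data[b * n + i]
    return list(struct.unpack(f"<{n}f", raw))

encoded = byte_stream_split_encode([1.0, 2.0, 3.0])
print(byte_stream_split_decode(encoded, 3))
```

A SIMD implementation performs the same transpose on many values per instruction; the byte layout of the output is unchanged.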

[jira] [Resolved] (PARQUET-458) [C++] Implement support for DataPageV2

2020-03-25 Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved PARQUET-458. -- Resolution: Fixed Issue resolved by pull request 6481

Re: Creating a Parquet file with a UUID field

2020-03-25 Brad Smith
I've created the Jira issue here: https://issues.apache.org/jira/browse/PARQUET-1827 I wonder if any Parquet implementations support this UUID type? Anyway, thanks for your help, Brad On Wed, Mar 25, 2020 at 11:05 AM Gabor Szadovszky wrote: > Hi Brad, > > So, the UUID logical type is added to

[jira] [Created] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-03-25 Brad Smith (Jira)
Brad Smith created PARQUET-1827: --- Summary: UUID type currently not supported by parquet-mr Key: PARQUET-1827 URL: https://issues.apache.org/jira/browse/PARQUET-1827 Project: Parquet Issue

Re: Creating a Parquet file with a UUID field

2020-03-25 Gabor Szadovszky
Hi Brad, So, the UUID logical type has been added to parquet-format, which is the specification of Parquet. It is not yet implemented in parquet-mr, so you are not able to use it. However, parquet-mr does not provide much support for logical types anyway, so you might simply use the

Creating a Parquet file with a UUID field

2020-03-25 Brad Smith
I recently read about the new UUID logical type introduced in parquet-format 2.4.0. I'm interested in trying it out, but I haven't been able to figure out how to make it work so far. For example, the code below uses the parquet-mr library to output a very simple test Parquet file with one string
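As we read the parquet-format specification, the UUID logical type annotates a 16-byte FIXED_LEN_BYTE_ARRAY holding the 16-byte RFC 4122 representation of the UUID. The sketch below only shows that value conversion, using Python's standard `uuid` module, whose `UUID.bytes` is big-endian. It does not write a Parquet file, and the helper names are hypothetical.

```python
import uuid

def uuid_to_fixed16(u: uuid.UUID) -> bytes:
    """Hypothetical helper: 16-byte RFC 4122 (big-endian) form of a UUID,
    the payload a FIXED_LEN_BYTE_ARRAY(16) column annotated as UUID would hold."""
    return u.bytes  # uuid.UUID.bytes is already big-endian and 16 bytes long

def fixed16_to_uuid(b: bytes) -> uuid.UUID:
    """Inverse conversion; rejects values that are not exactly 16 bytes."""
    if len(b) != 16:
        raise ValueError("UUID values must be exactly 16 bytes")
    return uuid.UUID(bytes=b)

u = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(uuid_to_fixed16(u).hex())
```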

Re: Interpretation of PageHeader uncompressed_page_size

2020-03-25 Gabor Szadovszky
Hi Hatem, I agree that the levels shall be included as per the specification. I checked the implementation in parquet-mr as well and it also includes the levels in both uncompressed and compressed values. Cheers, Gabor On Wed, Mar 25, 2020 at 1:02 PM Hatem Helal wrote: > I've recently done

Interpretation of PageHeader uncompressed_page_size

2020-03-25 Hatem Helal
I've recently done some work on adding support for DataPageV2 to the cpp code base [1]. A question came up as to whether uncompressed_page_size includes the levels, which are not compressed in the V2 format anyway. My understanding of the thrift specification [2] is that the levels are included in this
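Under the reading discussed in this thread (the uncompressed repetition and definition levels count toward both uncompressed_page_size and compressed_page_size, as Gabor reports parquet-mr does), the size fields of a V2 data page could be computed as sketched below. The field names follow the thrift PageHeader and DataPageHeaderV2 structs; the dataclass, the helper, and the use of zlib as the codec are illustrative assumptions.

```python
import zlib
from dataclasses import dataclass

@dataclass
class DataPageV2Sizes:  # hypothetical container, not a parquet-format struct
    repetition_levels_byte_length: int
    definition_levels_byte_length: int
    uncompressed_page_size: int
    compressed_page_size: int

def v2_page_sizes(rep_levels: bytes, def_levels: bytes, values: bytes, compress) -> DataPageV2Sizes:
    """Levels are stored uncompressed ahead of the (compressed) values,
    and both page-size fields include the level bytes."""
    levels_len = len(rep_levels) + len(def_levels)
    return DataPageV2Sizes(
        repetition_levels_byte_length=len(rep_levels),
        definition_levels_byte_length=len(def_levels),
        uncompressed_page_size=levels_len + len(values),
        compressed_page_size=levels_len + len(compress(values)),
    )

sizes = v2_page_sizes(b"\x01\x02", b"\x03", b"a" * 100, zlib.compress)
print(sizes)
```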

[jira] [Assigned] (PARQUET-1826) Document hadoop configuration options

2020-03-25 Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1826: - Assignee: Walid Gara Based on our discussion in the Parquet sync I'm

[jira] [Assigned] (PARQUET-1787) Expected distinct numbers is not parsed correctly

2020-03-25 Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1787: - Assignee: Walid Gara > Expected distinct numbers is not parsed correctly >

[jira] [Assigned] (PARQUET-1815) Add union API to BloomFilter interface

2020-03-25 Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1815: - Assignee: Walid Gara > Add union API to BloomFilter interface >

[jira] [Assigned] (PARQUET-1816) Add intersection API to BloomFilter interface

2020-03-25 Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1816: - Assignee: Walid Gara > Add intersection API to BloomFilter interface >
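For background on PARQUET-1815 and PARQUET-1816: for two Bloom filters built with the same size and the same hash functions, a union is the bitwise OR of their bitsets, and an intersection can be approximated by bitwise AND. The AND result has no false negatives for keys present in both inputs, but it may have a higher false-positive rate than a filter built directly from the common keys. The toy filter below uses SHA-256 and a Python int as its bitset. It is not parquet-mr's BlockSplitBloomFilter (which uses xxHash and split blocks), and every name in it is hypothetical.

```python
import hashlib

M = 64  # toy filter size in bits (hypothetical)
K = 3   # hash positions per key (hypothetical)

def positions(key: str):
    """Derive K bit positions from one SHA-256 digest of the key."""
    digest = hashlib.sha256(key.encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % M for i in range(K)]

def build(keys):
    bits = 0
    for key in keys:
        for p in positions(key):
            bits |= 1 << p
    return bits

def might_contain(bits: int, key: str) -> bool:
    return all((bits >> p) & 1 for p in positions(key))

def union(a: int, b: int) -> int:
    return a | b  # exact: equals the filter built from both key sets

def intersection(a: int, b: int) -> int:
    return a & b  # approximate: keeps every key present in both inputs

a, b = build(["x", "y"]), build(["y", "z"])
print(might_contain(union(a, b), "z"), might_contain(intersection(a, b), "y"))
```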

[jira] [Assigned] (PARQUET-1743) Add equals to BlockSplitBloomFilter

2020-03-25 Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1743: - Assignee: Walid Gara > Add equals to BlockSplitBloomFilter >

[jira] [Created] (PARQUET-1826) Document hadoop configuration options

2020-03-25 Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-1826: - Summary: Document hadoop configuration options Key: PARQUET-1826 URL: https://issues.apache.org/jira/browse/PARQUET-1826 Project: Parquet Issue