Re: Interpretation of PageHeader uncompressed_page_size

2020-03-26 Thread Hatem Helal
the > levels in both uncompressed and compressed values. > > Cheers, > Gabor > > On Wed, Mar 25, 2020 at 1:02 PM Hatem Helal wrote: > > > I've recently done some work on adding support for DataPageV2 to the cpp > > code base [1]. A question came up if the uncompr

Interpretation of PageHeader uncompressed_page_size

2020-03-25 Thread Hatem Helal
I've recently done some work on adding support for DataPageV2 to the cpp code base [1]. A question came up as to whether the uncompressed_page_size includes the levels, which are not compressed in the V2 format anyway. My understanding of the thrift specification [2] is that the levels are included in this
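A minimal sketch of the arithmetic this question implies, assuming the interpretation given in the thread (both size fields count the level bytes). The struct and helper names below are hypothetical and simplified; only the field names follow the Thrift definition, and this is not the parquet-cpp implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified mirror of the size fields under discussion.
// Assumption: both page sizes include the (uncompressed) level bytes.
struct PageSizesV2 {
  int32_t uncompressed_page_size;
  int32_t compressed_page_size;
  int32_t repetition_levels_byte_length;
  int32_t definition_levels_byte_length;
};

// Total bytes taken by repetition and definition levels in a V2 page.
inline int32_t LevelsByteLength(const PageSizesV2& p) {
  return p.repetition_levels_byte_length + p.definition_levels_byte_length;
}

// Bytes of compressed values that follow the levels in the page body.
inline int32_t CompressedValuesLength(const PageSizesV2& p) {
  const int32_t levels = LevelsByteLength(p);
  assert(levels >= 0 && levels <= p.compressed_page_size);
  return p.compressed_page_size - levels;
}

// Size of the buffer needed to hold the values once decompressed.
inline int32_t UncompressedValuesLength(const PageSizesV2& p) {
  const int32_t levels = LevelsByteLength(p);
  assert(levels >= 0 && levels <= p.uncompressed_page_size);
  return p.uncompressed_page_size - levels;
}
```

Under the opposite interpretation (sizes excluding the levels), the subtraction would be dropped, which is why a reader and a writer need to agree on it.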

[jira] [Assigned] (PARQUET-458) [C++] Implement support for DataPageV2

2020-02-24 Thread Hatem Helal (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal reassigned PARQUET-458: --- Assignee: Hatem Helal > [C++] Implement support for DataPag

[jira] [Created] (PARQUET-1639) [C++] Remove regex dependency for parsing ApplicationVersion

2019-08-16 Thread Hatem Helal (JIRA)
Hatem Helal created PARQUET-1639: Summary: [C++] Remove regex dependency for parsing ApplicationVersion Key: PARQUET-1639 URL: https://issues.apache.org/jira/browse/PARQUET-1639 Project: Parquet

[jira] [Resolved] (PARQUET-1623) [C++] Invalid memory access with a magic number of records

2019-07-12 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal resolved PARQUET-1623. -- Resolution: Fixed Issue resolved by pull request 4857 [https://github.com/apache/arrow/pull

[jira] [Commented] (PARQUET-1623) [C++] Invalid memory access with a magic number of records

2019-07-11 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883150#comment-16883150 ] Hatem Helal commented on PARQUET-1623: -- Yes, will post one soon.  Working on a unittest

[jira] [Commented] (PARQUET-1623) [C++] Invalid memory access with a magic number of records

2019-07-11 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883126#comment-16883126 ] Hatem Helal commented on PARQUET-1623: -- I think I might understand what is happening here: when

[jira] [Commented] (PARQUET-1623) [C++] Invalid memory access with a magic number of records

2019-07-11 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883112#comment-16883112 ] Hatem Helal commented on PARQUET-1623: -- Here is the ASAN stack for the test:  [https

[jira] [Created] (PARQUET-1623) [C++] Invalid memory access with a magic number of records

2019-07-11 Thread Hatem Helal (JIRA)
Hatem Helal created PARQUET-1623: Summary: [C++] Invalid memory access with a magic number of records Key: PARQUET-1623 URL: https://issues.apache.org/jira/browse/PARQUET-1623 Project: Parquet

[jira] [Commented] (PARQUET-1169) [C++] Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp

2019-07-02 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876821#comment-16876821 ] Hatem Helal commented on PARQUET-1169: -- [~frankfang], could you try this again using arrow master

[jira] [Commented] (PARQUET-1565) [C++] SEGV in FromParquetSchema with corrupt file from PARQUET-1481

2019-04-18 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821152#comment-16821152 ] Hatem Helal commented on PARQUET-1565: -- This is a somewhat esoteric problem but the fix seems

[jira] [Created] (PARQUET-1565) [C++] SEGV in FromParquetSchema with corrupt file from PARQUET-1481

2019-04-18 Thread Hatem Helal (JIRA)
Hatem Helal created PARQUET-1565: Summary: [C++] SEGV in FromParquetSchema with corrupt file from PARQUET-1481 Key: PARQUET-1565 URL: https://issues.apache.org/jira/browse/PARQUET-1565 Project

[jira] [Commented] (PARQUET-1540) [C++] Set shared library version for linux and mac builds

2019-03-06 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785475#comment-16785475 ] Hatem Helal commented on PARQUET-1540: -- This is a duplicate of ARROW-3185 > [C++] Set sha

[jira] [Resolved] (PARQUET-1540) [C++] Set shared library version for linux and mac builds

2019-03-06 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal resolved PARQUET-1540. -- Resolution: Duplicate > [C++] Set shared library version for linux and mac bui

[jira] [Commented] (PARQUET-1540) [C++] Set shared library version for linux and mac builds

2019-03-04 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783267#comment-16783267 ] Hatem Helal commented on PARQUET-1540: -- This was discussed on the [mailing list|https

[jira] [Created] (PARQUET-1540) [C++] Set shared library version for linux and mac builds

2019-02-25 Thread Hatem Helal (JIRA)
Hatem Helal created PARQUET-1540: Summary: [C++] Set shared library version for linux and mac builds Key: PARQUET-1540 URL: https://issues.apache.org/jira/browse/PARQUET-1540 Project: Parquet

[jira] [Commented] (PARQUET-1482) [C++] Unable to read data from parquet file generated with parquetjs

2019-01-03 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733248#comment-16733248 ] Hatem Helal commented on PARQUET-1482: -- [~wesmckinn], my colleague [~rdmello] is working on a fix

[jira] [Commented] (PARQUET-1482) [C++] Unable to read data from parquet file generated with parquetjs

2018-12-21 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726884#comment-16726884 ] Hatem Helal commented on PARQUET-1482: -- I think this is a problem in parquet-cpp since I've

[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726840#comment-16726840 ] Hatem Helal commented on PARQUET-1481: -- Great, thanks for that [~wesmckinn]! > [C++] SEGV w

[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726795#comment-16726795 ] Hatem Helal commented on PARQUET-1481: -- Sure, a colleague used a text editor to make a random

[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726757#comment-16726757 ] Hatem Helal commented on PARQUET-1481: -- Managed to reproduce this using a simple test using latest

[jira] [Updated] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal updated PARQUET-1481: - Attachment: corrupt.parquet > [C++] SEGV when reading corrupt parquet f

[jira] [Created] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Hatem Helal (JIRA)
Hatem Helal created PARQUET-1481: Summary: [C++] SEGV when reading corrupt parquet file Key: PARQUET-1481 URL: https://issues.apache.org/jira/browse/PARQUET-1481 Project: Parquet Issue Type

Re: parquet-arrow estimate file size

2018-12-11 Thread Hatem Helal
If I've understood the problem correctly, I think you could use the parquet::arrow::FileWriter: https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.h#L128 The basic pattern is to use an object to manage the FileWriter lifetime, call the WriteTable method for each row group,

Re: Regarding Apache Parquet Project

2018-12-11 Thread Hatem Helal
Hi Arjit, I'm new around here too but interested to hear what the others on this list have to say. For C++ development, I'd recommend reading through the examples: https://github.com/apache/arrow/tree/master/cpp/examples/parquet and the command-line tools:

[jira] [Created] (PARQUET-1473) [C++] Add helper function that converts ParquetVersion to human-friendly string

2018-12-10 Thread Hatem Helal (JIRA)
Hatem Helal created PARQUET-1473: Summary: [C++] Add helper function that converts ParquetVersion to human-friendly string Key: PARQUET-1473 URL: https://issues.apache.org/jira/browse/PARQUET-1473

[jira] [Updated] (PARQUET-1458) parquet::CompressionToString not recognizing brotli compression

2018-11-14 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal updated PARQUET-1458: - Labels: (was: pull) > parquet::CompressionToString not recognizing brotli compress

[jira] [Updated] (PARQUET-1458) parquet::CompressionToString not recognizing brotli compression

2018-11-14 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal updated PARQUET-1458: - Labels: pull (was: ) > parquet::CompressionToString not recognizing brotli compress

[jira] [Updated] (PARQUET-1458) parquet::CompressionToString not recognizing brotli compression

2018-11-14 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal updated PARQUET-1458: - Priority: Trivial (was: Major) > parquet::CompressionToString not recognizing bro

[jira] [Updated] (PARQUET-1458) parquet::CompressionToString not recognizing brotli compression

2018-11-14 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hatem Helal updated PARQUET-1458: - Description: It looks like we just need to add a case to handle the brotli codec [here|[https
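The kind of fix described here (adding a missing case to an enum-to-string switch) might look roughly like the sketch below. The Codec enum and CodecToString function are hypothetical stand-ins written for illustration; they are not the library's actual types or the actual patch.

```cpp
#include <string>

// Hypothetical stand-in for a compression codec enum.
enum class Codec { UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI };

// Converts a codec to a printable name. Before a fix of this kind, the
// BROTLI case would be absent and fall through to "UNKNOWN".
inline std::string CodecToString(Codec codec) {
  switch (codec) {
    case Codec::UNCOMPRESSED:
      return "UNCOMPRESSED";
    case Codec::SNAPPY:
      return "SNAPPY";
    case Codec::GZIP:
      return "GZIP";
    case Codec::LZO:
      return "LZO";
    case Codec::BROTLI:  // the added case
      return "BROTLI";
  }
  return "UNKNOWN";
}
```

Leaving out a default label lets most compilers warn when a new enumerator is added without a matching case, which helps catch this class of omission early.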

[jira] [Commented] (PARQUET-1458) parquet::CompressionToString not recognizing brotli compression

2018-11-14 Thread Hatem Helal (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686773#comment-16686773 ] Hatem Helal commented on PARQUET-1458: -- Looking into fixing this. > parquet::CompressionToStr

[jira] [Created] (PARQUET-1458) parquet::CompressionToString not recognizing brotli compression

2018-11-14 Thread Hatem Helal (JIRA)
Hatem Helal created PARQUET-1458: Summary: parquet::CompressionToString not recognizing brotli compression Key: PARQUET-1458 URL: https://issues.apache.org/jira/browse/PARQUET-1458 Project: Parquet