[jira] [Commented] (PARQUET-2045) ConsecutiveChunkList's length field should be long instead of int
[ https://issues.apache.org/jira/browse/PARQUET-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341789#comment-17341789 ] Xianjin YE commented on PARQUET-2045: - I'd like to propose a fix that changes the length field in ConsecutiveChunkList to long. > ConsecutiveChunkList's length field should be long instead of int > - > > Key: PARQUET-2045 > URL: https://issues.apache.org/jira/browse/PARQUET-2045 > Project: Parquet > Issue Type: Bug >Reporter: Xianjin YE >Assignee: Xianjin YE >Priority: Major > Attachments: image-2021-05-10-17-12-00-083.png, > image-2021-05-10-17-14-45-401.png > > > Hi, we encountered some read failures for large column chunks (size > > Int.MaxValue). After some debugging, the buggy code is that > ConsecutiveChunkList's length field is an int, and it overflows when the > uncompressed size of one ColumnChunk is larger than Int.MaxValue. > > Below is the exception stack: > !image-2021-05-10-17-12-00-083.png! > > The column sizes are as follows: > !image-2021-05-10-17-14-45-401.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (PARQUET-2045) ConsecutiveChunkList's length field should be long instead of int
Xianjin YE created PARQUET-2045: --- Summary: ConsecutiveChunkList's length field should be long instead of int Key: PARQUET-2045 URL: https://issues.apache.org/jira/browse/PARQUET-2045 Project: Parquet Issue Type: Bug Reporter: Xianjin YE Assignee: Xianjin YE Attachments: image-2021-05-10-17-12-00-083.png, image-2021-05-10-17-14-45-401.png Hi, we encountered some read failures for large column chunks (size > Int.MaxValue). After some debugging, the buggy code is that ConsecutiveChunkList's length field is an int, and it overflows when the uncompressed size of one ColumnChunk is larger than Int.MaxValue. Below is the exception stack: !image-2021-05-10-17-12-00-083.png! The column sizes are as follows: !image-2021-05-10-17-14-45-401.png!
[jira] [Created] (PARQUET-1257) GetRecordBatchReader in parquet/arrow/reader.h should be able to specify chunksize
Xianjin YE created PARQUET-1257: --- Summary: GetRecordBatchReader in parquet/arrow/reader.h should be able to specify chunksize Key: PARQUET-1257 URL: https://issues.apache.org/jira/browse/PARQUET-1257 Project: Parquet Issue Type: Improvement Components: parquet-cpp Reporter: Xianjin YE See the comments on [https://github.com/apache/parquet-cpp/pull/445]
[jira] [Commented] (PARQUET-1166) [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
[ https://issues.apache.org/jira/browse/PARQUET-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266954#comment-16266954 ] Xianjin YE commented on PARQUET-1166: - All right then, I will send a PR soon and will try to reuse Arrow's code whenever possible. > [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h > - > > Key: PARQUET-1166 > URL: https://issues.apache.org/jira/browse/PARQUET-1166 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Xianjin YE > > Hi, I'd like to propose a new API to better support splittable reading of > Parquet files. > The intent of this API is to selectively read RowGroups (normally > contiguous, but they can be arbitrary as long as the row_group_idxes are sorted > and unique, [1, 3, 5] for example). > The proposed API would be something like this: > {code:java} > ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, > std::shared_ptr<::arrow::RecordBatchReader>* out); > > ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, > const std::vector<int>& column_indices, > std::shared_ptr<::arrow::RecordBatchReader>* out); > {code} > With the new API, we can split a Parquet file into RowGroups that can be processed > by multiple tasks (possibly on different hosts, like Map tasks in MapReduce) > [~wesmckinn][~xhochy] What do you think?
[jira] [Updated] (PARQUET-1166) [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
[ https://issues.apache.org/jira/browse/PARQUET-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE updated PARQUET-1166: Description: Hi, I'd like to propose a new API to better support splittable reading of Parquet files. The intent of this API is to selectively read RowGroups (normally contiguous, but they can be arbitrary as long as the row_group_idxes are sorted and unique, [1, 3, 5] for example). The proposed API would be something like this: {code:java} ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, const std::vector<int>& column_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); {code} With the new API, we can split a Parquet file into RowGroups that can be processed by multiple tasks (possibly on different hosts, like Map tasks in MapReduce) [~wesmckinn]@xch was: Hi, I'd like to proposal a new API to better support splittable reading for Parquet File. The intent for this API is that we can selective reading RowGroups(normally be contiguous, but can be arbitrary as long as the row_group_idxes are sorted and unique, [1, 3, 5] for example). The proposed API would be something like this: {code:java} ::arrow::Status GetRecordBatchReader(const std::vector& row_group_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); ::arrow::Status GetRecordBatchReader(const std::vector& row_group_indices, const std::vector& column_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); {code} With new API, we can split Parquet file into RowGroups and can be processed by multiple tasks(maybe be on different hosts, like the Map task in MapReduce) > [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h > - > > Key: PARQUET-1166 > URL: https://issues.apache.org/jira/browse/PARQUET-1166 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Xianjin YE > > Hi, I'd like to propose a new API to better support splittable reading of > Parquet files. > The intent of this API is to selectively read RowGroups (normally > contiguous, but they can be arbitrary as long as the row_group_idxes are sorted > and unique, [1, 3, 5] for example). > The proposed API would be something like this: > {code:java} > ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, > std::shared_ptr<::arrow::RecordBatchReader>* out); > > ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, > const std::vector<int>& column_indices, > std::shared_ptr<::arrow::RecordBatchReader>* out); > {code} > With the new API, we can split a Parquet file into RowGroups that can be processed > by multiple tasks (possibly on different hosts, like Map tasks in MapReduce) > [~wesmckinn]@xch
[jira] [Updated] (PARQUET-1166) [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
[ https://issues.apache.org/jira/browse/PARQUET-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE updated PARQUET-1166: Description: Hi, I'd like to propose a new API to better support splittable reading of Parquet files. The intent of this API is to selectively read RowGroups (normally contiguous, but they can be arbitrary as long as the row_group_idxes are sorted and unique, [1, 3, 5] for example). The proposed API would be something like this: {code:java} ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, const std::vector<int>& column_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); {code} With the new API, we can split a Parquet file into RowGroups that can be processed by multiple tasks (possibly on different hosts, like Map tasks in MapReduce) [~wesmckinn][~xhochy] What do you think? was: Hi, I'd like to proposal a new API to better support splittable reading for Parquet File. The intent for this API is that we can selective reading RowGroups(normally be contiguous, but can be arbitrary as long as the row_group_idxes are sorted and unique, [1, 3, 5] for example). The proposed API would be something like this: {code:java} ::arrow::Status GetRecordBatchReader(const std::vector& row_group_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); ::arrow::Status GetRecordBatchReader(const std::vector& row_group_indices, const std::vector& column_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); {code} With new API, we can split Parquet file into RowGroups and can be processed by multiple tasks(maybe be on different hosts, like the Map task in MapReduce) [~wesmckinn]@xch > [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h > - > > Key: PARQUET-1166 > URL: https://issues.apache.org/jira/browse/PARQUET-1166 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Xianjin YE > > Hi, I'd like to propose a new API to better support splittable reading of > Parquet files. > The intent of this API is to selectively read RowGroups (normally > contiguous, but they can be arbitrary as long as the row_group_idxes are sorted > and unique, [1, 3, 5] for example). > The proposed API would be something like this: > {code:java} > ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, > std::shared_ptr<::arrow::RecordBatchReader>* out); > > ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, > const std::vector<int>& column_indices, > std::shared_ptr<::arrow::RecordBatchReader>* out); > {code} > With the new API, we can split a Parquet file into RowGroups that can be processed > by multiple tasks (possibly on different hosts, like Map tasks in MapReduce) > [~wesmckinn][~xhochy] What do you think?
[jira] [Created] (PARQUET-1166) [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
Xianjin YE created PARQUET-1166: --- Summary: [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h Key: PARQUET-1166 URL: https://issues.apache.org/jira/browse/PARQUET-1166 Project: Parquet Issue Type: Improvement Components: parquet-cpp Reporter: Xianjin YE Hi, I'd like to propose a new API to better support splittable reading of Parquet files. The intent of this API is to selectively read RowGroups (normally contiguous, but they can be arbitrary as long as the row_group_idxes are sorted and unique, [1, 3, 5] for example). The proposed API would be something like this: {code:java} ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices, const std::vector<int>& column_indices, std::shared_ptr<::arrow::RecordBatchReader>* out); {code} With the new API, we can split a Parquet file into RowGroups that can be processed by multiple tasks (possibly on different hosts, like Map tasks in MapReduce)
[jira] [Updated] (PARQUET-970) Add Add Lz4 and Zstd compression codecs
[ https://issues.apache.org/jira/browse/PARQUET-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE updated PARQUET-970: --- Description: https://github.com/facebook/zstd looks quite promising, I'd like to add a compressor in parquet-cpp. Lz4 and Zstd codecs are added since parquet-format has already added these codecs. was:https://github.com/facebook/zstd looks quite promising, I'd like to add a compressor in parquet-cpp. > Add Add Lz4 and Zstd compression codecs > --- > > Key: PARQUET-970 > URL: https://issues.apache.org/jira/browse/PARQUET-970 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Xianjin YE > > https://github.com/facebook/zstd looks quite promising, I'd like to add a > compressor in parquet-cpp. > Lz4 and Zstd codecs are added since parquet-format has already added these > codecs.
[jira] [Updated] (PARQUET-970) Add Add Lz4 and Zstd compression codecs
[ https://issues.apache.org/jira/browse/PARQUET-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE updated PARQUET-970: --- Summary: Add Add Lz4 and Zstd compression codecs (was: Add ZstdCompressor for parquet-cpp compressor interface) > Add Add Lz4 and Zstd compression codecs > --- > > Key: PARQUET-970 > URL: https://issues.apache.org/jira/browse/PARQUET-970 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Xianjin YE > > https://github.com/facebook/zstd looks quite promising, I'd like to add a > compressor in parquet-cpp.
[jira] [Commented] (PARQUET-970) Add ZstdCompressor for parquet-cpp compressor interface
[ https://issues.apache.org/jira/browse/PARQUET-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099507#comment-16099507 ] Xianjin YE commented on PARQUET-970: Great. I will do some experiments when I get some spare time... > Add ZstdCompressor for parquet-cpp compressor interface > --- > > Key: PARQUET-970 > URL: https://issues.apache.org/jira/browse/PARQUET-970 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Xianjin YE > > https://github.com/facebook/zstd looks quite promising, I'd like to add a > compressor in parquet-cpp.
[jira] [Commented] (PARQUET-1012) parquet-cpp and parquet-mr version parse inconsistent
[ https://issues.apache.org/jira/browse/PARQUET-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035787#comment-16035787 ] Xianjin YE commented on PARQUET-1012: - [~mdeepak] I think this issue is fixed by PARQUET-349, but only in parquet-mr 1.9. We may need to add a build hash to make parquet-mr 1.8.2 (Spark 2.1) happy. > parquet-cpp and parquet-mr version parse inconsistent > - > > Key: PARQUET-1012 > URL: https://issues.apache.org/jira/browse/PARQUET-1012 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Deepak Majeti > > Spark 2.1 uses parquet-mr(common) 1.8.2, which requires created_by to match > a certain pattern. I found the following exception when using Spark to read a > parquet file generated by parquet-cpp. > 17/05/31 16:33:53 WARN CorruptStatistics: Ignoring statistics because > created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0 > org.apache.parquet.VersionParser$VersionParseException: Could not parse > created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) > )?\(build ?(.*)\) > Proposal to fix this issue: set created_by to match the expected pattern.
[jira] [Comment Edited] (PARQUET-1012) parquet-cpp and parquet-mr version parse inconsistent
[ https://issues.apache.org/jira/browse/PARQUET-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035787#comment-16035787 ] Xianjin YE edited comment on PARQUET-1012 at 6/3/17 3:52 AM: - [~mdeepak] I think this issue is fixed by PARQUET-349, but only in parquet-mr 1.9. We may need to add a build hash to make parquet-mr 1.8.2 (Spark 2.1) happy. was (Author: advancedxy): [~mdeepak] I think this issued is fixed by PARQUET-349, but only in parquet-mr 1.9. We may need to add a build hash yo make parquet-mr 1.8.2(Spark 2.1) happy. > parquet-cpp and parquet-mr version parse inconsistent > - > > Key: PARQUET-1012 > URL: https://issues.apache.org/jira/browse/PARQUET-1012 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Deepak Majeti > > Spark 2.1 uses parquet-mr(common) 1.8.2, which requires created_by to match > a certain pattern. I found the following exception when using Spark to read a > parquet file generated by parquet-cpp. > 17/05/31 16:33:53 WARN CorruptStatistics: Ignoring statistics because > created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0 > org.apache.parquet.VersionParser$VersionParseException: Could not parse > created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) > )?\(build ?(.*)\) > Proposal to fix this issue: set created_by to match the expected pattern.
[jira] [Commented] (PARQUET-1012) parquet-cpp and parquet-mr version parse inconsistent
[ https://issues.apache.org/jira/browse/PARQUET-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031451#comment-16031451 ] Xianjin YE commented on PARQUET-1012: - Thanks. > parquet-cpp and parquet-mr version parse inconsistent > - > > Key: PARQUET-1012 > URL: https://issues.apache.org/jira/browse/PARQUET-1012 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Deepak Majeti > > Spark 2.1 uses parquet-mr(common) 1.8.2, which requires created_by to match > a certain pattern. I found the following exception when using Spark to read a > parquet file generated by parquet-cpp. > 17/05/31 16:33:53 WARN CorruptStatistics: Ignoring statistics because > created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0 > org.apache.parquet.VersionParser$VersionParseException: Could not parse > created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) > )?\(build ?(.*)\) > Proposal to fix this issue: set created_by to match the expected pattern.
[jira] [Created] (PARQUET-1012) parquet-cpp and parquet-mr version parse inconsistent
Xianjin YE created PARQUET-1012: --- Summary: parquet-cpp and parquet-mr version parse inconsistent Key: PARQUET-1012 URL: https://issues.apache.org/jira/browse/PARQUET-1012 Project: Parquet Issue Type: Improvement Reporter: Xianjin YE Spark 2.1 uses parquet-mr(common) 1.8.2, which requires created_by to match a certain pattern. I found the following exception when using Spark to read a parquet file generated by parquet-cpp. 17/05/31 16:33:53 WARN CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0 org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) )?\(build ?(.*)\) Proposal to fix this issue: set created_by to match the expected pattern.
[jira] [Updated] (PARQUET-1012) parquet-cpp and parquet-mr version parse inconsistent
[ https://issues.apache.org/jira/browse/PARQUET-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE updated PARQUET-1012: Component/s: parquet-cpp > parquet-cpp and parquet-mr version parse inconsistent > - > > Key: PARQUET-1012 > URL: https://issues.apache.org/jira/browse/PARQUET-1012 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Xianjin YE > > Spark 2.1 uses parquet-mr(common) 1.8.2, which requires created_by to match > a certain pattern. I found the following exception when using Spark to read a > parquet file generated by parquet-cpp. > 17/05/31 16:33:53 WARN CorruptStatistics: Ignoring statistics because > created_by could not be parsed (see PARQUET-251): parquet-cpp version 1.0.0 > org.apache.parquet.VersionParser$VersionParseException: Could not parse > created_by: parquet-cpp version 1.0.0 using format: (.+) version ((.*) > )?\(build ?(.*)\) > Proposal to fix this issue: set created_by to match the expected pattern.
[jira] [Commented] (PARQUET-995) [C++] Int96 reader in parquet_arrow uses size of Int96Type instead of Int96
[ https://issues.apache.org/jira/browse/PARQUET-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019758#comment-16019758 ] Xianjin YE commented on PARQUET-995: Hi, [~wesmckinn] how was this issue found? I was haunted by this bug all day today. In our application, the segfault showed up in a core dump that pointed to some innocent code (not related to parquet) due to the bad memory access. I tried everything, and when I found this issue it occurred to me that this might be the root cause. And indeed the problem was solved when I merged the latest parquet code. > [C++] Int96 reader in parquet_arrow uses size of Int96Type instead of Int96 > --- > > Key: PARQUET-995 > URL: https://issues.apache.org/jira/browse/PARQUET-995 > Project: Parquet > Issue Type: Bug > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Blocker > Fix For: cpp-1.1.0 > > > This produces a segfault when reading {{alltypes_plain.parquet}} with > parquet::arrow. I will see if I can reproduce with a test case. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (PARQUET-924) [C++] Persist original type metadata from Arrow schemas
[ https://issues.apache.org/jira/browse/PARQUET-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE reassigned PARQUET-924: -- Assignee: Xianjin YE > [C++] Persist original type metadata from Arrow schemas > --- > > Key: PARQUET-924 > URL: https://issues.apache.org/jira/browse/PARQUET-924 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Xianjin YE > > This will enable us to convert back to the original type in some cases > (DictionaryArray, Time with seconds)
[jira] [Assigned] (PARQUET-914) [C++] Throw more informative exception when user writes too many values to a column in a row group
[ https://issues.apache.org/jira/browse/PARQUET-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE reassigned PARQUET-914: -- Assignee: Xianjin YE > [C++] Throw more informative exception when user writes too many values to a > column in a row group > -- > > Key: PARQUET-914 > URL: https://issues.apache.org/jira/browse/PARQUET-914 > Project: Parquet > Issue Type: Bug > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Xianjin YE > > In > https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159 > if the user writes more values than the size of the row group, the message > in the exception raised is misleading
[jira] [Commented] (PARQUET-970) Add ZstdCompressor for parquet-cpp compressor interface
[ https://issues.apache.org/jira/browse/PARQUET-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994098#comment-15994098 ] Xianjin YE commented on PARQUET-970: Thanks for your input. The ZstdCompressor interface in Arrow works for me. I can work on that after [~wesmckinn] moves the interfaces. And it's more convincing if we have applied it in the Arrow lib before proposing it to parquet-format. > Add ZstdCompressor for parquet-cpp compressor interface > --- > > Key: PARQUET-970 > URL: https://issues.apache.org/jira/browse/PARQUET-970 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Xianjin YE > > https://github.com/facebook/zstd looks quite promising, I'd like to add a > compressor in parquet-cpp.
[jira] [Updated] (PARQUET-970) Add ZstdCompressor for parquet-cpp compressor interface
[ https://issues.apache.org/jira/browse/PARQUET-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE updated PARQUET-970: --- Description: https://github.com/facebook/zstd looks quite promising, I'd like to add a compressor in parquet-cpp. (was: https://github.com/facebook/zstd looks quite promising, I'd like a compressor support in parquet-cpp.) > Add ZstdCompressor for parquet-cpp compressor interface > --- > > Key: PARQUET-970 > URL: https://issues.apache.org/jira/browse/PARQUET-970 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Xianjin YE > > https://github.com/facebook/zstd looks quite promising, I'd like to add a > compressor in parquet-cpp.
[jira] [Created] (PARQUET-970) Add ZstdCompressor for parquet-cpp compressor interface
Xianjin YE created PARQUET-970: -- Summary: Add ZstdCompressor for parquet-cpp compressor interface Key: PARQUET-970 URL: https://issues.apache.org/jira/browse/PARQUET-970 Project: Parquet Issue Type: New Feature Reporter: Xianjin YE Assignee: Xianjin YE https://github.com/facebook/zstd looks quite promising, I'd like to add compressor support in parquet-cpp.
[jira] [Updated] (PARQUET-970) Add ZstdCompressor for parquet-cpp compressor interface
[ https://issues.apache.org/jira/browse/PARQUET-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianjin YE updated PARQUET-970: --- Component/s: parquet-cpp > Add ZstdCompressor for parquet-cpp compressor interface > --- > > Key: PARQUET-970 > URL: https://issues.apache.org/jira/browse/PARQUET-970 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Xianjin YE >Assignee: Xianjin YE > > https://github.com/facebook/zstd looks quite promising, I'd like to add > compressor support in parquet-cpp.
[jira] [Commented] (PARQUET-924) [C++] Persist original type metadata from Arrow schemas
[ https://issues.apache.org/jira/browse/PARQUET-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993129#comment-15993129 ] Xianjin YE commented on PARQUET-924: Sounds reasonable. [~wesmckinn] you can assign this to me if no one is working on this. Will work on this tomorrow if I have some spare time. > [C++] Persist original type metadata from Arrow schemas > --- > > Key: PARQUET-924 > URL: https://issues.apache.org/jira/browse/PARQUET-924 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney > > This will enable us to convert back to the original type in some cases > (DictionaryArray, Time with seconds)
[jira] [Commented] (PARQUET-924) [C++] Persist original type metadata from Arrow schemas
[ https://issues.apache.org/jira/browse/PARQUET-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993007#comment-15993007 ] Xianjin YE commented on PARQUET-924: How should we persist the original Arrow schema? A FlatBuffer message in the key-value metadata? > [C++] Persist original type metadata from Arrow schemas > --- > > Key: PARQUET-924 > URL: https://issues.apache.org/jira/browse/PARQUET-924 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney > > This will enable us to convert back to the original type in some cases > (DictionaryArray, Time with seconds)
[jira] [Commented] (PARQUET-936) [C++] parquet::arrow::WriteTable can enter infinite loop if chunk_size is 0
[ https://issues.apache.org/jira/browse/PARQUET-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992993#comment-15992993 ] Xianjin YE commented on PARQUET-936: Is anyone working on this? If not, [~wesmckinn] you can assign it to me to avoid duplicate work. > [C++] parquet::arrow::WriteTable can enter infinite loop if chunk_size is 0 > --- > > Key: PARQUET-936 > URL: https://issues.apache.org/jira/browse/PARQUET-936 > Project: Parquet > Issue Type: Bug > Components: parquet-cpp >Reporter: Wes McKinney > Fix For: cpp-1.1.0 > > > See also ARROW-723