[jira] [Updated] (PARQUET-2369) Clarify Support for Pages Compressed with Multiple GZIP Members

2023-11-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2369: Fix Version/s: format-2.10.0 > Clarify Support for Pages Compressed with Multiple GZIP

[jira] [Updated] (PARQUET-2369) Clarify Support for Pages Compressed with Multiple GZIP Members

2023-11-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2369: Component/s: parquet-format > Clarify Support for Pages Compressed with Multiple GZIP

[jira] [Updated] (PARQUET-2369) Clarify Support for Pages Compressed with Multiple GZIP Members

2023-11-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2369: Priority: Major (was: Trivial) > Clarify Support for Pages Compressed with Multiple

[jira] [Updated] (PARQUET-1646) [C++] Use arrow::Buffer for buffered dictionary indices in DictEncoder instead of std::vector

2023-11-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1646: Fix Version/s: cpp-15.0.0 (was: cpp-14.0.0) > [C++] Use

[jira] [Updated] (PARQUET-2099) [C++] Statistics::num_values() is misleading

2023-11-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2099: Fix Version/s: cpp-15.0.0 (was: cpp-14.0.0) > [C++]

[jira] [Updated] (PARQUET-2321) allow customized buffer size when creating ArrowInputStream for a column PageReader

2023-11-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2321: Fix Version/s: cpp-15.0.0 (was: cpp-14.0.0) > allow customized

[jira] [Resolved] (PARQUET-2238) Spec and parquet-mr disagree on DELTA_BYTE_ARRAY encoding

2023-09-26 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2238. - Resolution: Duplicate > Spec and parquet-mr disagree on DELTA_BYTE_ARRAY encoding >

[jira] [Updated] (PARQUET-1646) [C++] Use arrow::Buffer for buffered dictionary indices in DictEncoder instead of std::vector

2023-08-24 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1646: Fix Version/s: cpp-14.0.0 (was: cpp-13.0.0) > [C++] Use

[jira] [Updated] (PARQUET-2321) allow customized buffer size when creating ArrowInputStream for a column PageReader

2023-08-24 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2321: Fix Version/s: cpp-14.0.0 (was: cpp-13.0.0) > allow customized

[jira] [Updated] (PARQUET-2099) [C++] Statistics::num_values() is misleading

2023-08-24 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2099: Fix Version/s: cpp-14.0.0 (was: cpp-13.0.0) > [C++]

[jira] [Updated] (PARQUET-2323) Use bit vector to store Prebuffered column chunk index

2023-07-28 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2323: Fix Version/s: cpp-13.0.0 (was: cpp-14.0.0) > Use bit vector to

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-26 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747306#comment-17747306 ] Antoine Pitrou commented on PARQUET-: - bq. Should we just keep the specs as is and let the

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-07-25 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746834#comment-17746834 ] Antoine Pitrou commented on PARQUET-: - There are other implementations around, so I would be

[jira] [Updated] (PARQUET-2323) Use bit vector to store Prebuffered column chunk index

2023-07-19 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2323: Fix Version/s: cpp-14.0.0 (was: cpp-13.0.0) > Use bit vector to

[jira] [Resolved] (PARQUET-2323) Use bit vector to store Prebuffered column chunk index

2023-07-19 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2323. - Resolution: Fixed Issue resolved by pull request 36649

[jira] [Comment Edited] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-06-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733129#comment-17733129 ] Antoine Pitrou edited comment on PARQUET- at 6/15/23 3:32 PM: --

[jira] [Resolved] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-06-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-. - Resolution: Fixed Resolved in https://github.com/apache/parquet-format/pull/211 >

[jira] [Commented] (PARQUET-2310) [Doc] Add implementation status / matrix

2023-06-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733100#comment-17733100 ] Antoine Pitrou commented on PARQUET-2310: - This was originally proposed in

[jira] [Commented] (PARQUET-2310) [Doc] Add implementation status / matrix

2023-06-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733099#comment-17733099 ] Antoine Pitrou commented on PARQUET-2310: - cc [~wgtmac] [~gszadovszky] [~alippai] > [Doc] Add

[jira] [Created] (PARQUET-2310) [Doc] Add implementation status / matrix

2023-06-15 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-2310: --- Summary: [Doc] Add implementation status / matrix Key: PARQUET-2310 URL: https://issues.apache.org/jira/browse/PARQUET-2310 Project: Parquet Issue

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693891#comment-17693891 ] Antoine Pitrou commented on PARQUET-: - Yes, this is why I've filed this under

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-02-27 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693870#comment-17693870 ] Antoine Pitrou commented on PARQUET-: - > I don't understand. Isn't length the part of

[jira] [Resolved] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-19 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2231. - Resolution: Fixed Closed by PR https://github.com/apache/parquet-format/pull/189 >

[jira] [Updated] (PARQUET-152) Encoding issue with fixed length byte arrays

2023-01-16 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-152: --- Component/s: parquet-mr > Encoding issue with fixed length byte arrays >

[jira] [Commented] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677300#comment-17677300 ] Antoine Pitrou commented on PARQUET-2231: - [~rok] [~shanhuang] [~muthunagappan] [~jinshang] FYI

[jira] [Created] (PARQUET-2231) [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY

2023-01-16 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-2231: --- Summary: [Format] Encoding spec incorrect for DELTA_BYTE_ARRAY Key: PARQUET-2231 URL: https://issues.apache.org/jira/browse/PARQUET-2231 Project: Parquet

[jira] [Commented] (PARQUET-152) Encoding issue with fixed length byte arrays

2023-01-16 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677297#comment-17677297 ] Antoine Pitrou commented on PARQUET-152: It would be nice if the encodings spec had been updated

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-01-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654524#comment-17654524 ] Antoine Pitrou commented on PARQUET-: - cc [~julienledem] [~pnarang] [~rdblue]

[jira] [Created] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-01-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-: --- Summary: [Format] RLE encoding spec incorrect for v2 data pages Key: PARQUET- URL: https://issues.apache.org/jira/browse/PARQUET- Project: Parquet

[jira] [Resolved] (PARQUET-2218) [Format] Clarify CRC computation

2023-01-03 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2218. - Resolution: Fixed Fixed by PR https://github.com/apache/parquet-format/pull/188 >

[jira] [Commented] (PARQUET-2221) [Format] Encoding spec incorrect for dictionary fallback

2023-01-03 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654025#comment-17654025 ] Antoine Pitrou commented on PARQUET-2221: - cc [~julienledem] [~pnarang] [~rdblue]

[jira] [Updated] (PARQUET-52) Improve the encoding fall back mechanism for Parquet 2.0

2023-01-03 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-52: -- Description: https://github.com/apache/incubator-parquet-mr/pull/74 -> moved to

[jira] [Created] (PARQUET-2221) [Format] Encoding spec incorrect for dictionary fallback

2023-01-03 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-2221: --- Summary: [Format] Encoding spec incorrect for dictionary fallback Key: PARQUET-2221 URL: https://issues.apache.org/jira/browse/PARQUET-2221 Project: Parquet

[jira] [Updated] (PARQUET-796) Delta Encoding is not used when dictionary enabled

2023-01-03 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-796: --- Priority: Major (was: Critical) > Delta Encoding is not used when dictionary enabled >

[jira] [Updated] (PARQUET-2218) [Format] Clarify CRC computation

2022-12-13 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2218: Description: The format spec on CRC checksumming felt ambiguous when trying to implement

[jira] [Created] (PARQUET-2218) [Format] Clarify CRC computation

2022-12-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-2218: --- Summary: [Format] Clarify CRC computation Key: PARQUET-2218 URL: https://issues.apache.org/jira/browse/PARQUET-2218 Project: Parquet Issue Type:

[jira] [Commented] (PARQUET-1629) Page-level CRC checksum verification for DataPageV2

2022-12-13 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646612#comment-17646612 ] Antoine Pitrou commented on PARQUET-1629: - [~mwish] for the record. Perhaps you would be

[jira] [Resolved] (PARQUET-2204) TypedColumnReaderImpl::Skip should reuse scratch space

2022-12-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2204. - Fix Version/s: cpp-11.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Assigned] (PARQUET-2204) TypedColumnReaderImpl::Skip should reuse scratch space

2022-12-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2204: --- Assignee: fatemah > TypedColumnReaderImpl::Skip should reuse scratch space >

[jira] [Updated] (PARQUET-2204) TypedColumnReaderImpl::Skip should reuse scratch space

2022-12-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2204: Component/s: parquet-cpp > TypedColumnReaderImpl::Skip should reuse scratch space >

[jira] [Updated] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2022-12-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1222: Fix Version/s: format-2.10.0 > Specify a well-defined sorting order for float and double

[jira] [Resolved] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2022-12-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-1222. - Resolution: Fixed > Specify a well-defined sorting order for float and double types >

[jira] [Assigned] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2022-12-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-1222: --- Assignee: Micah Kornfield > Specify a well-defined sorting order for float and

[jira] [Assigned] (PARQUET-2215) Document how DELTA_BINARY_PACKED handles overflow for deltas

2022-11-23 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2215: --- Assignee: Antoine Pitrou > Document how DELTA_BINARY_PACKED handles overflow for

[jira] [Resolved] (PARQUET-2206) Microbenchmark for ColumnReadaer ReadBatch and Skip

2022-11-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2206. - Fix Version/s: cpp-11.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Updated] (PARQUET-2206) Microbenchmark for ColumnReadaer ReadBatch and Skip

2022-11-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2206: Component/s: parquet-cpp > Microbenchmark for ColumnReadaer ReadBatch and Skip >

[jira] [Assigned] (PARQUET-2206) Microbenchmark for ColumnReadaer ReadBatch and Skip

2022-11-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2206: --- Assignee: fatemah > Microbenchmark for ColumnReadaer ReadBatch and Skip >

[jira] [Updated] (PARQUET-2210) Skip pages based on header metadata using a callback

2022-11-09 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2210: Component/s: parquet-cpp > Skip pages based on header metadata using a callback >

[jira] [Updated] (PARQUET-2210) [C++] Skip pages based on header metadata using a callback

2022-11-09 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2210: Summary: [C++] Skip pages based on header metadata using a callback (was: Skip pages

[jira] [Assigned] (PARQUET-2211) [C++] Print ColumnMetaData.encoding_stats field

2022-11-06 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2211: --- Assignee: Gang Wu > [C++] Print ColumnMetaData.encoding_stats field >

[jira] [Resolved] (PARQUET-2211) [C++] Print ColumnMetaData.encoding_stats field

2022-11-06 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2211. - Fix Version/s: cpp-11.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Assigned] (PARQUET-2209) [C++] Optimize skip for the case that number of values to skip equals page size

2022-11-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2209: --- Assignee: fatemah > [C++] Optimize skip for the case that number of values to

[jira] [Resolved] (PARQUET-2209) [C++] Optimize skip for the case that number of values to skip equals page size

2022-11-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2209. - Fix Version/s: cpp-11.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Assigned] (PARQUET-2188) Add SkipRecords API to RecordReader

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2188: --- Assignee: fatemah > Add SkipRecords API to RecordReader >

[jira] [Resolved] (PARQUET-2188) Add SkipRecords API to RecordReader

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2188. - Fix Version/s: cpp-11.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Updated] (PARQUET-2209) [C++] Optimize skip for the case that number of values to skip equals page size

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2209: Summary: [C++] Optimize skip for the case that number of values to skip equals page size

[jira] [Updated] (PARQUET-2209) [C++] Optimize skip for the case that number of values to skip equals page size

2022-10-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2209: Component/s: parquet-cpp > [C++] Optimize skip for the case that number of values to

[jira] [Updated] (PARQUET-1646) [C++] Use arrow::Buffer for buffered dictionary indices in DictEncoder instead of std::vector

2022-10-26 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1646: Fix Version/s: cpp-11.0.0 (was: cpp-10.0.0) > [C++] Use

[jira] [Updated] (PARQUET-2099) [C++] Statistics::num_values() is misleading

2022-10-26 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2099: Fix Version/s: cpp-11.0.0 (was: cpp-10.0.0) > [C++]

[jira] [Assigned] (PARQUET-2179) Add a test for skipping repeated fields

2022-10-18 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2179: --- Assignee: fatemah > Add a test for skipping repeated fields >

[jira] [Resolved] (PARQUET-2179) Add a test for skipping repeated fields

2022-10-18 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2179. - Fix Version/s: cpp-10.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2022-09-30 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611445#comment-17611445 ] Antoine Pitrou commented on PARQUET-1222: - I agree with [~gszadovszky] for elevating these

[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2022-09-30 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611444#comment-17611444 ] Antoine Pitrou commented on PARQUET-1222: - (side note: the ML is mostly a firehose of

[jira] [Assigned] (PARQUET-2187) Add Parquet file containing a boolean column with RLE encoding to paquet

2022-09-29 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned PARQUET-2187: --- Assignee: Nishanth > Add Parquet file containing a boolean column with RLE

[jira] [Resolved] (PARQUET-2187) Add Parquet file containing a boolean column with RLE encoding to paquet

2022-09-29 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2187. - Resolution: Fixed > Add Parquet file containing a boolean column with RLE encoding to

[jira] [Created] (PARQUET-2186) [Java] parquet-mr fails compiling

2022-09-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created PARQUET-2186: --- Summary: [Java] parquet-mr fails compiling Key: PARQUET-2186 URL: https://issues.apache.org/jira/browse/PARQUET-2186 Project: Parquet Issue Type: Bug

[jira] [Updated] (PARQUET-2182) Handle unknown logical types

2022-09-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2182: Component/s: parquet-mr > Handle unknown logical types > >

[jira] [Updated] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2022-08-29 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-758: --- Summary: [Format] HALF precision FLOAT Logical type (was: HALF precision FLOAT Logical

[jira] [Updated] (PARQUET-1158) [C++] Basic RowGroup filtering

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1158: Fix Version/s: (was: cpp-9.0.0) > [C++] Basic RowGroup filtering >

[jira] [Updated] (PARQUET-1430) [C++] Add tests for C++ tools

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1430: Fix Version/s: (was: cpp-9.0.0) > [C++] Add tests for C++ tools >

[jira] [Updated] (PARQUET-1199) [C++] Support writing (and test reading) boolean values with RLE encoding

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1199: Fix Version/s: (was: cpp-9.0.0) > [C++] Support writing (and test reading) boolean

[jira] [Updated] (PARQUET-1515) [C++] Disable LZ4 codec

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1515: Fix Version/s: (was: cpp-9.0.0) > [C++] Disable LZ4 codec > ---

[jira] [Updated] (PARQUET-1614) [C++] Reuse arrow::Buffer used as scratch space for decryption in Thrift deserialization hot path

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1614: Fix Version/s: (was: cpp-9.0.0) > [C++] Reuse arrow::Buffer used as scratch space

[jira] [Updated] (PARQUET-1634) [C++] Factor out data/dictionary page writes to allow for page buffering

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1634: Fix Version/s: (was: cpp-9.0.0) > [C++] Factor out data/dictionary page writes to

[jira] [Updated] (PARQUET-1646) [C++] Use arrow::Buffer for buffered dictionary indices in DictEncoder instead of std::vector

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1646: Fix Version/s: cpp-10.0.0 (was: cpp-9.0.0) > [C++] Use

[jira] [Updated] (PARQUET-1653) [C++] Deprecated BIT_PACKED level decoding is probably incorrect

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1653: Fix Version/s: (was: cpp-9.0.0) > [C++] Deprecated BIT_PACKED level decoding is

[jira] [Updated] (PARQUET-1657) [C++] Change Bloom filter implementation to use xxhash

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1657: Fix Version/s: (was: cpp-9.0.0) > [C++] Change Bloom filter implementation to use

[jira] [Updated] (PARQUET-1814) [C++] TestInt96ParquetIO failure on Windows

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1814: Fix Version/s: (was: cpp-9.0.0) > [C++] TestInt96ParquetIO failure on Windows >

[jira] [Updated] (PARQUET-1859) [C++] Require error message when using ParquetException::EofException

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1859: Fix Version/s: (was: cpp-9.0.0) > [C++] Require error message when using

[jira] [Updated] (PARQUET-2099) [C++] Statistics::num_values() is misleading

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2099: Fix Version/s: cpp-10.0.0 (was: cpp-9.0.0) > [C++]

[jira] [Updated] (PARQUET-1416) [C++] Deprecate parquet/api/* in favor of simpler public API "parquet/api.h"

2022-08-22 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1416: Fix Version/s: (was: cpp-9.0.0) > [C++] Deprecate parquet/api/* in favor of simpler

[jira] [Resolved] (PARQUET-2124) Bad DCHECK For Intermixed Dictionary Encoding

2022-02-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2124. - Fix Version/s: cpp-8.0.0 Resolution: Fixed Issue resolved by pull request 12427

[jira] [Resolved] (PARQUET-2123) Invalid memory access in ScanFileContents

2022-02-15 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2123. - Fix Version/s: cpp-8.0.0 Resolution: Fixed Issue resolved by pull request 12423

[jira] [Resolved] (PARQUET-2119) Parquet CPP DeltaBitPackDecoder Check Failure

2022-02-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2119. - Fix Version/s: cpp-8.0.0 Resolution: Fixed Issue resolved by pull request 12365

[jira] [Updated] (PARQUET-1614) [C++] Reuse arrow::Buffer used as scratch space for decryption in Thrift deserialization hot path

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1614: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Reuse

[jira] [Updated] (PARQUET-1657) [C++] Change Bloom filter implementation to use xxhash

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1657: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Change Bloom

[jira] [Updated] (PARQUET-1199) [C++] Support writing (and test reading) boolean values with RLE encoding

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1199: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Support writing

[jira] [Updated] (PARQUET-1814) [C++] TestInt96ParquetIO failure on Windows

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1814: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++]

[jira] [Updated] (PARQUET-1430) [C++] Add tests for C++ tools

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1430: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Add tests for

[jira] [Updated] (PARQUET-1646) [C++] Use arrow::Buffer for buffered dictionary indices in DictEncoder instead of std::vector

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1646: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Use

[jira] [Updated] (PARQUET-2099) [C++] Statistics::num_values() is misleading

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2099: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++]

[jira] [Updated] (PARQUET-1634) [C++] Factor out data/dictionary page writes to allow for page buffering

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1634: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Factor out

[jira] [Updated] (PARQUET-1653) [C++] Deprecated BIT_PACKED level decoding is probably incorrect

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1653: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Deprecated

[jira] [Updated] (PARQUET-1859) [C++] Require error message when using ParquetException::EofException

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1859: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++] Require error

[jira] [Updated] (PARQUET-2118) [C++] thift_internal.h assumes shared_ptr type in some cases

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2118: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > [C++]

[jira] [Resolved] (PARQUET-2118) [C++] thift_internal.h assumes shared_ptr type in some cases

2022-02-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved PARQUET-2118. Fix Version/s: cpp-7.0.0 Resolution: Fixed Issue resolved by pull request 12349

[jira] [Updated] (PARQUET-490) [C++] Incorporate DELTA_BINARY_PACKED value encoder into library and add unit tests

2022-01-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-490: Fix Version/s: (was: cpp-6.0.0) > [C++] Incorporate DELTA_BINARY_PACKED value encoder

[jira] [Reopened] (PARQUET-490) [C++] Incorporate DELTA_BINARY_PACKED value encoder into library and add unit tests

2022-01-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reopened PARQUET-490: Assignee: (was: Shan Huang) > [C++] Incorporate DELTA_BINARY_PACKED value encoder

[jira] [Commented] (PARQUET-490) [C++] Incorporate DELTA_BINARY_PACKED value encoder into library and add unit tests

2022-01-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17484609#comment-17484609 ] Antoine Pitrou commented on PARQUET-490: [~Bkief] Hmm, sorry. I went a bit overboard when

[jira] [Updated] (PARQUET-2115) Parquet Cpp Crash on Invalid Dictionary Bit Width

2022-01-31 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-2115: Fix Version/s: cpp-8.0.0 (was: cpp-7.0.0) > Parquet Cpp Crash on
