[jira] [Updated] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-03 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2343: Fix Version/s: 1.14.0

[jira] [Assigned] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-03 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2343: Assignee: Xianyang Liu

[jira] [Updated] (PARQUET-2344) Bump to Thrift 0.19.0

2023-09-04 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2344: Fix Version/s: format-2.10.0

[jira] [Assigned] (PARQUET-2344) Bump to Thrift 0.19.0

2023-09-04 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2344: Assignee: Fokko Driesprong

[jira] [Created] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-11 Gang Wu (Jira)
Gang Wu created PARQUET-2346: Summary: Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9; Key: PARQUET-2346; URL: https://issues.apache.org/jira/browse/PARQUET-2346; Project: Parquet

[jira] [Commented] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764740#comment-17764740 ] Gang Wu commented on PARQUET-2346: Do you have any suggestion? TBH, I am not familiar with this issue

[jira] [Resolved] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-07 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2343. Resolution: Fixed

[jira] [Updated] (PARQUET-2343) Fixes NPE when rewriting file with multiple rowgroups

2023-09-07 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2343: Fix Version/s: 1.13.2

[jira] [Assigned] (PARQUET-2342) Parquet writer produced a corrupted file due to page value count overflow

2023-08-31 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2342: Assignee: Zamil Majdy

[jira] [Resolved] (PARQUET-2342) Parquet writer produced a corrupted file due to page value count overflow

2023-08-31 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2342. Fix Version/s: 1.14.0; Resolution: Fixed

[jira] [Commented] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764933#comment-17764933 ] Gang Wu commented on PARQUET-2346: Thanks for the information! Probably it is not a good time to make

[jira] [Resolved] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2346. Resolution: Won't Do

[jira] [Closed] (PARQUET-2346) Bump org.slf4j:slf4j-api from 1.7.12 to 2.0.9

2023-09-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu closed PARQUET-2346.

[jira] [Resolved] (PARQUET-2288) Bump exec-maven-plugin from 1.2.1 to 3.1.0

2023-08-29 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2288. Resolution: Fixed

[jira] [Updated] (PARQUET-2288) Bump exec-maven-plugin from 1.2.1 to 3.1.0

2023-08-29 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2288: Affects Version/s: (was: 1.13.0)

[jira] [Updated] (PARQUET-2288) Bump exec-maven-plugin from 1.2.1 to 3.1.0

2023-08-29 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2288: Fix Version/s: format-2.10.0 (was: 1.14.0)

[jira] [Commented] (PARQUET-2345) The Parquet Spec doesn't specify whether multiple columns are allowed to have the same name.

2023-09-08 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763125#comment-17763125 ] Gang Wu commented on PARQUET-2345: I didn't find any statement to disallow identical field names in

[jira] [Resolved] (PARQUET-2363) ParquetRewriter should encrypt the V2 page header

2023-10-15 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2363. Fix Version/s: 1.14.0; Assignee: Xianyang Liu; Resolution: Fixed

[jira] [Updated] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

2023-10-25 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2347: Fix Version/s: 1.14.0

[jira] [Assigned] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

2023-10-25 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2347: Assignee: Atour Mousavi Gourabi

[jira] [Resolved] (PARQUET-2371) Resolve japicmp failure for CI

2023-11-03 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2371. Fix Version/s: 1.14.0; Assignee: Atour Mousavi Gourabi; Resolution: Fixed

[jira] [Resolved] (PARQUET-2366) Optimize random seek during rewriting

2023-10-29 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2366. Fix Version/s: 1.14.0; Assignee: Xianyang Liu; Resolution: Fixed

[jira] [Resolved] (PARQUET-2347) Add interface layer between Parquet and Hadoop Configuration

2023-10-29 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2347. Resolution: Fixed

[jira] [Resolved] (PARQUET-2365) Fixes NPE when rewriting column without column index

2023-11-04 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2365. Fix Version/s: 1.14.0; Assignee: Xianyang Liu; Resolution: Fixed

[jira] [Resolved] (PARQUET-2361) Reduce failure rate of unit test testParquetFileWithBloomFilterWithFpp

2023-10-18 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2361. Fix Version/s: 1.14.0; Assignee: Feng Jiajie; Resolution: Fixed

[jira] [Commented] (PARQUET-2352) Update parquet format spec to allow truncation of row group min/max stats

2023-09-19 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766995#comment-17766995 ] Gang Wu commented on PARQUET-2352: Thanks for opening the issue! Format change is not an easy topic

[jira] [Resolved] (PARQUET-2354) Apparent race condition in CharsetValidator

2023-09-28 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2354. Resolution: Fixed

[jira] [Assigned] (PARQUET-2354) Apparent race condition in CharsetValidator

2023-09-28 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2354: Assignee: Piotr Findeisen

[jira] [Updated] (PARQUET-2354) Apparent race condition in CharsetValidator

2023-09-28 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2354: Fix Version/s: 1.14.0

[jira] [Resolved] (PARQUET-2348) Recompression/Re-encrypt should rewrite bloomfilter

2023-10-11 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2348. Fix Version/s: 1.14.0; Resolution: Fixed

[jira] [Assigned] (PARQUET-2348) Recompression/Re-encrypt should rewrite bloomfilter

2023-10-11 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2348: Assignee: Xianyang Liu

[jira] [Resolved] (PARQUET-2358) Upgrade japicmp-maven-plugin to 0.16.0

2023-10-11 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2358. Resolution: Fixed

[jira] [Commented] (PARQUET-2367) NegativeArraySizeException on read for parquet files written with large strings in some cases

2023-10-17 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776453#comment-17776453 ] Gang Wu commented on PARQUET-2367: Thanks for reporting this! I see the configs involve writing.

[jira] [Resolved] (PARQUET-2352) Update parquet format spec to allow truncation of row group min/max stats

2023-10-18 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2352. Fix Version/s: format-2.10.0; Assignee: Raunaq Morarka; Resolution: Fixed

[jira] [Resolved] (PARQUET-2362) Clarify parquet encoding

2023-10-14 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2362. Fix Version/s: format-2.10.0; Assignee: Letian Jiang; Resolution: Fixed

[jira] [Resolved] (PARQUET-2357) Modest refactor of CapacityByteArrayOutputStream

2023-10-14 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2357. Assignee: Feng Jiajie; Resolution: Fixed

[jira] [Assigned] (PARQUET-2349) Move from deprecated BytesCompressor/Decompressor to BytesInputCompressor/Decompressor

2023-10-08 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2349: Assignee: Atour Mousavi Gourabi

[jira] [Resolved] (PARQUET-2349) Move from deprecated BytesCompressor/Decompressor to BytesInputCompressor/Decompressor

2023-10-08 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2349. Fix Version/s: 1.14.0; Resolution: Fixed

[jira] [Commented] (PARQUET-2340) appendRowGroup will lose pageIndex

2023-08-22 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757769#comment-17757769 ] Gang Wu commented on PARQUET-2340: Do you have any special handling that ParquetRewriter cannot do?

[jira] [Resolved] (PARQUET-2333) Support bzip2 and xz compressions in the to-avro subcommand

2023-08-15 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2333. Fix Version/s: 1.14.0; Resolution: Fixed

[jira] [Commented] (PARQUET-2340) appendRowGroup will lose pageIndex

2023-08-22 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757506#comment-17757506 ] Gang Wu commented on PARQUET-2340: [~NathanKan] You may be interested in the method

[jira] [Commented] (PARQUET-2339) ArrayIndexOutOfBounds exception writing parquet from Avro in Apache Hudi

2023-08-21 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757176#comment-17757176 ] Gang Wu commented on PARQUET-2339: The config above uses three level list instead of the legacy two

[jira] [Resolved] (PARQUET-2372) Avoid unnecessary reading of RowGroup data during rewriting

2023-11-08 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2372. Fix Version/s: 1.14.0; Assignee: Xianyang Liu; Resolution: Fixed

[jira] [Created] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Gang Wu (Jira)
Gang Wu created PARQUET-2196: Summary: Support LZ4_RAW codec; Key: PARQUET-2196; URL: https://issues.apache.org/jira/browse/PARQUET-2196; Project: Parquet; Issue Type: Improvement

[jira] [Updated] (PARQUET-2196) Support LZ4_RAW codec

2022-09-27 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2196: Description: There is a long history about the LZ4 interoperability of parquet files between

[jira] [Created] (PARQUET-2195) Add scan command to parquet-cli

2022-09-23 Gang Wu (Jira)
Gang Wu created PARQUET-2195: Summary: Add scan command to parquet-cli; Key: PARQUET-2195; URL: https://issues.apache.org/jira/browse/PARQUET-2195; Project: Parquet; Issue Type: Improvement

[jira] [Assigned] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and no row data

2023-01-08 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2219: Assignee: Gang Wu

[jira] [Commented] (PARQUET-2221) [Format] Encoding spec incorrect for dictionary fallback

2023-01-03 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654256#comment-17654256 ] Gang Wu commented on PARQUET-2221: IMHO, the specs is authoritative to the reader implementation to

[jira] [Assigned] (PARQUET-2075) Unified Rewriter Tool

2022-12-09 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2075: Assignee: Gang Wu (was: Xinli Shang)

[jira] [Commented] (PARQUET-2075) Unified Rewriter Tool

2022-12-09 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645349#comment-17645349 ] Gang Wu commented on PARQUET-2075: As discussed offline, I will work on it. So I just changed the

[jira] [Resolved] (PARQUET-2196) Support LZ4_RAW codec

2022-12-15 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2196. Resolution: Resolved

[jira] [Assigned] (PARQUET-2196) Support LZ4_RAW codec

2022-12-15 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2196: Assignee: Gang Wu

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and no row data

2022-12-15 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648342#comment-17648342 ] Gang Wu commented on PARQUET-2219: According to the error message, it seems that empty row group is

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and no row data

2022-12-15 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648350#comment-17648350 ] Gang Wu commented on PARQUET-2219: cc [~emkornfield]

[jira] [Commented] (PARQUET-1404) [C++] Add index pages to the format to support efficient page skipping to parquet-cpp

2022-12-01 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641967#comment-17641967 ] Gang Wu commented on PARQUET-1404: Hi [~mdeepak], I am working on

[jira] [Assigned] (PARQUET-1404) [C++] Add index pages to the format to support efficient page skipping to parquet-cpp

2022-12-01 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-1404: Assignee: Gang Wu (was: Deepak Majeti)

[jira] [Commented] (PARQUET-1404) [C++] Add index pages to the format to support efficient page skipping to parquet-cpp

2022-12-01 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641983#comment-17641983 ] Gang Wu commented on PARQUET-1404: Hi [~encodedgeek], I will definitely go through your change as

[jira] [Created] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-01-14 Gang Wu (Jira)
Gang Wu created PARQUET-2228: Summary: ParquetRewriter supports more than one input file; Key: PARQUET-2228; URL: https://issues.apache.org/jira/browse/PARQUET-2228; Project: Parquet

[jira] [Created] (PARQUET-2229) ParquetRewriter supports masking and encrypting the same column

2023-01-14 Gang Wu (Jira)
Gang Wu created PARQUET-2229: Summary: ParquetRewriter supports masking and encrypting the same column; Key: PARQUET-2229; URL: https://issues.apache.org/jira/browse/PARQUET-2229; Project: Parquet

[jira] [Created] (PARQUET-2230) Add a new rewrite command powered by ParquetRewriter

2023-01-14 Gang Wu (Jira)
Gang Wu created PARQUET-2230: Summary: Add a new rewrite command powered by ParquetRewriter; Key: PARQUET-2230; URL: https://issues.apache.org/jira/browse/PARQUET-2230; Project: Parquet

[jira] [Created] (PARQUET-2227) Refactor different file rewriters to use single implementation

2023-01-14 Gang Wu (Jira)
Gang Wu created PARQUET-2227: Summary: Refactor different file rewriters to use single implementation; Key: PARQUET-2227; URL: https://issues.apache.org/jira/browse/PARQUET-2227; Project: Parquet

[jira] [Commented] (PARQUET-1622) Add BYTE_STREAM_SPLIT encoding

2023-01-17 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678056#comment-17678056 ] Gang Wu commented on PARQUET-1622: The issue raised by [~mwish] above may also exist in the

[jira] [Comment Edited] (PARQUET-1622) Add BYTE_STREAM_SPLIT encoding

2023-01-17 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678056#comment-17678056 ] Gang Wu edited comment on PARQUET-1622 at 1/18/23 3:05 AM: The issue raised

[jira] [Commented] (PARQUET-2233) Parquet Travis CI jobs to be turned off February 15th

2023-01-24 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680439#comment-17680439 ] Gang Wu commented on PARQUET-2233: I see there is a comment in the travis yaml file saying that the

[jira] [Resolved] (PARQUET-2227) Refactor different file rewriters to use single implementation

2023-01-29 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2227. Resolution: Fixed

[jira] [Created] (PARQUET-2211) [C++] Print ColumnMetaData.encoding_stats field

2022-11-01 Gang Wu (Jira)
Gang Wu created PARQUET-2211: Summary: [C++] Print ColumnMetaData.encoding_stats field; Key: PARQUET-2211; URL: https://issues.apache.org/jira/browse/PARQUET-2211; Project: Parquet

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697463#comment-17697463 ] Gang Wu commented on PARQUET-2254: Here are two questions: 1) creating bloom filters without explicit

[jira] [Commented] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699694#comment-17699694 ] Gang Wu commented on PARQUET-2256: Apache ORC supports compression of bloom filter. It would be nice

[jira] [Created] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

2023-03-13 Gang Wu (Jira)
Gang Wu created PARQUET-2257: Summary: [Format] Add bloom_filter_length to ColumnMetaData; Key: PARQUET-2257; URL: https://issues.apache.org/jira/browse/PARQUET-2257; Project: Parquet

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699686#comment-17699686 ] Gang Wu commented on PARQUET-2255: These are good questions. Let me try to answer them from the

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699691#comment-17699691 ] Gang Wu commented on PARQUET-2255: cc [~gszadovszky] [~emkornfi...@gmail.com]

[jira] [Updated] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2256: Component/s: (was: parquet-cpp)

[jira] [Updated] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-13 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2256: Component/s: parquet-format

[jira] [Resolved] (PARQUET-2202) Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte

2023-03-25 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2202. Resolution: Fixed

[jira] [Resolved] (PARQUET-2164) CapacityByteArrayOutputStream overflow while writing causes negative row group sizes to be written

2023-03-25 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2164. Fix Version/s: (was: 1.12.3); Resolution: Fixed

[jira] [Resolved] (PARQUET-2103) crypto exception in print toPrettyJSON

2023-03-25 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2103. Resolution: Fixed

[jira] [Resolved] (PARQUET-2185) ParquetReader constructed using builder fails to read encrypted files

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2185. Resolution: Fixed

[jira] [Resolved] (PARQUET-2197) Document uniform encryption

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2197. Resolution: Fixed

[jira] [Resolved] (PARQUET-2154) ParquetFileReader should close its input stream when `filterRowGroups` throw Exception in constructor

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2154. Resolution: Fixed

[jira] [Resolved] (PARQUET-2224) Publish SBOM artifacts

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2224. Resolution: Fixed

[jira] [Resolved] (PARQUET-2161) Row positions are computed incorrectly when range or offset metadata filter is used

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2161. Resolution: Fixed

[jira] [Resolved] (PARQUET-2155) Upgrade protobuf version to 3.17.3

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2155. Assignee: Chao Sun; Resolution: Fixed

[jira] [Resolved] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2134. Resolution: Fixed

[jira] [Resolved] (PARQUET-2138) Add ShowBloomFilterCommand to parquet-cli

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2138. Resolution: Fixed

[jira] [Resolved] (PARQUET-2159) Parquet bit-packing de/encode optimization

2023-03-25 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2159. Fix Version/s: (was: 1.13.0); Resolution: Fixed

[jira] [Updated] (PARQUET-2252) Make some methods public to allow external projects to implement page skipping

2023-03-25 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu updated PARQUET-2252: Issue Type: Improvement (was: New Feature)

[jira] [Resolved] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-1711. Resolution: Fixed

[jira] [Resolved] (PARQUET-2176) Parquet writers should allow for configurable index/statistics truncation

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2176. Resolution: Fixed

[jira] [Resolved] (PARQUET-2169) Upgrade Avro to version 1.11.1

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2169. Resolution: Fixed

[jira] [Resolved] (PARQUET-2167) CLI show footer command fails if Parquet file contains date fields

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2167. Resolution: Fixed

[jira] [Resolved] (PARQUET-2192) Add Java 17 build test to GitHub action

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2192. Resolution: Fixed

[jira] [Resolved] (PARQUET-2191) Upgrade Scala to 2.12.17

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2191. Resolution: Fixed

[jira] [Resolved] (PARQUET-2177) Fix parquet-cli not to fail showing descriptions

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2177. Resolution: Fixed

[jira] [Resolved] (PARQUET-2195) Add scan command to parquet-cli

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2195. Resolution: Fixed

[jira] [Resolved] (PARQUET-2208) Add details to nested column encryption config doc and exception text

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2208. Assignee: Gidon Gershinsky; Resolution: Fixed

[jira] [Resolved] (PARQUET-2198) Vulnerabilities in jackson-databind

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu resolved PARQUET-2198. Resolution: Fixed

[jira] [Assigned] (PARQUET-2195) Add scan command to parquet-cli

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2195: Assignee: Gang Wu

[jira] [Assigned] (PARQUET-2224) Publish SBOM artifacts

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Wu reassigned PARQUET-2224: Assignee: Dongjoon Hyun

[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-26 Gang Wu (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705145#comment-17705145 ] Gang Wu commented on PARQUET-2224: Thanks for reminding me. I have assigned it to you. [~dongjoon]
