Re: [VOTE] Release Apache Parquet 1.12.4 RC0

2023-03-28 Thread Chao Sun
+1 (non-binding). Verified checksum & signature, and ran all the tests locally. Thanks Gang! On Tue, Mar 28, 2023 at 9:37 AM Gidon Gershinsky wrote: > > +1 > > Verified signature and ran the tests. Thanks Gang and all contributors! > > Cheers, Gidon > > > On Tue, Mar 28, 2023 at 5:19 PM Xinli

[jira] [Created] (PARQUET-2213) Add an alternative InputFile.newStream that allow an input range

2022-11-10 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2213: - Summary: Add an alternative InputFile.newStream that allow an input range Key: PARQUET-2213 URL: https://issues.apache.org/jira/browse/PARQUET-2213 Project: Parquet

[jira] [Created] (PARQUET-2203) Make ParquetReadOptions and HadoopReadOptions extendable

2022-10-24 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2203: - Summary: Make ParquetReadOptions and HadoopReadOptions extendable Key: PARQUET-2203 URL: https://issues.apache.org/jira/browse/PARQUET-2203 Project: Parquet

Re: Fail to read back written large parquet file

2022-08-23 Thread Chao Sun
in my case was not with Parquet but my implementation of > the `OutputFile` wrapper providing `PositionOutputStream`. > > Would it make sense to do changes to the writer to crash on negative > offsets rather than continue and produce unreadable results. > > On Fri, Aug 5, 2022 at

Re: Fail to read back written large parquet file

2022-08-05 Thread Chao Sun
Seems the file was corrupted during write. There's a similar issue https://issues.apache.org/jira/browse/PARQUET-2164 we found recently. On Fri, Aug 5, 2022 at 3:40 AM Steve Loughran wrote: > > that has to be an integer wraparound...something is using a signed int for > position, so when it goes
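For readers following this thread, here is a minimal Java sketch of the kind of wraparound the quoted reply describes, together with the fail-fast check suggested in the follow-up. It is only an illustration under stated assumptions: the class `PositionWraparound` and its methods `narrow` and `checkedOffset` are hypothetical names invented for this example. They are not Parquet APIs, and the sketch does not claim to show where the actual bug was.

```java
public class PositionWraparound {
    // Hypothetical helper: narrows a 64-bit stream position to a signed
    // 32-bit int. Past Integer.MAX_VALUE the cast silently wraps negative.
    static int narrow(long position) {
        return (int) position;
    }

    // Hypothetical guard: fail fast instead of recording a negative offset
    // that would make the written file unreadable later.
    static long checkedOffset(long position) {
        if (position < 0) {
            throw new IllegalStateException("negative offset: " + position);
        }
        return position;
    }

    public static void main(String[] args) {
        long pos = Integer.MAX_VALUE + 1L;   // one byte past 2 GiB - 1
        int wrapped = narrow(pos);           // wraps to Integer.MIN_VALUE
        System.out.println(wrapped);
        try {
            checkedOffset(wrapped);          // the guard rejects the wrapped value
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping positions as `long` end to end avoids the wrap entirely. The guard only makes the failure visible at write time rather than at read time.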

[jira] [Commented] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2022-08-05 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575952#comment-17575952 ] Chao Sun commented on PARQUET-2160: --- {quote} ... only it happens after the decompress call, may I ask

[jira] [Commented] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2022-08-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575579#comment-17575579 ] Chao Sun commented on PARQUET-2160: --- Hmm it does need to allocate extra heap memory and then read

[jira] [Commented] (PARQUET-2160) Close decompression stream to free off-heap memory in time

2022-08-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575487#comment-17575487 ] Chao Sun commented on PARQUET-2160: --- {quote} After I made this change to decompress, I found off-heap

[jira] [Updated] (PARQUET-2155) Upgrade protobuf version to 3.17.3

2022-07-20 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated PARQUET-2155: -- Summary: Upgrade protobuf version to 3.17.3 (was: Upgrade protobuf version to 3.20.1) > Upgr

[jira] [Created] (PARQUET-2155) Upgrade protobuf version to 3.20.1

2022-06-09 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2155: - Summary: Upgrade protobuf version to 3.20.1 Key: PARQUET-2155 URL: https://issues.apache.org/jira/browse/PARQUET-2155 Project: Parquet Issue Type: Improvement

Re: [ANNOUNCEMENT] Gidon Gershinsky as Apache Parquet PMC

2021-11-24 Thread Chao Sun
Congratulations Gidon! On Wed, Nov 24, 2021 at 1:27 PM Xinli shang wrote: > Hi all, > > The Project Management Committee (PMC) for Apache Parquet has invited Gidon > Gershinsky to become a PMC member and we are pleased to announce that he > has accepted. > > Congratulations and welcome, Gidon!

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Chao Sun
+1 (non-binding). - tested on the Spark side and all tests passed, including the issue in SPARK-36696 - verified signature and checksum of the release Thanks Xinli for driving the release work! Chao On Tue, Sep 14, 2021 at 3:01 AM Gabor Szadovszky wrote: > Thanks for the new RC, Xinli. > >

[jira] [Commented] (PARQUET-2090) [C++] Parquet writes incorrect file_offset

2021-09-13 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414737#comment-17414737 ] Chao Sun commented on PARQUET-2090: --- Thanks [~emkornfield], you are right - I missed the comment

[jira] [Created] (PARQUET-2084) Upgrade Thrift to 0.14.2

2021-09-01 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2084: - Summary: Upgrade Thrift to 0.14.2 Key: PARQUET-2084 URL: https://issues.apache.org/jira/browse/PARQUET-2084 Project: Parquet Issue Type: Improvement

[jira] [Created] (PARQUET-2083) Expose getFieldPath from ColumnIO

2021-08-31 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2083: - Summary: Expose getFieldPath from ColumnIO Key: PARQUET-2083 URL: https://issues.apache.org/jira/browse/PARQUET-2083 Project: Parquet Issue Type: Improvement

Re: Any Parquet implementations might be impacted by PARQUET-2078

2021-08-31 Thread Chao Sun
Thanks Gabor. The Spark community is in the process of releasing Spark 3.2.0 with Parquet 1.12. Any idea when a new release will be available with the fix? We may need to hold off the Spark release for that. Chao On Mon, Aug 30, 2021 at 6:31 AM Gabor Szadovszky wrote: > It turned out that

[jira] [Created] (PARQUET-2061) Add a new API in `PageReadStore` to return row ranges directly

2021-06-28 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2061: - Summary: Add a new API in `PageReadStore` to return row ranges directly Key: PARQUET-2061 URL: https://issues.apache.org/jira/browse/PARQUET-2061 Project: Parquet

[jira] [Created] (PARQUET-2052) Integer overflow when writing huge binary using dictionary encoding

2021-05-20 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2052: - Summary: Integer overflow when writing huge binary using dictionary encoding Key: PARQUET-2052 URL: https://issues.apache.org/jira/browse/PARQUET-2052 Project: Parquet

[jira] [Created] (PARQUET-2050) Expose repetition & definition level from ColumnIO

2021-05-14 Thread Chao Sun (Jira)
Chao Sun created PARQUET-2050: - Summary: Expose repetition & definition level from ColumnIO Key: PARQUET-2050 URL: https://issues.apache.org/jira/browse/PARQUET-2050 Project: Parquet Issue

Re: [Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Chao Sun
Congrats Gidon! On Wed, Apr 7, 2021 at 8:27 AM Micah Kornfield wrote: > Congrats Gidon, well deserved. > > On Wed, Apr 7, 2021 at 5:10 AM Nándor Kollár wrote: > > > Congrats Gidon! > > > > On 2021/04/07 11:55:45, Gabor Szadovszky wrote: > > > The Project Management Committee (PMC) for Apache

Re: [ANNOUNCE] New Parquet PMC member - Xinli Shang

2020-11-10 Thread Chao Sun
Congrats Xinli! On Tue, Nov 10, 2020 at 4:10 AM Gara Walid wrote: > Congrats Xinli! > > Cheers, > Walid > > On Tue, Nov 10, 2020, 7:50 AM Driesprong, Fokko > wrote: > > > Cool! Well deserved Xinli! > > > > Cheers, Fokko > > > > Op di 10 nov. 2020 om 07:35 schreef Gabor Szadovszky : > > > > >

Re: Add 'prune' and 'mask' tools to Parquet-tools/cli

2020-02-19 Thread Chao Sun
Bumping it up. Would love to get some feedback from the community. Best, Chao On Sun, Feb 16, 2020 at 7:20 PM Xinli shang wrote: > Hi all, > > I am developing tools to prune or mask some Parquet file columns for > cost-saving or security & compliance purposes. I want to collect your > thoughts on

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Chao Sun
I'm +1 on the change for the Rust side as well. It probably won't be as disruptive as the C++ side. On Wed, May 29, 2019 at 7:09 AM Wes McKinney wrote: > I'm in favor of making the change -- it's slightly disruptive for > library-users, but the fix is no more complicated than a >

Re: [DISCUSS] Rust add adapter for parquet

2018-11-28 Thread Chao Sun
ticular, would be really useful within the existing > code base. > > > > Paddy > > > > From: Chao Sun > > Sent: Wednesday, November 21, 2018 2:42 PM > > To: d...@arrow.apache.org

Re: [DISCUSS] Rust add adapter for parquet

2018-11-21 Thread Chao Sun
windows? I would be willing to help get windows support > working after the fact, although I know very little about parquet right now. > > > > Are there other strategies for dealing with this?

Re: [DISCUSS] Rust add adapter for parquet

2018-11-21 Thread Chao Sun
>> On Wed, Nov 21, 2018 at 4:49 AM Andy Grove > wrote: > >> > >> > This sounds like a great idea. > >> > > >> > With support for both CSV and Parquet in the Arrow crate, it would be > nice > >> > to design a standard

Re: [DISCUSS] Rust add adapter for parquet

2018-11-20 Thread Chao Sun
ug 20, 2018 at 11:03 AM Renjie Liu > wrote: > > > > > Yes, it's a mistake, sorry for that > > > > > > > > > On Mon, Aug 20, 2018 at 10:57 AM Chao Sun wrote: > > > > > >> (s/flink/arrow - it is a mistake?) > > >> > > >> Thanks Renjie

Re: [VOTE] Accept donation of Parquet Rust implementation

2018-03-26 Thread Chao Sun
Hi, sorry for jumping in here. Is there any update on this vote? Thanks. Best, Chao On Fri, Mar 16, 2018 at 2:50 PM, Lars Volker wrote: > +1 (non-binding) > > On Fri, Mar 16, 2018 at 9:19 AM, Daniel Weeks > wrote: > > > +1 > > > > On Tue, Mar 6,

[jira] [Commented] (PARQUET-1249) Clarify encoding schemes for boolean types

2018-03-23 Thread Chao Sun (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411943#comment-16411943 ] Chao Sun commented on PARQUET-1249: --- Thanks! > Clarify encoding schemes for boolean ty

[jira] [Assigned] (PARQUET-1249) Clarify encoding schemes for boolean types

2018-03-23 Thread Chao Sun (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned PARQUET-1249: - Assignee: Chao Sun > Clarify encoding schemes for boolean ty

[jira] [Commented] (PARQUET-1249) Clarify encoding schemes for boolean types

2018-03-23 Thread Chao Sun (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411939#comment-16411939 ] Chao Sun commented on PARQUET-1249: --- Trying to make my first contribution here. Can someone add me

[jira] [Created] (PARQUET-1249) Clarify encoding schemes for boolean types

2018-03-17 Thread Chao Sun (JIRA)
Chao Sun created PARQUET-1249: - Summary: Clarify encoding schemes for boolean types Key: PARQUET-1249 URL: https://issues.apache.org/jira/browse/PARQUET-1249 Project: Parquet Issue Type

Re: Contributing parquet-rs to Apache?

2018-02-28 Thread Chao Sun
t be able to claim copyright? > > It sounds like we will just need to start a resolution on the Incubator's > general list to accept the code and have you and Ivan sign ICLAs. > > rb > > On Mon, Feb 26, 2018 at 11:45 AM, Chao Sun <sunc...@apache.org> wrote: > >> T

Re: Contributing parquet-rs to Apache?

2018-02-26 Thread Chao Sun
> So I guess the question is: who owns the code and will those people or > organizations work on it once it is moved to Apache? > > rb > > On Fri, Feb 23, 2018 at 11:06 PM, Chao Sun <chao.apa...@gmail.com> wrote: > > > Thanks Wes. We are ready to start the IP cl

Re: Contributing parquet-rs to Apache?

2018-02-23 Thread Chao Sun
the email thread earlier. > > > > I will try fixing some issues of the milestone 1, so that we could have > the > > read part complete. > > > > > > Cheers, > > > > Ivan > > On Fri, 16 Feb 2018 at 5:33 PM, Chao Sun <sunc...@apache.org> w

Contributing parquet-rs to Apache?

2018-02-15 Thread Chao Sun
Hi, Just joined this mailing list. Ivan and I have been working on a Rust implementation of Parquet for some time. It still lacks many features but the eventual goal is to contribute it to the Apache community. I saw a few weeks ago there's a discussion