Parquet community sync meeting notes 5/28/2024

2024-05-28 Thread Xinli shang
better to clarify from the spec. -- Xinli Shang

Updated invitation: Parquet Sync @ Monthly from 7am to 8am on the fourth Tuesday from Tue Feb 27 to Mon May 27 (PST) (dev@parquet.apache.org)

2024-05-23 Thread Xinli shang

Updated invitation: Parquet Sync @ Tue May 28, 2024 9am - 10am (PDT) (dev@parquet.apache.org)

2024-05-23 Thread Xinli shang

Updated invitation: Parquet Sync @ Monthly from 7am to 8am on the fourth Tuesday from Tue Apr 25, 2023 to Mon May 27 (PDT) (dev@parquet.apache.org)

2024-05-23 Thread Xinli shang

Updated invitation: Parquet Sync @ Monthly from 9am to 10am on the fourth Tuesday (PDT) (dev@parquet.apache.org)

2024-05-23 Thread Xinli shang

Canceled event: Parquet Sync @ Tue May 28, 2024 7am - 8am (PDT) (dev@parquet.apache.org)

2024-05-23 Thread Xinli shang

Re: Interest in Parquet V3

2024-05-19 Thread Xinli shang
> > > > Thanks, > > Micah > > > > [1] https://github.com/maxi-k/btrblocks > > [2] https://github.com/facebookincubator/nimble > > [3] https://blog.lancedb.com/lance-v2/ > > [4] https://github.com/apache/arrow/issues/39676 > > [5] https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34 > > > -- Xinli Shang

[ANNOUNCE] New Parquet PMC Member: Gang Wu

2024-05-11 Thread Xinli shang
Hi all, As a Parquet committer, Gang Wu has remained very active and instructive in the community. The Parquet community invited him to be a PMC member, and he accepted. It's my pleasure to announce that Gang is now officially a PMC member of Apache Parquet. Congratulations, Gang! Xinli Shang

Re: [VOTE] Release Apache Parquet 1.14.0 RC1

2024-05-07 Thread Xinli shang
> > > > > > > > > https://github.com/apache/parquet-mr/tree/fe9179414906cc19b550d13d2819b4e16fddf8a1 > > > > > > > > > > > > > > The release tarball, signature, and checksums are here: > > > > > > > * > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.14.0-rc1/ > > > > > > > > > > > > > > You can find the KEYS file here: > > > > > > > * https://downloads.apache.org/parquet/KEYS > > > > > > > > > > > > > > Binary artifacts are staged in Nexus here: > > > > > > > * > > > > > > > > > > > > https://repository.apache.org/content/groups/staging/org/apache/parquet/ > > > > > > > > > > > > > > This release includes important changes: > > > > > > > > > > > > > > * > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/parquet-mr/blob/parquet-1.14.x/CHANGES.md#version-1140 > > > > > > > > > > > > > > Since RC0 one commit has been added: > > > > > > > https://github.com/apache/parquet-mr/pull/1342 > > > > > > > > > > > > > > Please download, verify, and test. > > > > > > > > > > > > > > Please vote in the next 72 hours. > > > > > > > > > > > > > > [ ] +1 Release this as Apache Parquet 1.14.0 > > > > > > > [ ] +0 > > > > > > > [ ] -1 Do not release this because... > > > > > > > > > > > > > > Kind regards, > > > > > > > Fokko > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.14.0 RC0

2024-04-30 Thread Xinli shang
> > > > https://github.com/apache/parquet-mr/tree/af0740229929337e1395fd24253a4ed787df2db3 > > > > > > > > > > > > > > The release tarball, signature, and checksums are here: > > > > > > > * > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.14.0-rc0 > > > > > > > > > > > > > > You can find the KEYS file here: > > > > > > > * https://downloads.apache.org/parquet/KEYS > > > > > > > > > > > > > > Binary artifacts are staged in Nexus here: > > > > > > > * > > > > > > > > > > > > https://repository.apache.org/content/groups/staging/org/apache/parquet/ > > > > > > > > > > > > > > This release includes important changes: > > > > > > > * > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/parquet-mr/blob/parquet-1.14.x/CHANGES.md#version-1140 > > > > > > > > > > > > > > Please download, verify, and test. > > > > > > > > > > > > > > Please vote in the next 72 hours. > > > > > > > > > > > > > > [ ] +1 Release this as Apache Parquet 1.14.0 > > > > > > > [ ] +0 > > > > > > > [ ] -1 Do not release this because... > > > > > > > > > > > > > > Best, > > > > > > > Gang > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Xinli Shang

Parquet Sync meeting notes - April 23 2024

2024-04-23 Thread Xinli shang
4/23/2024 Attendees: Fokko Driesprong, Vinoo Ganesh, Xinli Shang Parquet-mr 1.14 release: 1. Fokko and Gang will discuss starting the release soon 2. There are a few breaking changes we need to make to ensure backward compatibility and do proper testing 2. Vinoo will shadow and do some

Parquet sync meeting notes - March 26 2024

2024-03-26 Thread Xinli shang
Hi all, These are the meeting notes from today's sync meeting. 3/26/2024 Attendees: Gábor Szádovszky, Vinoo Ganesh, Xinli Shang 1. Parquet-mr 1.14 release - targeting mid-2024 2. Vulnerability findings - done. 3. Removal of Java and Scala files from the format repo - start

Re: [VOTE] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64

2024-03-13 Thread Xinli shang
+1 (binding) Sorry for being late and thanks for working on it! Xinli Shang On Fri, Mar 8, 2024 at 8:28 AM Micah Kornfield wrote: > +1 (non-binding) > > On Thursday, March 7, 2024, Gang Wu wrote: > > > +1 (non-binding) > > > > Best, > > Gang > >

Parquet community sync meeting - Feb 2024

2024-02-27 Thread Xinli shang
Hi all, These are the notes for today's sync meeting! 2/27/2024 Attendees: Fokko Driesprong, Vinoo Ganesh, Xinli Shang 1. Parquet-mr 1.14 release - targeting mid-2024 2. Vulnerability findings - the code isn’t used anymore. We will remove it - AI: Vinoo. 3. Some

Updated invitation: Parquet Sync @ Monthly from 7am to 8am on the fourth Tuesday (PST) (dev@parquet.apache.org)

2024-02-20 Thread Xinli shang

Updated invitation: Parquet Sync @ Monthly from 7am to 8am on the fourth Tuesday from Tue Jul 25, 2023 to Mon Feb 26 (PDT) (dev@parquet.apache.org)

2024-02-20 Thread Xinli shang

Re: [WIP][Proposal] PARQUET-2430: Add parquet joiner

2024-02-18 Thread Xinli shang
> > > JIRA: https://issues.apache.org/jira/browse/PARQUET-2430 > > > PR's description and JIRA ticket contains all the details, please check > > it > > > out. The feature is not yet ready to merge, it is just a proposal for > > now. > > > I wanted to ask a PARQUET community opinion if you see any obstacles > for > > > adding it? We find it very useful and plan to use it and if PARQUET > > > community finds no issues with it I can add tests, javadocs and polish > it > > > so we can add this new feature to PARQUET. > > > > > > > > > Max. > > > > > > -- Xinli Shang

Re: Fast nullify of columns?

2024-01-03 Thread Xinli shang
/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java#L743C13 > > > ' for (int i = 0; i < totalChunkValues; i++) {...' > > > > > > Could a single call be made per column + row-group to write enough > > > information to: > > > A) keep the column present (in schema and as a Column chunk) > > > B) set Column rowCount and num_nulls= totalChunkValues > > > > > > > > > e.g. perhaps write a single 'empty' page which has: > > > 1) valueCount and rowCount = totalChunkValues > > > 2) Statistics.num_nulls set to totalChunkValues > > > > > > Thanks, Paul > > > > > > -- Xinli Shang

Canceled event: Parquet Sync @ Tue Dec 26, 2023 7am - 8am (PST) (dev@parquet.apache.org)

2023-12-13 Thread Xinli shang

Re: [VOTE][RESULT][FORMAT] Add repetition, definition and variable length size metadata statistics

2023-11-17 Thread Xinli shang
0 PM Gidon Gershinsky > wrote: > > > +1 (binding) > > > > Cheers, Gidon > > > > > > On Tue, Nov 14, 2023 at 5:31 AM Xinli shang > > wrote: > > > > > Yeah, we need one more PMC to vote. If you can help, appreciate it. > >

Re: [VOTE] Release Apache Parquet Format 2.10.0 RC0

2023-11-17 Thread Xinli shang
> This release includes important changes listed below: > > * > > > > > https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-2100 > > * https://issues.apache.org/jira/projects/PARQUET/versions/12350092 > > > > Please download, verify, and test. > > > > This vote will be open for at least 72 hours. > > > > [ ] +1 Release this as Apache Parquet Format 2.10.0 > > [ ] +0 > > [ ] -1 Do not release this because... > > > > Thanks, > > Gang > > > -- Xinli Shang

Re: [VOTE][FORMAT] Add repetition, definition and variable length size metadata statistics

2023-11-13 Thread Xinli shang
> > Thanks, > > Micah > > > > On Wed, Nov 8, 2023 at 8:55 AM Gábor Szádovszky > wrote: > > > > > +1 (binding) > > > > > > Cheers, > > > Gabor > > > > > > On 2023/11/07 02:46:37 Xinli shang wrote: > > >

Re: [VOTE][FORMAT] Add repetition, definition and variable length size metadata statistics

2023-11-06 Thread Xinli shang
+1 (binding) On Mon, Nov 6, 2023 at 4:56 PM Gang Wu wrote: > +1 (non-binding) > > Best, > Gang > > On Tue, Nov 7, 2023 at 3:57 AM Ed Seidl wrote: > > > +1 (non-binding) > > > > Thanks! > > Ed > > > -- Xinli Shang

Re: [VOTE][Format] Add Float16 type to specification

2023-10-05 Thread Xinli shang
> > This vote will be open for at least 72 hours. > > > > [ ] +1 Add this type to the format specification > > [ ] +0 > > [ ] -1 Do not add this type to the format specification because... > > > > Thanks! > > > > Ben > > > > [1]: https://en.wikipedia.org/wiki/Half-precision_floating-point_format > > > > > > > > -- Xinli Shang

Re: Drop parquet-thrift

2023-10-03 Thread Xinli shang
.com/apache/parquet-mr/pull/1156>. Therefore it is hard > > to > > > test if we break anything. > > > > > > It looks like parquet-thrift is not used by anyone anymore > > > <https://mvnrepository.com/artifact/org.apache.parquet/parquet-thrift > >. > > I > > > would suggest removing the module from the repository > > > <https://github.com/apache/parquet-mr/pull/1158> unless anyone > objects. > > > > > > Kind regards, Fokko > > > > > > -- Xinli Shang

Re: [Request] Send automated notifications to a separate mailing-list

2023-08-27 Thread Xinli shang
mailing-list > > > > Best, > > Gang > > > > On Tue, Aug 22, 2023 at 8:49 AM Xinli shang > wrote: > > > >> It is a good idea. Thanks, Antoine, for the proposal. > >> > >> On Tue, Aug 22, 2023 at 2:03 AM Julien Le Dem >

Re: [Request] Send automated notifications to a separate mailing-list

2023-08-21 Thread Xinli shang
> > For the record, we did this move in Apache Arrow and never came back. > > > > Thanks in advance > > > > Antoine. > > > > > > > -- Xinli Shang

Parquet Sync meeting notes - July 2023

2023-07-25 Thread Xinli shang
7/25/2023 Attendees (Gidon Gershinsky, Gang Wu, Chao Sun, Xinli Shang, Jiashen Zhang) Review data masking 1. The current design implements masking on the reader side and is lightweight 2. When KMS returned access denied and the session-based flag is enabled, a null value

Re: Rewrite Parquet List columns

2023-07-23 Thread Xinli shang
rying to extend the Parquet Rewrite tool to be able to > read those parquet and only rewrite the list columns as Level 3. Any > pointers on which classes or APIs i should leverage for this purpose? Any > pointers would be appreciated. > > -- > Take Care, > Rajesh Mahindra > -- Xinli Shang

Parquet monthly sync meeting notes - June 2023

2023-06-28 Thread Xinli shang
Hi all, These are the meeting notes for the June 2023 sync meeting. 6/28/2023 Attendees (Yi He, Xinli Shang, Jiashen Zhang) 1. Add more data masking cases than nullify. The initial draft <https://docs.google.com/document/d/1JJrEOAoZDswkwTeKmFD2drZXK60ADdrkaYlMMuMwUV0/edit> i

Re: Bloom filters for full-text search and predicate pushdown

2023-06-15 Thread Xinli shang
we look forward to your proposal and POC. If you would like to discuss it at this week's sync meeting, you are more than welcome. I have added you. Xinli Shang On Thu, Jun 15, 2023 at 4:38 AM Antoine Pitrou wrote: > > Hi, > > This would require standardizing on a specific tokenizati

Re: [ANNOUNCE] Apache Parquet release 1.13.1

2023-05-22 Thread Xinli shang
Java artifacts are available from Maven Central. > > Thanks to everyone for contributing and voting! > > Kind regards, Fokko > -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.13.1 RC0

2023-05-13 Thread Xinli shang
> > > Handy commands for verifying the release: > > > > * > > > https://iceberg.apache.org/how-to-release/#validating-a-source-release-candidate > > > > Replace Iceberg with Parquet :) > > > > > > Please download, verify, and test. > > > > > > Please vote in the next 72 hours. > > > > > > [ ] +1 Release this as Apache Parquet 1.13.1 > > > > [ ] +0 > > > > [ ] -1 Do not release this because... > > -- Xinli Shang

Re: [DISCUSS] Time to release parquet format 2.10.0?

2023-05-13 Thread Xinli shang
his up, I think that would be a great idea! > > > > > > > > > > Kind regards, > > > > > Fokko > > > > > > > > > > Op do 27 apr 2023 om 09:52 schreef Gang Wu : > > > > > > > > > > > Hi, > > > > > > > > > > > > The latest parquet format is v2.9.0 [1] which was released two > > years > > > ago. > > > > > > Is it a good time to release the next version? If there is no > > > objection, > > > > > I > > > > > > can > > > > > > volunteer to be the release manager. > > > > > > > > > > > > [1] > > https://github.com/apache/parquet-format/blob/master/CHANGES.md > > > > > > > > > > > > Best, > > > > > > Gang > > > > > > > > > > > > > > > > > > > > > -- Xinli Shang

Parquet sync meeting notes - April 2023

2023-04-28 Thread Xinli shang
Hi all, Here are the meeting notes for today's Parquet sync meeting. 4/28/2023 Attendees (Shenxuan Liu, Fokko Driesprong, Gang Wu, Jiashen Zhang, Xinli Shang) 1. Post-release 1.13.0 1. Iceberg upgraded to 1.13.0 bumped the Hadoop support to Hadoop 3 but we didn’t notice

Re: [DISCUSS] Release of Apache Parquet 1.13.1

2023-04-25 Thread Xinli shang
? Version 1.13.0 was just released, and I am not sure whether more issues are coming; if so, we could bundle the fixes into 1.13.1. Is Iceberg urgently blocked on this? Xinli Shang On Tue, Apr 25, 2023 at 6:51 PM Gang Wu wrote: > That sounds good to me. > > I have just

[jira] [Comment Edited] (PARQUET-2276) ParquetReader reads do not work with Hadoop version 2.8.5

2023-04-22 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715304#comment-17715304 ] Xinli Shang edited comment on PARQUET-2276 at 4/22/23 4:36 PM: --- [~a2l]Did

[jira] [Comment Edited] (PARQUET-2276) ParquetReader reads do not work with Hadoop version 2.8.5

2023-04-22 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715304#comment-17715304 ] Xinli Shang edited comment on PARQUET-2276 at 4/22/23 4:36 PM: --- [~a2l

[jira] [Commented] (PARQUET-2276) ParquetReader reads do not work with Hadoop version 2.8.5

2023-04-22 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715304#comment-17715304 ] Xinli Shang commented on PARQUET-2276: -- [~Aufderhar]Did you try Hadoop 2.9.x? I agree

Re: [VOTE] Release Apache Parquet 1.13.0 RC0

2023-04-03 Thread Xinli shang
> > > > Please download, verify, and test. > > > > Please vote in the next 72 hours. > > > > [ ] +1 Release this as Apache Parquet 1.13.0 > > [ ] +0 > > [ ] -1 Do not release this because... > > > > Best regards, > > Gang > > > -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.12.4 RC0

2023-03-30 Thread Xinli shang
g/parquet/KEYS. > > > > > > > > > > > > On 2023/03/28 17:01:30 Chao Sun wrote: > > > > > > > +1 (non-binding). Verified checksum & signature, and ran all > the > > > > tests > > > > > > locally. > > >

Re: [VOTE] Release Apache Parquet 1.12.4 RC0

2023-03-28 Thread Xinli shang
> > > > Please download, verify, and test. > > > > Please vote in the next 72 hours. > > > > [ ] +1 Release this as Apache Parquet 1.12.4 > > [ ] +0 > > [ ] -1 Do not release this because... > > > > Best regards, > > Gang > > > -- Xinli Shang

[jira] [Commented] (PARQUET-1690) Integer Overflow of BinaryStatistics#isSmallerThan()

2023-03-17 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701789#comment-17701789 ] Xinli Shang commented on PARQUET-1690: -- It was quite a long time ago. I don't remember. Yeah

Gang Wu as new Apache Parquet committer

2023-02-27 Thread Xinli shang
The Project Management Committee (PMC) for Apache Parquet has invited Gang Wu (gangwu) to become a committer and we are pleased to announce that he has accepted. Congratulations and welcome, Gang! -- Xinli Shang

[jira] [Commented] (PARQUET-2233) Parquet Travis CI jobs to be turned off February 15th

2023-01-25 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680705#comment-17680705 ] Xinli Shang commented on PARQUET-2233: -- [~Jiashen Zhang]Please have a look and we can discuss

[jira] [Commented] (PARQUET-2233) Parquet Travis CI jobs to be turned off February 15th

2023-01-24 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680363#comment-17680363 ] Xinli Shang commented on PARQUET-2233: -- Were you able to log in? > Parquet Travis CI j

[jira] [Comment Edited] (PARQUET-2233) Parquet Travis CI jobs to be turned off February 15th

2023-01-24 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680345#comment-17680345 ] Xinli Shang edited comment on PARQUET-2233 at 1/24/23 8:19 PM

[jira] [Commented] (PARQUET-2233) Parquet Travis CI jobs to be turned off February 15th

2023-01-24 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680345#comment-17680345 ] Xinli Shang commented on PARQUET-2233: -- In this Issue, we are going to migrate parquet-mr.git

[jira] [Created] (PARQUET-2233) Parquet Travis CI jobs to be turned off February 15th

2023-01-24 Thread Xinli Shang (Jira)
Xinli Shang created PARQUET-2233: Summary: Parquet Travis CI jobs to be turned off February 15th Key: PARQUET-2233 URL: https://issues.apache.org/jira/browse/PARQUET-2233 Project: Parquet

Parquet sync meeting notes 1/24/2023

2023-01-24 Thread Xinli shang
Attendees: (Gidon Gershinsky, Xinli Shang, Tim Miller, Vinoo) 1. Release new version 1. ZSTD stream closure bug fixes and a few other fixes are blocking issues. 2. PRs: 1. Parquet-2069 <https://github.com/apache/parquet-mr/pull/957>: Fix som

Re: Vectored IO in Parquet ( https://issues.apache.org/jira/browse/PARQUET-2171)

2022-10-08 Thread Xinli shang
menting, testing and > releasing this feature in the best possible way. > > I will be talking about all these in the upcoming Apache Conference NA next > week Tuesday, October 04, 4:10 PM CDT. It would be really great to meet > anyone who would be interested in getting involved in this. > > > > Thanks, > Mukund > -- Xinli Shang

Parquet community sync meeting notes - 9/27/2022

2022-09-27 Thread Xinli shang
9/27/2022 Attendees (Gidon Gershinsky, Xinli Shang, Tim Miller, Jiashen Zhang) 1. Parquet Cell-level encryption 1. Will open PRs after delivering it internally 2. Parquet-2069 <https://github.com/apache/parquet-mr/pull/957>: Fix some Avro schema issues, in g

[jira] [Created] (PARQUET-2183) Fix statistics issue of Column Encryptor

2022-09-02 Thread Xinli Shang (Jira)
Xinli Shang created PARQUET-2183: Summary: Fix statistics issue of Column Encryptor Key: PARQUET-2183 URL: https://issues.apache.org/jira/browse/PARQUET-2183 Project: Parquet Issue Type

Re: Interest in adding the float16 logical type to the Parquet spec

2022-08-24 Thread Xinli shang
popular: > https://en.wikipedia.org/wiki/Half-precision_floating-point_format ; I do > think that a demand exists for its support. I am new to the project, but am > happy to contribute development time if there is support for this feature, > and guidance. > > Warm regards, > > Anja > -- Xinli Shang

Parquet Sync meeting - July 26 2022

2022-07-26 Thread Xinli shang
Attendees (Gidon Gershinsky, Xinli Shang, Tim Miller) 1. Release 1.12.3 1. Post release - no issue reported. 2. Parquet Cell-level encryption a. What if the user has keys for only some, but not all, of the hidden columns? Should we throw

Re: Review of Q2 Parquet report

2022-07-05 Thread Xinli shang
Thanks Gidon for pointing it out! On Tue, Jul 5, 2022 at 12:59 PM Gidon Gershinsky wrote: > nit: MR-1.12.3 released on 202*2*-05-26. > > Cheers, Gidon > > > On Tue, Jul 5, 2022 at 6:04 PM Xinli shang > wrote: > > > Hi all, > > > > The report below

Review of Q2 Parquet report

2022-07-05 Thread Xinli shang
, past quarter (-20% change) 17 PRs closed on GitHub, past quarter (-43% change -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.12.3 RC1

2022-05-26 Thread Xinli shang
Thanks to Julien, Gidon, and Yuming for verifying and voting! The vote passed! I will move forward with the next steps. On Wed, May 25, 2022 at 9:29 PM Julien Le Dem wrote: > +1 > Verified signatures and tested > > On Mon, May 23, 2022 at 4:23 PM Xinli shang > wrote: >

Meeting notes for Parquet monthly sync - 5/24/2022

2022-05-24 Thread Xinli shang
Hi all, These are the meeting notes for today's Parquet sync meeting. We just had a short one as everybody is busy now. We are mainly focused on the release now. Attendees (Timothy Miller(theo...@amazon.com), Gidon Gershinsky, Xinli Shang) Release 1.12.3 - In progress, email was sent out, waiting for 1

Re: [VOTE] Release Apache Parquet 1.12.3 RC1

2022-05-23 Thread Xinli shang
Parquet 1.12.3 RC1 > External Email > > +1. Downloaded, verified and tested. > > Cheers, Gidon > > > On Fri, May 20, 2022 at 8:49 PM Xinli shang > wrote: > > > Hi everyone, > > > > > > I propose the following RC to be released as th

[VOTE] Release Apache Parquet 1.12.3 RC1

2022-05-20 Thread Xinli shang
ext 72 hours. [ ] +1 Release this as Apache Parquet 1.12.3 [ ] +0 [ ] -1 Do not release this because... Xinli Shang PMC Chair of Apache Parquet TLM Uber Data Infra

Re: AvroParquetWriter write to s3

2022-05-15 Thread Xinli shang
ob/99fe75a823d4b02f4e90fa0dda06a1558d5617a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L42 > > The issue is that I do not find a proper way to inject such configurations > into AvroParquetWriter. Is this possible? If yes, can you help to show how > to do it? > > Thanks > > Regin > -- Xinli Shang

Meeting notes for Parquet monthly sync - 4/27/2022

2022-04-27 Thread Xinli shang
4/27/2022 Attendees (Timothy Miller, Vinoo Ganesh, Satish K, Gidon Gershinsky, Xinli Shang, Huaxin Gao) 1. Cell-Level encryption 1. Internal implementation and rollout 2. Welcome new comments 2. Release 1.12.3 1. SNAPSHOT release - Gidon

[jira] [Commented] (PARQUET-1681) Avro's isElementType() change breaks the reading of some parquet(1.8.1) files

2022-04-08 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17519686#comment-17519686 ] Xinli Shang commented on PARQUET-1681: -- [~theosib-amazon]It seems different. > Avr

ASF Board Report Draft Review

2022-03-31 Thread Xinli shang
PRs opened on GitHub, past quarter (190% increase) 29 PRs closed on GitHub, past quarter (163% increase) dev@parquet.apache.org had a 65% decrease in traffic in the past quarter -- Xinli Shang

Re: Parquet Website Launched

2022-03-25 Thread Xinli shang
let me know if you have any feedback or feature requests. > > Thanks, > Vinoo Ganesh | vinoo.gan...@gmail.com > > > -- Xinli Shang

Parquet sync meeting notes 3/23/2022

2022-03-23 Thread Xinli shang
rquet writer for Iceberg (Adding a new constructor) 1. A diff will be sent out soon 4. New website (link <https://parquet.staged.apache.org/>) 1. Looks good, will make it formal -- Xinli Shang VP Apache Parquet PMC Chair, Tech Lead Manager at Uber Data Infra

Look for protobuf reviewers for PR-900

2022-03-20 Thread Xinli shang
Hi all, We have a PR <https://github.com/apache/parquet-mr/pull/900> related to Protobuf pending review. We are looking for people who are familiar with Protobuf to review the change. If you can help, please review. Thanks. -- Xinli Shang

[jira] [Commented] (PARQUET-1595) Parquet proto writer de-nest Protobuf wrapper classes

2022-03-20 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509500#comment-17509500 ] Xinli Shang commented on PARQUET-1595: -- Is it a typo for Int32Value -> int64? > Parquet

Please review the design of Parquet-2116: Cell Level Encryption

2022-03-12 Thread Xinli shang
oyw5u0ywe> discussion. Any feedback is welcome! Feel free to comment on the document directly. Thanks. -- Xinli Shang

[jira] [Updated] (PARQUET-2116) Cell Level Encryption

2022-03-12 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang updated PARQUET-2116: - External issue URL: https://docs.google.com/document/d

Re: Meeting notes for Parquet sync meeting - March 1st. 2022

2022-03-11 Thread Xinli shang
-MR? Also will it be a major/minor/or a > patch release. > > Thanks and Regards > Prakhar Jain > > > On Tue, Mar 1, 2022 at 9:34 AM Xinli shang > wrote: > > > 3/1/2022 > > > > Attendees: Xinli Shang, Gidon Gershinsky, Vinoo Ganesh > > > >1. >

Two blogs about Apache Parquet were just published on the Uber EngBlog site

2022-03-11 Thread Xinli shang
we have done with the community in the last 3 years around Parquet Modular Encryption. I would like to thank Gidon for his continuous collaborations with us! If you have any questions about the blog, feel free to reach out! Xinli Shang Tech Lead Manager at Uber Data Infra VP Apache Parquet PMC Chair

Meeting notes for Parquet sync meeting - March 1st. 2022

2022-03-01 Thread Xinli shang
3/1/2022 Attendees: Xinli Shang, Gidon Gershinsky, Vinoo Ganesh 1. The new website of Apache Parquet is to be launched 1. https://www.vinoo.io/ 2. Vinoo to send out an email to dev@ for a preview 2. Cell level encryption 1. Objective/Goals need

Re: Get uncompressed size of parquet file via parquet-cli

2022-02-20 Thread Xinli shang
> [1] > https://github.com/apache/parquet-mr/blob/master/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ParquetMetadataCommand.java#L123 > -- > Thanks & Regards > Deepak Gangwar > > -- Xinli Shang

[jira] [Comment Edited] (PARQUET-2127) Security risk in latest parquet-jackson-1.12.2.jar

2022-02-17 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494321#comment-17494321 ] Xinli Shang edited comment on PARQUET-2127 at 2/18/22, 2:23 AM: Thanks

[jira] [Commented] (PARQUET-2127) Security risk in latest parquet-jackson-1.12.2.jar

2022-02-17 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494321#comment-17494321 ] Xinli Shang commented on PARQUET-2127: -- Thanks for reporting [~phoebemaomao]! Will you be able

[jira] [Comment Edited] (PARQUET-2122) Adding Bloom filter to small Parquet file bloats in size X1700

2022-02-14 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492099#comment-17492099 ] Xinli Shang edited comment on PARQUET-2122 at 2/14/22, 4:56 PM

[jira] [Commented] (PARQUET-2122) Adding Bloom filter to small Parquet file bloats in size X1700

2022-02-14 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492099#comment-17492099 ] Xinli Shang commented on PARQUET-2122: -- [~junjie]Do you know why? > Adding Bloom filter to sm

Re: Parquet Column Resolution by ID

2022-02-11 Thread Xinli shang
fD-MUZz8Iq4V9FXrr1WPsw/edit?usp=sharing > > >. > > We'd like to start a discussion on the doc and any feedback is welcome! > > > > Thanks, > > Huaxin > > > -- Xinli Shang

[jira] [Commented] (PARQUET-2117) Add rowPosition API in parquet record readers

2022-02-02 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17485949#comment-17485949 ] Xinli Shang commented on PARQUET-2117: -- Thanks for opening this Jira! Look forward to the PR

Re: Parquet sync meeting notes - 1/26/2022

2022-01-27 Thread Xinli shang
Here <https://docs.google.com/document/d/1Q-d98Os_aJahUynznPrWvXwWQeN0aFDRhZj3hXt_JOM> is the link for the Cell-Level encryption pre-design. Feel free to share feedback directly in the document by adding comments. On Wed, Jan 26, 2022 at 9:51 AM Xinli shang wrote: > 1/26/2022 >

[jira] [Updated] (PARQUET-2116) Cell Level Encryption

2022-01-27 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang updated PARQUET-2116: - External issue URL: https://docs.google.com/document/d/1Q

[jira] [Created] (PARQUET-2116) Cell Level Encryption

2022-01-27 Thread Xinli Shang (Jira)
Xinli Shang created PARQUET-2116: Summary: Cell Level Encryption Key: PARQUET-2116 URL: https://issues.apache.org/jira/browse/PARQUET-2116 Project: Parquet Issue Type: Improvement

[jira] [Resolved] (PARQUET-2091) Fix release build error introduced by PARQUET-2043

2022-01-27 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang resolved PARQUET-2091. -- Resolution: Won't Fix > Fix release build error introduced by PARQUET-2

[jira] [Commented] (PARQUET-2098) Add more methods into interface of BlockCipher

2022-01-27 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17483225#comment-17483225 ] Xinli Shang commented on PARQUET-2098: -- [~gershinsky] Do you have time to work on it as we

[jira] [Resolved] (PARQUET-2112) Fix typo in MessageColumnIO

2022-01-27 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang resolved PARQUET-2112. -- Resolution: Fixed > Fix typo in MessageColum

Parquet sync meeting notes - 1/26/2022

2022-01-26 Thread Xinli shang
1/26/2022 Attendees: Xinli Shang, Gidon Gershinsky, Pavi Subenderan, Jason Zhang 1. Data masking 1. Pavi: Will create a PR by next week 2. PARQUET-2062 <https://issues.apache.org/jira/browse/PARQUET-2062> 3. Will have a high-level design sent ou

[jira] [Created] (PARQUET-2112) Fix typo in MessageColumnIO

2022-01-22 Thread Xinli Shang (Jira)
Xinli Shang created PARQUET-2112: Summary: Fix typo in MessageColumnIO Key: PARQUET-2112 URL: https://issues.apache.org/jira/browse/PARQUET-2112 Project: Parquet Issue Type: Improvement

Re: To be a Parquet contributor

2022-01-21 Thread Xinli shang
> really interested in Parquet and I would like to join our Parquet > community, could you help pull me into our community, such as inviting to > the channel or meetings etc? > > -- > Thanks, > Jiashen > -- Xinli Shang

[jira] [Commented] (PARQUET-2111) Support limit push down and stop early for RecordReader

2022-01-21 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480128#comment-17480128 ] Xinli Shang commented on PARQUET-2111: -- Look forward to the PR > Support limit push down and s

[jira] [Resolved] (PARQUET-2071) Encryption translation tool

2022-01-14 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang resolved PARQUET-2071. -- Resolution: Fixed > Encryption translation t

[jira] [Resolved] (PARQUET-1872) Add TransCompression Feature

2022-01-14 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang resolved PARQUET-1872. -- Resolution: Fixed > Add TransCompression Feat

[jira] [Resolved] (PARQUET-2105) Refactor the test code of creating the test file

2022-01-14 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang resolved PARQUET-2105. -- Resolution: Fixed > Refactor the test code of creating the test f

[jira] [Commented] (PARQUET-1889) Register a MIME type for the Parquet format.

2022-01-11 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17473147#comment-17473147 ] Xinli Shang commented on PARQUET-1889: -- +1 on [~westonpace]'s point > Register a MIME t

Review of the ASF Board Report for Parquet

2022-01-05 Thread Xinli shang
opened in JIRA, past quarter (-75% change) 11 issues closed in JIRA, past quarter (-45% change) 7 commits in the past quarter (-85% change) 7 code contributors in the past quarter (-53% change) 11 PRs opened on GitHub, past quarter (-47% change) -- Xinli Shang

Re: Parquet-tools Replacement

2022-01-04 Thread Xinli shang
. > > Thanks, > Vinoo Ganesh | vinoo.gan...@gmail.com > > > > > On Tue, Jan 4, 2022 at 12:49 PM Xinli shang > wrote: > > > Hi Vinoo, > > > > Thanks for bringing this up! Yes, they are deprecated. The recommended > > replacement is to use Parque

[jira] [Commented] (PARQUET-1911) Add way to disables statistics on a per column basis

2022-01-04 Thread Xinli Shang (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17468759#comment-17468759 ] Xinli Shang commented on PARQUET-1911: -- [~panthony] Thanks for working on this! Just FYI

Re: Parquet-tools Replacement

2022-01-04 Thread Xinli shang
s.apache.org/jira/browse/PARQUET-1666 too. Is there a > recommended replacement for parquet-tools? If so, could someone point me to > it? Thanks! > > Thanks, > Vinoo Ganesh | vinoo.gan...@gmail.com > > > -- Xinli Shang

About the security issue of log4j for Parquet

2021-12-12 Thread Xinli shang
Hi all, Most of you probably already know about the severe security issue (https://www.randori.com/blog/cve-2021-44228) in log4j. I just want to give a short update: Parquet doesn't depend on any of the impacted log4j versions. Have a good weekend! -- Xinli Shang
