Please review Parquet-1396 - Crypto Interface for Schema Activation of Parquet Encryption

2019-01-17 Thread Xinli shang
t application just encrypt that column automatically. This makes it easy for end users to manage the encryption of their columns. Thanks in advance for your time! Looking forward to your feedback! -- Xinli Shang Uber Big Data Team

Re: [VOTE] Modular Encryption design sign-off

2018-11-30 Thread Xinli shang
tes out ColumnMetaData. > - This contains details that aren’t required, like using the “.parquet.encrypted” file extension. > - The only time the new magic bytes are mentioned is after the footer and encryption metadata, but the diagram shows that the first bytes in the file are updated as well. This is also only in the encrypted footer mode. Should PARE magic bytes be used in plaintext footer mode? > > Minor note: I would also prefer to vote on GCM, leaving out CTR for now and adding it once the GCM spec is finished. That way we can concentrate on a single cipher mode instead of thinking about multiple modes at once. > > On Tue, Oct 16, 2018 at 2:44 AM Anna Szonyi wrote: > +1 (non-binding) > > On Tue, Oct 16, 2018 at 11:14 AM Nandor Kollar wrote: > +1 (non-binding) > > On Tue, Oct 16, 2018 at 10:59 AM 俊杰陈 wrote: > +1 (non-binding) > > On Tue, Oct 16, 2018 at 4:46 PM Zoltan Ivanfi wrote: > +1 (binding) > Cheers, > Zoltan > > On Tue, Oct 16, 2018 at 10:11 AM Gidon Gershinsky <gg5...@gmail.com> wrote: > Hello Parquet developers, > Per the last sync discussion, it is time to call for a vote on the Parquet Modular Encryption design sign-off. The design doc can be found at the encryption branch of the parquet-format repository, > https://github.com/apache/parquet-format/blob/encryption/Encryption.md . > The design is stable by now. This work had started 10 months ago, has been extensively reviewed - and implemented (in Java, partially in C++), by a number of folks from different companies. To continue with the implementation pull requests, we need the design to be formally signed off by the community. > Cheers, Gidon > > -- > Thanks & Best Regards > > -- > Ryan Blue > Software Engineer > Netflix > -- Xinli Shang

Re: Parquet Sync

2019-04-16 Thread Xinli shang
Thanks, Julien! I will work with Lars for the rotation. On Mon, Apr 15, 2019 at 10:54 PM Julien Le Dem wrote: > It would be fine to have a rotation. > > On Mon, Apr 15, 2019 at 10:44 PM Lars Volker > wrote: > > > Hi, > > > > I'd be happy to help. I have organized a few of these in the past,

Re: New committer: Nandor Kollar

2019-06-25 Thread Xinli shang
andor Kollar to become a committer and we are pleased to announce > that > > > he > > > > has accepted. > > > > > > > > Congratulations and welcome, Nandor! > > > > > > > > Br, > > > > > > > > Zoltan > > > > > > > > > > > > > -- > > > Thanks & Best Regards > > > > > > -- Xinli Shang

Re: New committer: Fokko Driesprong

2019-06-25 Thread Xinli shang
> > Driesprong to become a committer and we are pleased to announce that > he > > > has > > > > accepted. > > > > > > > > Congratulations and welcome, Fokko! > > > > > > > > Br, > > > > > > > > Zoltan > > > > > > > > > > > > > -- > > > Thanks & Best Regards > > > > > > -- Xinli Shang

Parquet Sync - Meeting notes

2019-04-30 Thread Xinli shang
4/30/2019 Attendee: Zoltan and Several other folks(Cloudera) Brian (SaS?) Ryan Blue(Netflix) Julien(WeWorks) Wes McKinney(Ursa Labs) Gidon Gershinsky(IBM) Steven(?) Anikt(?) Deepak(?) Xinli Shang(Uber) Topics: 1. Key signing issue 1. Zoltan/Julien/Ryan: 1

Re: Parquet Sync - Meeting notes

2019-04-30 Thread Xinli shang
ryption properties. Currently, we implement its schema driven implementation, but it can be implemented in another way too. I will send out the design soon. Xinli On Tue, Apr 30, 2019 at 12:30 PM Xinli shang wrote: > 4/30/2019 > > Attendee: > > Zoltan and Several other folks(Clouder

Re: Parquet Sync

2019-04-20 Thread Xinli shang
20, 2019 at 8:04 AM Brian Bowman wrote: > Does the sync happen on Google Hangout? Could someone please provide a > link on where to sign up/connect? > > Thanks, > > Brian > > > On Apr 18, 2019, at 12:51 PM, Xinli shang > wrote: > > > > EXTERNAL >

Re: Parquet Sync

2019-04-18 Thread Xinli shang
gt; > > Also occasionally finding a good time for the meeting. > > > Any takers? This could be a rotating duty as well. > > > Thank you > > > Julien > > > > > > -- Xinli Shang

Re: Parquet Sync

2019-04-15 Thread Xinli shang
nd posting them on the list. > Also occasionally finding a good time for the meeting. > Any takers? This could be a rotating duty as well. > Thank you > Julien > -- Xinli Shang

Next sync up meeting

2019-06-28 Thread Xinli shang
in the meeting. 1. Welcome new PMCs 2. Parquet 11 release validation 3. Encryption -- Xinli Shang

Re: High level interface to Parquet encryption

2019-06-28 Thread Xinli shang
> > https://docs.google.com/document/d/1boH6HPkG0ZhgxcaRkGk3QpZ8X_J91uXZwVGwYN45St4/edit?usp=sharing > > > I've created PARQUET-1568 > <https://issues.apache.org/jira/browse/PARQUET-1568> for this one. Both > title and description of the Jira are subject to change. The doc [

Re: [DISCUSS] Release Apache Parquet Format Version 2.7.0

2019-09-16 Thread Xinli shang
ing, and arrow git repos; they all understand the > Bloom filters as written before the most recent changes. > -- Xinli Shang

Re: Parquet Sync Meeting Notes

2019-07-17 Thread Xinli shang
Gidon pointed out that the encryption parquet-format PR is the one below only. Sorry for the confusion. https://github.com/apache/parquet-format/pull/142 On Wed, Jul 17, 2019 at 10:57 AM Xinli shang wrote: > 7/17/2019 > > Attendee: > > Ryan Blue(Netflix) > > Jame(Netflix)

Parquet Sync Meeting Notes

2019-07-17 Thread Xinli shang
7/17/2019 Attendees: Ryan Blue(Netflix) Jame(Netflix) Gidon Gershinsky(IBM) Steven(Yelp) Deepak and several other folks (Vertica) Xinli Shang(Uber) Junjie Chen Topics: 1. Column Encryption 1. Gidon: 1. C++ version code review: All feedback has been addressed

Re: New PMC member: Gabor Szadovszky

2019-06-28 Thread Xinli shang
or > > Szadovszky to become a member of the PMC and we are pleased to announce > > that he has accepted. > > > > Congratulations, Gabor! > > > > Br, > > > > Zoltan > > > -- Xinli Shang

Re: Parquet sync zoom - invalid meeting ID

2019-11-21 Thread Xinli shang
Can you try https://uber.zoom.us/j/142456544? On Thu, Nov 21, 2019 at 9:07 AM Gabor Szadovszky wrote: > Hi, > > Is it just me who cannot join to the meeting? It says "Invalid meeting > ID"... > > Cheers, > Gabor > -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-21 Thread Xinli shang
593f126701da4 > The release tarball, signature, and checksums are here: > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.11.0-rc7 > You can find the KEYS file here: > * https://apache.org/dist/parquet/KEYS > Binary artifacts are staged in Nexus here: > * https://repository.apache.org/content/groups/staging/org/apache/parquet/ > This release includes the changes listed at: > https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md > Please download, verify, and test. > Please vote in the next 72 hours. > [ ] +1 Release this as Apache Parquet 1.11.0 > [ ] +0 > [ ] -1 Do not release this because... > -- > Ryan Blue > Software Engineer > Netflix > -- Xinli Shang

Re: Parquet sync zoom - invalid meeting ID

2019-11-21 Thread Xinli shang
can talk them in more detail in the makeup meeting. On Thu, Nov 21, 2019 at 9:15 AM Julien Le Dem wrote: > that worked, thanks! > > On Thu, Nov 21, 2019 at 9:11 AM Xinli shang > wrote: > > > Can you try > https://urldefense.proofpoint.com/v2/url?u=https-3A__uber.zoom.

Parquet Sync Meeting Notes

2019-12-04 Thread Xinli shang
production data. This is not a blocker. 3. PMC to validate the release - Gabor will send out email to ask PMCs. -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-12-05 Thread Xinli shang
I ran tests on production data with file sizes from 10K to 200M, with CRC checksum enabled and disabled as a comparison. No significant difference was seen. CRC Enabled Write used time: 14371 ms CRC Disabled Write used time: 14355 ms On Thu, Dec 5, 2019 at 11:09 AM Julien Le Dem wrote: > I
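To put the two write timings reported in this message in perspective, the relative difference can be worked out directly. The following is only a minimal sketch in Python that uses the two numbers quoted above; the helper name `relative_overhead_pct` is made up for illustration and is not part of any Parquet tooling.

```python
def relative_overhead_pct(enabled_ms: float, disabled_ms: float) -> float:
    """Extra time with the feature enabled, as a percentage of the baseline."""
    return (enabled_ms - disabled_ms) / disabled_ms * 100.0


# Write times as reported in the message (milliseconds).
crc_enabled_ms = 14371
crc_disabled_ms = 14355

overhead = relative_overhead_pct(crc_enabled_ms, crc_disabled_ms)
print(f"CRC write overhead: {overhead:.2f}%")  # prints "CRC write overhead: 0.11%"
```

A difference of roughly 0.1% is consistent with the "no significant difference" conclusion, although these two numbers alone cannot show whether it falls within run-to-run noise.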

PARQUET-1685:Truncate Min/Max for Statistics

2019-10-28 Thread Xinli shang
t/d/1Mgb0dDXQJkgjouboDrGa9v06hWGJ0oPiwnmffXShQ_M> or in the Jira ticket PARQUET-1685 <https://issues.apache.org/jira/browse/PARQUET-1685>. -- Xinli Shang

Re: Stalebot

2019-10-23 Thread Xinli shang
ark > the > > Pull Request as stale after 60 days of inactivity, and close it after a > > week if there isn't any further activity. This will reduce the number of > > stale repositories. > > > > What do you think? > > > > Cheers, Fokko > -- Xinli Shang

Parquet sync meeting notes

2020-02-25 Thread Xinli shang
arquet-tools - PR <https://github.com/apache/parquet-mr/pull/755> is merged. Parquet-1792: Add ‘mask’ command to parquet-tools/cli - In progress Bloom Filter: - PR <https://github.com/apache/parquet-mr/pull/757> is reviewed. One last comment is under discussion. Xinli Shang

Parquet Sync Meeting Notes

2020-01-29 Thread Xinli shang
c575#diff-349956ea7b1c91fc8aabc3cf0c73f08f>) are not compatible with themselves after converting to Parquet. As a result, the data cannot be read. The fix is ready and a PR will be created for review. -- Xinli Shang

Add 'prune' and 'mask' tools to Parquet-tools/cli

2020-02-16 Thread Xinli shang
+X in my testing) throughput. For large scale tables, high throughput is the key to success. Please note that this effort is not to replace data masking(obfuscation) effort (PARQUET-1376) which should be independent of this and move forward. Thanks for spending time reading! Any comments are welcome! -- Xinli Shang

Re: Spotless

2020-01-08 Thread Xinli shang
> > > WDYT? > > > Cheers, Fokko > > -- > > Ryan Blue > > Software Engineer > > Netflix > -- Xinli Shang

Parquet sync meeting notes - 4/28/2020

2020-04-28 Thread Xinli shang
-1774 <https://issues.apache.org/jira/browse/PARQUET-1774>: There is a finding in Spark. Gabor is waiting for the Spark community to verify the fix of PARQUET-1774. No response from Spark yet. Xinli Shang -- Xinli Shang

Re: Fw: High level interface to Parquet encryption

2020-05-11 Thread Xinli shang
nforms that the Schema-driven design doc is ready too, and a link > will be sent soon. > > > All feedback from the community will be appreciated. > > Cheers, Gidon. > -- Xinli Shang

Parquet sync meeting notes

2020-03-24 Thread Xinli shang
ed compression method. - Concern from the community: We already have some issues between parquet-mr and parquet-cpp as an example. If we open the door for users to customize their own compression, then more compatibility issues could happen. -- Xinli Shang

ZSTD-JNI

2020-05-20 Thread Xinli shang
/luben/zstd-jni>? Any feedback on using this JNI is welcome. BTW, I am also trying out the AirCompressor <https://github.com/airlift/aircompressor> approach, but it seems the ZSTD compression level is not adjustable. -- Xinli Shang

Parquet sync meeting notes - 9/22/2020

2020-09-22 Thread Xinli shang
ulie will try to find someone. 2. Use a flag in the code to make it optional to reduce the risk. Please let me know if you have any questions. Xinli Shang | Tech Lead Manager @ Uber Data Infra -- Xinli Shang

Today's Parquet sync Zoom meeting

2020-05-26 Thread Xinli shang
ing ID: 352 377 8975 Password: 030115 -- Xinli Shang

Parquet sync meeting notes

2020-06-23 Thread Xinli shang
uet 1.11.1 release. a. Regression is found and the PR is 798 <https://github.com/apache/parquet-mr/pull/798>. We need people who are familiar with avro to have a look. -- Xinli Shang

Meeting notes for Parquet sync (July 2020)

2020-07-28 Thread Xinli shang
Attendees: Gidon, Gabor, Fokko, Xu, Sri, Xinli 1. Column Encryption 1. PR 800 - This is to merge to master and it is being reviewed. 1. One comment is about CRC. Since the encryption algorithm AES-GCM

Parquet sync meeting 11/24/2020

2020-11-24 Thread Xinli shang
2 API. Let’s bring in other PMC members/committers to discuss this at the next community meeting. 3. Parquet 1.12.0 a. Will cut RC release soon Please let me know if you have any questions. Xinli Shang | Tech Lead Manager @ Uber Data Infra -- Xinli Shang

Re: Could someone add me to the sync invite?

2020-12-23 Thread Xinli shang
members could justify keeping this code > > > around. > > > > > > > > > > > > https://mail-archives.apache.org/mod_mbox/parquet-dev/202010.mbox/%3CCAFGcCdUSDTRVfFNUJXh_tL-RPjxkQCZpEnr_s-sow8rPMg56rg%40mail.gmail.com%3E > > > > > > Jason Altekruse > > > > > > -- Xinli Shang

Parquet sync meeting notes - Oct 2020

2020-10-27 Thread Xinli shang
ant to adopt column encryption which is only available in Parquet 12. 2. Parquet 1.12.0 will be on Avro 1.10. Need to know if this is a problem. Chao will try it out. Please let me know if you have any questions. Xinli Shang | Tech Lead Manager @ Uber Data Infra

Parquet sync meeting notes Jan 26, 2021

2021-01-26 Thread Xinli shang
Hi all, Attendees: Xinli Shang, Gábor Szádovszky, Gidon Gershinsky, Jason Altekruse Topics: 1. Parquet column encryption 1. The ported code in Presto will be open-sourced soon. 2. Crypto factory could be changed when new use cases come in (Gidon). 2

Parquet sync meeting May 2021

2021-05-25 Thread Xinli shang
5/25/2021 Attendees: Xinli Shang, Gábor Szádovszky, Gidon Gershinsky 1. Parquet 1.12.0 post release issues: 1. Release 1.12.1? Let's wait a bit since the testing and integration is still going on. Better to have more fixes for the release. 2. Two integer

Re: Decouple parquet-mr compression API from hadoop compression API

2021-06-30 Thread Xinli shang
> it I'm happy to review the related PRs. > > > > Cheers, > > Gabor > > > > On Thu, Jun 3, 2021 at 4:18 AM Dong, Xin wrote: > > > > > Hi, All, > > > Currently parquet-mr compression logic is using Hadoop compression > > > API which makes parquet-mr compression highly coupled with Hadoop. > > > Does community have any plan to decouple those two APIs? To make the > > > things easier, maybe we can just using api similar to Hadoop > > > compression APIs but belongs to parquet-mr namespace. And simply > > > change current codec to implements the new parquet-mr API. Any > thoughts? > > > Thanks, > > > Xin Dong > > > > > > > > > -- Xinli Shang

Parquet sync meeting notes - April 2021

2021-04-27 Thread Xinli shang
4/27/2021 Attendees: Xinli Shang, Gábor Szádovszky, Gidon Gershinsky, Xin Dong, Ryan Blue, Q. Xie 1. Parquet 1.12.0 post release issues: 1. Parquet-2027 - Page offset issue. Fixed. 2. Parquet-2026 - Not allowing empty file anymore 1. What

Reminder for today's Parquet sync meeting 9:00am PDT

2021-03-23 Thread Xinli shang
Hi all, This is the meeting information below. https://uber.zoom.us/j/3523778975 password is required Meeting ID: 352 377 8975 Password: 030115 -- Xinli Shang

Re: [RESULT] Release Apache Parquet 1.12.0 RC4

2021-03-25 Thread Xinli shang
dependencies for a local application that > > > streams > > > > data into protobuf-parquet files > > > > - confirmed data is correct and can be read with parquet-tools > compiled > > > > from parquet 1.11.1 > > > > > > > >

Parquet meeting sync notes 3/23/21

2021-03-23 Thread Xinli shang
3/23/2021 Attendees: Xinli Shang, Gábor Szádovszky, Gidon Gershinsky 1. Parquet 1.12.0 - RC4 1. Avro version upgrading issue. 1. The Parquet application can upgrade by itself if needed. Including it now would be somewhat rushed, so we won’t upgrade

Re: [VOTE] Release Apache Parquet 1.12.0 RC3

2021-03-12 Thread Xinli shang
n and Parquet > Bloom Filter. See details at: > * > > https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0-rc3/CHANGES.md > > Please download, verify, and test. > > Please vote in the next 72 hours. > > [ ] +1 Release this as Apache Parquet 1.12.0 > [ ] +0 > [ ] -1 Do not release this because... > -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.12.0 RC4

2021-03-18 Thread Xinli shang
he next 72 hours. > > [ ] +1 Release this as Apache Parquet 1.12.0 > [ ] +0 > [ ] -1 Do not release this because... > -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.12.0 RC4

2021-03-22 Thread Xinli shang
s are staged in Nexus here: > > > * > > https://repository.apache.org/content/groups/staging/org/apache/parquet/ > > > > > > This release includes the features Parquet Modular Encryption and > Parquet > > > Bloom Filter. See details at: > > > * > > > > > > > > > https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0-rc4/CHANGES.md > > > > > > Please download, verify, and test. > > > > > > Please vote in the next 72 hours. > > > > > > [ ] +1 Release this as Apache Parquet 1.12.0 > > > [ ] +0 > > > [ ] -1 Do not release this because... > > > > > > -- Xinli Shang

Re: Re: [VOTE] Release Apache Parquet 1.12.0 RC3

2021-03-15 Thread Xinli shang
Thanks DB! In that case, I would like to change my vote to +1. On Sat, Mar 13, 2021 at 11:14 PM DB Tsai wrote: > This can be a couple lines fix in Iceberg side which we had and deployed > in our env, so it is not necessary to fail this vote. > > On 2021/03/13 01:03:04 Xinli shang

Parquet community sync meeting notes 2/23/2021

2021-02-23 Thread Xinli shang
Hi all, These are the meeting notes from today's community meeting. Date: 2/23/2021 Attendees: Xinli Shang, Gábor Szádovszky, Gidon Gershinsky, Ryan Blue 1. Iceberg and Parquet 1. Column ID vs. name 1. Column resolution: Parquet relies on the name, while

Re: [Announce] new committer: Gidon Gershinsky

2021-04-08 Thread Xinli shang
5:10 AM Nándor Kollár > > wrote: > > > > > > > > > Congrats Gidon! > > > > > > > > > > On 2021/04/07 11:55:45, Gabor Szadovszky wrote: > > > > > > The Project Management Committee (PMC) for Apache Parquet > > > > > > has invited Gidon Gershinsky to become a committer and we are > > pleased > > > > > > to announce that he has accepted. > > > > > > > > > > > > Welcome Gidon! > > > > > > > > > > > > > > > > > > > > > -- Xinli Shang

Parquet sync meeting notes - Aug 2021

2021-08-24 Thread Xinli shang
8/24/2021 Attendees: Xinli Shang, Gábor Szádovszky, Huaxin Gao 1. FilterAPI PR 1. Will be reviewed soon. 2. It will be in a minor release 2. High throughput column encryption rewriter 1. PR to be created (tests are done and look promising

Reminder for Parquet sync meeting tomorrow

2021-08-23 Thread Xinli shang
277=AOvVaw0yiaKJK3mhLz4zsbOmvfrQ> Meeting ID: 352 377 8975 Password: 030115 -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-15 Thread Xinli shang
thing passed > > Thanks Xinli and all for contributing to this release! > > Cheers, Gidon > > > On Wed, Sep 15, 2021 at 6:53 AM Xinli shang > wrote: > > > The vote to release 1.12.1 RC1 as Apache Parquet MR 1.12.1 is PASSED with > > the required three +1 binding votes

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Xinli shang
re staged in Nexus here: > > > > > > * > > https://repository.apache.org/content/groups/staging/org/apache/parquet/ > > > > > > > > > This release includes important changes listed > > > https://github.com/apache/parquet-mr/blob/parquet-1.12.x/CHANGES.md > > > > > > > > > Please download, verify, and test. > > > > > > > > > Please vote in the next 72 hours. > > > > > > > > > [ ] +1 Release this as Apache Parquet 1.12.1 > > > > > > [ ] +0 > > > > > > [ ] -1 Do not release this because... > > > > > > -- > > > Xinli Shang | Tech Lead Manager @ Uber Data Infra > > > > > > -- Xinli Shang

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Xinli shang
Julien Le Dem wrote: > +1 (binding) > I verified the signature > the build and tests pass (with java 8) > > On Tue, Sep 14, 2021 at 4:14 PM Xinli shang > wrote: > > > I also vote +1 (binding). Thanks everybody for verifying! > > > > On Tue, Sep 14, 2021 at 2:0

Re: [VOTE] Release Apache Parquet 1.12.1 RC0

2021-09-13 Thread Xinli shang
ps://github.com/apache/spark/pull/33969 > > Thanks, > Chao > > On Mon, Sep 13, 2021 at 8:54 AM Xinli shang > wrote: > > > Hi Gabor, > > > > Since this is a bug fix release, I just pulled the fixes if they seem not > > breaking. If you feel PARQUET-2043 is r

[VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-13 Thread Xinli shang
e next 72 hours. [ ] +1 Release this as Apache Parquet 1.12.1 [ ] +0 [ ] -1 Do not release this because... -- Xinli Shang | Tech Lead Manager @ Uber Data Infra

[VOTE] Release Apache Parquet 1.12.1 RC0

2021-09-11 Thread Xinli shang
ext 72 hours. [ ] +1 Release this as Apache Parquet 1.12.1 [ ] +0 [ ] -1 Do not release this because... -- Xinli Shang

Meeting notes of parquet-sync 9/28/2021

2021-09-28 Thread Xinli shang
9/27/2021 Attendees: Xinli Shang, Gábor Szádovszky, Gidon Gershinsky 1. 1.12.1 release 1. Done 2. PARQUET-2094 1. Done 3. FilterAPI 1. Will be done soon. 4. High throughput column encryption rewriter 1. The review

Re: Parquet-tools Replacement

2022-01-04 Thread Xinli shang
s.apache.org/jira/browse/PARQUET-1666 too. Is there a > recommended replacement for parquet-tools? If so, could someone point me to > it? Thanks! > > Thanks, > Vinoo Ganesh | vinoo.gan...@gmail.com > > > -- Xinli Shang

Re: Parquet-tools Replacement

2022-01-04 Thread Xinli shang
. > > Thanks, > Vinoo Ganesh | vinoo.gan...@gmail.com > > > > > On Tue, Jan 4, 2022 at 12:49 PM Xinli shang > wrote: > > > Hi Vinoo, > > > > Thanks for bringing this up! Yes, they are deprecated. The recommended > > replacement is to use Parque

Review of the ASF Board Report for Parquet

2022-01-05 Thread Xinli shang
opened in JIRA, past quarter (-75% change) 11 issues closed in JIRA, past quarter (-45% change) 7 commits in the past quarter (-85% change) 7 code contributors in the past quarter (-53% change) 11 PRs opened on GitHub, past quarter (-47% change) -- Xinli Shang

Reminder of tomorrow's Parquet sync meeting

2021-11-23 Thread Xinli shang
Meeting ID: 352 377 8975 Password: 030115 -- Xinli Shang

Parquet community meeting notes

2021-11-24 Thread Xinli shang
Hi all, These are the meeting notes for today's meeting. 11/24/2021 Attendees: Xinli Shang, Gábor Szádovszky, Gidon Gershinsky 1. 1.12.2 release 1. done 2. Filter API 1. done 3. High throughput column encryption rewriter 1. done

[ANNOUNCEMENT] Gidon Gershinsky as Apache Parquet PMC

2021-11-24 Thread Xinli shang
Hi all, The Project Management Committee (PMC) for Apache Parquet has invited Gidon Gershinsky to become a PMC member and we are pleased to announce that he has accepted. Congratulations and welcome, Gidon! -- Xinli Shang

About the security issue of log4j for Parquet

2021-12-12 Thread Xinli shang
Hi all, Most of you probably know about the severe security issue (https://www.randori.com/blog/cve-2021-44228) in log4j. I just want to give a short update: Parquet doesn't have a dependency on the log4j versions that are impacted. Have a good weekend! -- Xinli Shang

Re: Parquet Column Resolution by ID

2022-02-11 Thread Xinli shang
fD-MUZz8Iq4V9FXrr1WPsw/edit?usp=sharing > > >. > > We'd like to start a discussion on the doc and any feedback is welcome! > > > > Thanks, > > Huaxin > > > -- Xinli Shang

Two blogs about Apache Parquet were just published on the Uber EngBlog site

2022-03-11 Thread Xinli shang
we have done with the community in the last 3 years around Parquet Modular Encryption. I would like to thank Gidon for his continuous collaborations with us! If you have any questions about the blog, feel free to reach out! Xinli Shang Tech Lead Manager at Uber Data Infra VP Apache Parquet PMC Chair

Re: Meeting notes for Parquet sync meeting - March 1st. 2022

2022-03-11 Thread Xinli shang
-MR? Also will it be a major/minor/or a > patch release. > > Thanks and Regards > Prakhar Jain > > > On Tue, Mar 1, 2022 at 9:34 AM Xinli shang > wrote: > > > 3/1/2022 > > > > Attendees: Xinli Shang, Gidon Gershinsky, Vinoo Ganesh > > > >1. >

Please review the design of Parquet-2116: Cell Level Encryption

2022-03-12 Thread Xinli shang
oyw5u0ywe> discussion. Any feedback is welcome! Feel free to comment on the document directly. Thanks. -- Xinli Shang

Re: Get uncompressed size of parquet file via parquet-cli

2022-02-20 Thread Xinli shang
> [1] > https://github.com/apache/parquet-mr/blob/master/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ParquetMetadataCommand.java#L123 > -- > Thanks & Regards > Deepak Gangwar > > -- Xinli Shang

Meeting notes for Parquet sync meeting - March 1st. 2022

2022-03-01 Thread Xinli shang
3/1/2022 Attendees: Xinli Shang, Gidon Gershinsky, Vinoo Ganesh 1. The new website of Apache Parquet is to be launched 1. https://www.vinoo.io/ 2. Vinoo to send out an email to dev@ for a preview 2. Cell level encryption 1. Objective/Goals need

Parquet sync meeting notes 3/23/2022

2022-03-23 Thread Xinli shang
rquet writer for Iceberg (Adding a new constructor) 1. A diff will be sent out soon 4. New website (link <https://parquet.staged.apache.org/>) 1. Looks good, will make it formal -- Xinli Shang VP Apache Parquet PMC Chair, Tech Lead Manager at Uber Data Infra

Re: Parquet Website Launched

2022-03-25 Thread Xinli shang
let me know if you have any feedback or feature requests. > > Thanks, > Vinoo Ganesh | vinoo.gan...@gmail.com > > > -- Xinli Shang

ASF Board Report Draft Review

2022-03-31 Thread Xinli shang
PRs opened on GitHub, past quarter (190% increase) 29 PRs closed on GitHub, past quarter (163% increase) dev@parquet.apache.org had a 65% decrease in traffic in the past quarter -- Xinli Shang

Look for protobuf reviewers for PR-900

2022-03-20 Thread Xinli shang
Hi all, We have a PR <https://github.com/apache/parquet-mr/pull/900> related to Protobuf pending review. We are looking for people who are familiar with Protobuf to review the change. If you can help, please review. Thanks. -- Xinli Shang

Re: Parquet sync meeting notes - 1/26/2022

2022-01-27 Thread Xinli shang
Here <https://docs.google.com/document/d/1Q-d98Os_aJahUynznPrWvXwWQeN0aFDRhZj3hXt_JOM> is the link for the Cell-Level encryption pre-design. Feel free to share the feedback in the file directly by adding comments. On Wed, Jan 26, 2022 at 9:51 AM Xinli shang wrote: > 1/26/2022 >

Re: To be a Parquet contributor

2022-01-21 Thread Xinli shang
> really interested in Parquet and I would like to join our Parquet > community, could you help pull me into our community, such as inviting to > the channel or meetings etc? > > -- > Thanks, > Jiashen > -- Xinli Shang

Parquet sync meeting notes - 1/26/2022

2022-01-26 Thread Xinli shang
1/26/2022 Attendees: Xinli Shang, Gidon Gershinsky, Pavi Subenderan, Jason Zhang 1. Data masking 1. Pavi: Will create a PR by next week 2. PARQUET-2062 <https://issues.apache.org/jira/browse/PARQUET-2062> 3. Will have a high-level design sent ou

Re: [VOTE][FORMAT] Add repetition, definition and variable length size metadata statistics

2023-11-06 Thread Xinli shang
+1 (binding) On Mon, Nov 6, 2023 at 4:56 PM Gang Wu wrote: > +1 (non-binding) > > Best, > Gang > > On Tue, Nov 7, 2023 at 3:57 AM Ed Seidl wrote: > > > +1 (non-binding) > > > > Thanks! > > Ed > > > -- Xinli Shang

Re: Drop parquet-thrift

2023-10-03 Thread Xinli shang
.com/apache/parquet-mr/pull/1156>. Therefore it is hard > > to > > > test if we break anything. > > > > > > It looks like parquet-thrift is not used by anyone anymore > > > <https://mvnrepository.com/artifact/org.apache.parquet/parquet-thrift > >. > > I > > > would suggest removing the module from the repository > > > <https://github.com/apache/parquet-mr/pull/1158> unless anyone > objects. > > > > > > Kind regards, Fokko > > > > > > -- Xinli Shang

Re: [VOTE][Format] Add Float16 type to specification

2023-10-05 Thread Xinli shang
> > This vote will be open for at least 72 hours. > > > > [ ] +1 Add this type to the format specification > > [ ] +0 > > [ ] -1 Do not add this type to the format specification because... > > > > Thanks! > > > > Ben > > > > [1]: https://en.wikipedia.org/wiki/Half-precision_floating-point_format > > > > > > > > -- Xinli Shang

Re: [Request] Send automated notifications to a separate mailing-list

2023-08-27 Thread Xinli shang
mailing-list > > > > Best, > > Gang > > > > On Tue, Aug 22, 2023 at 8:49 AM Xinli shang > wrote: > > > >> It is a good idea. Thanks to Antoine for the proposal. > >> > >> On Tue, Aug 22, 2023 at 2:03 AM Julien Le Dem >

Re: [Request] Send automated notifications to a separate mailing-list

2023-08-21 Thread Xinli shang
; > For the record, we did this move in Apache Arrow and never came back. > > > > Thanks in advance > > > > Antoine. > > > > > > > -- Xinli Shang

Re: AvroParquetWriter write to s3

2022-05-15 Thread Xinli shang
ob/99fe75a823d4b02f4e90fa0dda06a1558d5617a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L42 > > The issue is that I do not find a proper way to inject such configurations > into AvroParquetWriter. Is this possible? If yes, can you help to show how > to do it? > > Thanks > > Regin > -- Xinli Shang

[VOTE] Release Apache Parquet 1.12.3 RC1

2022-05-20 Thread Xinli shang
ext 72 hours. [ ] +1 Release this as Apache Parquet 1.12.3 [ ] +0 [ ] -1 Do not release this because... Xinli Shang PMC Chair of Apache Parquet TLM Uber Data Infra

Re: [VOTE] Release Apache Parquet 1.12.3 RC1

2022-05-26 Thread Xinli shang
Thanks to Julien, Gidon, and Yuming for verifying and voting! The vote passed! I will move forward with the next steps. On Wed, May 25, 2022 at 9:29 PM Julien Le Dem wrote: > +1 > Verified signatures and tested > > On Mon, May 23, 2022 at 4:23 PM Xinli shang > wrote: >

Re: [VOTE] Release Apache Parquet 1.12.3 RC1

2022-05-23 Thread Xinli shang
Parquet 1.12.3 RC1 > External Email > > +1. Downloaded, verified and tested. > > Cheers, Gidon > > > On Fri, May 20, 2022 at 8:49 PM Xinli shang > wrote: > > > Hi everyone, > > > > > > I propose the following RC to be released as th

Re: Review of Q2 Parquet report

2022-07-05 Thread Xinli shang
Thanks Gidon for pointing it out! On Tue, Jul 5, 2022 at 12:59 PM Gidon Gershinsky wrote: > nit: MR-1.12.3 released on 202*2*-05-26. > > Cheers, Gidon > > > On Tue, Jul 5, 2022 at 6:04 PM Xinli shang > wrote: > > > Hi all, > > > > The report below

Review of Q2 Parquet report

2022-07-05 Thread Xinli shang
, past quarter (-20% change) 17 PRs closed on GitHub, past quarter (-43% change -- Xinli Shang

Meeting notes for Parquet monthly sync - 5/24/2022

2022-05-24 Thread Xinli shang
Hi all, These are the meeting notes for today's Parquet sync meeting. We just had a short one as everybody is busy now. We are mainly focusing on the release now. Attendees (Timothy Miller(theo...@amazon.com), Gidon Gershinsky, Xinli Shang) Release 1.12.3 - In progress, email was sent out, waiting for 1

Parquet Sync meeting - July 26 2022

2022-07-26 Thread Xinli shang
Attendees ( Gidon Gershinsky, Xinli Shang, Tim Miller) 1. Release 1.12.3 1. Post release - no issue reported. 2. Parquet Cell-level encryption a. What if the user only partially has the keys but not all the hidden columns? Should we throw

Meeting notes for Parquet monthly sync - 4/27/2022

2022-04-27 Thread Xinli shang
4/27/2022 Attendees (Timothy Miller, Vinoo Ganesh, Satish K, Gidon Gershinsky, Xinli Shang, Huaxin Gao) 1. Cell-Level encryption 1. Internal implementation and rollout 2. Welcome new comments 2. Release 1.12.3 1. SNAPSHOT release - Gidon

Re: Interest in adding the float16 logical type to the Parquet spec

2022-08-24 Thread Xinli shang
popular: > https://en.wikipedia.org/wiki/Half-precision_floating-point_format ; I do > think that a demand exists for its support. I am new to the project, but am > happy to contribute development time if there is support for this feature, > and guidance. > > Warm regards, > > Anja > -- Xinli Shang

Parquet community sync meeting notes - 9/27/2022

2022-09-27 Thread Xinli shang
9/27/2022 Attendees ( Gidon Gershinsky, Xinli Shang, Tim Miller, Jiasheng Zhang) 1. Parquet Cell-level encryption 1. Will open PRs after delivering it internally 2. Parquet-2069 <https://github.com/apache/parquet-mr/pull/957>: Fix some Avro schema issues, in g

Re: Vectored IO in Parquet ( https://issues.apache.org/jira/browse/PARQUET-2171)

2022-10-08 Thread Xinli shang
menting, testing and > releasing this feature in the best possible way. > > I will be talking about all these in the upcoming Apache Conference NA next > week Tuesday, October 04, 4:10 PM CDT. It would be really great to meet > anyone who would be interested in getting involved in this. > > > > Thanks, > Mukund > -- Xinli Shang

Parquet sync meeting notes 1/24/2023

2023-01-24 Thread Xinli shang
Attendees:( Gidon Gershinsky, Xinli Shang, Tim Miller, Vinoo) 1. Release new version 1. ZSTD stream closure bug fixes and a few other fixes are blocking issues. 2. PRs: 1. Parquet-2069 <https://github.com/apache/parquet-mr/pull/957>: Fix som

Gang Wu as new Apache Parquet committer

2023-02-27 Thread Xinli shang
The Project Management Committee (PMC) for Apache Parquet has invited Gang Wu (gangwu) to become a committer and we are pleased to announce that he has accepted. Congratulations and welcome, Gang! -- Xinli Shang

Re: [DISCUSS] Release of Apache Parquet 1.13.1

2023-04-25 Thread Xinli shang
? The new version 1.13.0 was just released, and I am not sure whether more issues are coming, in which case we could put the fixes together into 1.13.1. Is Iceberg urgently blocked on this? Xinli Shang On Tue, Apr 25, 2023 at 6:51 PM Gang Wu wrote: > That sounds good to me. > > I have just
