Re: [ANNOUNCE] New Arrow PMC member: L. C. Hsieh

2022-09-04 Thread Gidon Gershinsky
Congrats Liang-Chi!! Cheers, Gidon On Sun, Sep 4, 2022 at 7:37 AM Micah Kornfield wrote: > Congrats! > > On Sat, Sep 3, 2022 at 8:19 PM QP Hou wrote: > > > Congrats Liang-Chi! > > > > On Sat, Sep 3, 2022 at 8:25 PM Remzi Yang <1371656737...@gmail.com> > wrote: > > > > > Congratulation

Re: [ANNOUNCE] New Arrow committer: Liang-Chi Hsieh

2022-04-27 Thread Gidon Gershinsky
Congrats Liang-Chi! Cheers, Gidon On Thu, Apr 28, 2022 at 4:17 AM Yang hao <1371656737...@gmail.com> wrote: > Congratulations Liang-Chi! > > From: Weston Pace > Date: Thursday, April 28, 2022 at 05:19 > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow committer: Liang-Chi Hsieh >

Fwd: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-03-09 Thread Gidon Gershinsky
Hi Antoine, All comments have been handled. Can we ask you to shepherd this PR for the remainder of its lifecycle? (hopefully, most of this is already behind us). https://github.com/apache/arrow/pull/8023 Cheers, Gidon -- Forwarded message - From: Gidon Gershinsky Date: Thu

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
> Also, FTR, a standalone LRU cache class is proposed here, which may > reduce the amount of original code in the Parquet encryption PR: > https://github.com/apache/arrow/pull/8716 > > Best regards > > Antoine. > > > Le 18/02/2021 à 16:40, Gidon Gershinsky a écrit : >

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
tructures. > > > I seem to recall the debate was how to model some class interactions to > determine what should be considered shared structures and what should not. > > On Wed, Feb 17, 2021 at 9:52 AM Gidon Gershinsky wrote: > > > This certainly sounds good to me. >

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
"main thread" comes from, but it > probably shouldn't exist in a C++ library. > > Regards > > Antoine. > > > > Le 17/02/2021 à 18:34, Gidon Gershinsky a écrit : > > Just to clarify. There are two options, which one do you refer to? A > design > > wi

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
be needed). Cheers, Gidon On Wed, Feb 17, 2021 at 2:40 PM Antoine Pitrou wrote: > > > Le 17/02/2021 à 12:47, Gidon Gershinsky a écrit : > > From the doc, > > "To maintain consistency with the style of parquet-cpp, the above > > structures should not be explici

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
e comments might be conflicting. One of the concerns > >> (that I would need to refresh myself on to offer an opinion which was > >> covered in Ben's doc) was the threading model we expect in the library. > >> > >> On Tue, Feb 16, 2021 at 8:

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
2021 at 8:03 AM Antoine Pitrou wrote: > > > > > Hi Gidon, > > > > Le 16/02/2021 à 16:42, Gidon Gershinsky a écrit : > > > Regarding the high-level layer, I think it waits for a progress at > > > > > > https://docs.google.com/document/d/11qz84ajys

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
proposals in this googledoc. Once their status is clarified, I hope Tham will be able to resume addressing the comments (I'll help with some of them if needed). Cheers, Gidon On Tue, Feb 16, 2021 at 6:03 PM Antoine Pitrou wrote: > > Hi Gidon, > > Le 16/02/2021 à 16:42, Gidon Gershi

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
d some > > benefit in having the Pyarrow API expose low-level Parquet encryption. > Then > > again, it might only be this one company and no one else cares. > > > > The arguments against, per Gidon Gershinsky: > > > > > * security: low-level encryption API

Re: [DISCUSS] Alternative design for KMS interaction in parquet-cpp

2020-11-16 Thread Gidon Gershinsky
er tasks are launched. > After we assemble this ahead-of-time set of keys it will not change during > the course of a read, so the > DecryptionKeyRetriever can safely access it without mutexes. I've added a > comment to the doc > > On Fri, Nov 13, 2020 at 3:09 AM Gidon Gershi

Re: [DISCUSS] Alternative design for KMS interaction in parquet-cpp

2020-11-13 Thread Gidon Gershinsky
Hi all, Glad to see the parquet-cpp progress on this! Can I suggest creating a googledoc for the technical discussion? The current md doc format makes pinpointed comments harder. I have a few, but they are too minor to send to the two mailing lists. Cheers, Gidon On Fri, Nov 13,

Re: Adding Parquet encryption support to PyArrow

2020-09-09 Thread Gidon Gershinsky
Thanks guys. I'll go over the intro sections to merge/streamline the text there. I've added a "commenter" access for all, so everybody could take part in the doc's discussion threads. For edit access, please contact Itamar (by pressing the request button). Cheers, Gidon On Wed, Sep 9, 2020 at

Re: Adding Parquet encryption support to PyArrow

2020-09-06 Thread Gidon Gershinsky
> > > Regards > > > > Antoine. > > > > > > Le 03/09/2020 à 22:31, Gidon Gershinsky a écrit : > > > Why would the low level API be exposed directly.. This will break the > > > interop between the two analytic ecosystems down the road. >

Re: Adding Parquet encryption support to PyArrow

2020-09-06 Thread Gidon Gershinsky
levels > are, and to what usage they correspond. > Is Parquet encryption used only with that Spark? While Spark > interoperability is important, Parquet files are more ubiquitous than that. > > Regards > > Antoine. > > > Le 03/09/2020 à 22:31, Gidon Gershinsky a écr

Re: Adding Parquet encryption support to PyArrow

2020-09-04 Thread Gidon Gershinsky
; Is Parquet encryption used only with that Spark? While Spark > interoperability is important, Parquet files are more ubiquitous than that. > > Regards > > Antoine. > > > Le 03/09/2020 à 22:31, Gidon Gershinsky a écrit : > > Why would the low level API be exposed directly

Re: Adding Parquet encryption support to PyArrow

2020-09-03 Thread Gidon Gershinsky
Why would the low level API be exposed directly.. This will break the interop between the two analytic ecosystems down the road. Again, let me suggest leveraging the high level interface, based on the PropertiesDrivenCryptoFactory. It should address your technical requirements; if it doesn't, we

Re: Adding Parquet encryption support to PyArrow

2020-09-03 Thread Gidon Gershinsky
Hi Antoine, Sounds good to me. This PR is already being actively reviewed, and it'd be good to have Itamar's assessment. Cheers, Gidon On Thu, Sep 3, 2020 at 6:01 PM Antoine Pitrou wrote: > > Hi Gidon, > > Le 03/09/2020 à 16:53, Gidon Gershinsky a écrit : > > Hi

Re: Adding Parquet encryption support to PyArrow

2020-09-03 Thread Gidon Gershinsky
Hi Itamar, My suggestion would be to wrap a different API in Python - the high-level encryption interface of https://github.com/apache/arrow/pull/8023 This will enable interoperability with Apache Spark (and other frameworks), where we don't expose the low-level Parquet encryption API. If such a

Re: Property-driven Parquet encryption

2020-07-12 Thread Gidon Gershinsky
Hi Micah, Thanks for your comments here, and at the design googledoc. We'll get started, we've got the input we were looking for. Cheers, Gidon

Fwd: Property-driven Parquet encryption

2020-07-10 Thread Gidon Gershinsky
Sorry, Micah, and thanks again. Cheers, Gidon -- Forwarded message - From: Gidon Gershinsky Date: Fri, Jul 10, 2020 at 10:41 AM Subject: Re: Property-driven Parquet encryption To: dev , Hi Michah, Thanks! I was hoping for community feedback, it's better to discuss

Re: Property-driven Parquet encryption

2020-07-10 Thread Gidon Gershinsky
sure I understand. By column key metadata, do you mean the column_keys parameter? Cheers, Gidon > > > On Wed, Jul 8, 2020 at 11:06 PM Gidon Gershinsky wrote: > > > Ok, so we had a look with Tham at the current pyarrow and parquet-cpp > > configuration objects.

Fwd: Property-driven Parquet encryption

2020-07-09 Thread Gidon Gershinsky
; }; Cheers, Gidon -- Forwarded message - From: Gidon Gershinsky Date: Tue, Jul 7, 2020 at 9:35 AM Subject: Property-driven Parquet encryption To: dev Cc: tham Hi all, We are working on the Parquet modular encryption, and are currently adding a high-level interface that allows

Property-driven Parquet encryption

2020-07-07 Thread Gidon Gershinsky
Hi all, We are working on the Parquet modular encryption, and are currently adding a high-level interface that allows encrypting and decrypting Parquet files via properties only (without calling the low-level API). In the Spark/parquet-mr domain, we're using the Hadoop configuration properties for that

[jira] [Created] (ARROW-8018) Parquet Modular Encryption in parquet-cpp

2020-03-05 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created ARROW-8018: --- Summary: Parquet Modular Encryption in parquet-cpp Key: ARROW-8018 URL: https://issues.apache.org/jira/browse/ARROW-8018 Project: Apache Arrow Issue

Re: Merged C++ Parquet Encryption implementation PARQUET-1300

2019-11-08 Thread Gidon Gershinsky
Wes, Thank you for reviewing and merging this project. Regarding the note - we'll have interop testers in parquet-mr, so that cpp-written files, encrypted in various modes, would be tested by java readers - and vice versa. These manual tests could be run during development and ahead of releases.