[jira] [Created] (PARQUET-2062) Data masking(null) for column encryption

2021-06-30 Thread Xinli Shang (Jira)
Xinli Shang created PARQUET-2062:


 Summary: Data masking(null) for column encryption 
 Key: PARQUET-2062
 URL: https://issues.apache.org/jira/browse/PARQUET-2062
 Project: Parquet
  Issue Type: New Feature
  Components: parquet-mr
Reporter: Xinli Shang
Assignee: Xinli Shang


When a user doesn't have permission on a column that is encrypted by the column 
encryption feature (PARQUET-1178), returning a masked value could avoid an 
exception and let the call succeed. 

We would like to introduce data masking with null values. The idea is that when 
the user is denied access to the key and can accept null (via a read option 
flag), we return null for the encrypted columns. This solution doesn't need to 
store extra columns for masked values and doesn't need to translate existing 
data. 
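
To make the proposed behavior concrete, here is a minimal sketch in plain Java. It is not parquet-mr code: KeyAccessDeniedException, NullMaskingSketch and the maskDeniedWithNull flag are hypothetical names chosen for illustration, and "decryption" is simulated by a lookup in a set of denied columns. It only shows the control flow: with the flag off the read fails, with the flag on the denied column comes back as null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a "key access denied" failure; not a parquet-mr class. */
class KeyAccessDeniedException extends RuntimeException {
    KeyAccessDeniedException(String column) {
        super("Key access denied for column: " + column);
    }
}

/** Toy row reader that illustrates null masking; names and shape are assumptions. */
final class NullMaskingSketch {
    // Encrypted columns whose key the current user is not allowed to fetch.
    private final Set<String> deniedColumns;
    // Hypothetical read option: if true, denied columns are returned as null.
    private final boolean maskDeniedWithNull;

    NullMaskingSketch(Set<String> deniedColumns, boolean maskDeniedWithNull) {
        this.deniedColumns = deniedColumns;
        this.maskDeniedWithNull = maskDeniedWithNull;
    }

    /** Simulated decryption: throws when the user has no access to the column key. */
    private Object decrypt(String column, Object storedValue) {
        if (deniedColumns.contains(column)) {
            throw new KeyAccessDeniedException(column);
        }
        return storedValue;
    }

    /** Reads one row; denied columns become null only when the flag is set. */
    Map<String, Object> readRow(Map<String, Object> storedRow) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : storedRow.entrySet()) {
            try {
                result.put(entry.getKey(), decrypt(entry.getKey(), entry.getValue()));
            } catch (KeyAccessDeniedException e) {
                if (!maskDeniedWithNull) {
                    throw e; // no flag: the read fails with an exception
                }
                result.put(entry.getKey(), null); // flag set: mask with null
            }
        }
        return result;
    }
}
```

Note that the masked column is still present in the result, so the schema seen by the caller does not change; only the values are replaced.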





Re: Decouple parquet-mr compression API from hadoop compression API

2021-06-30 Thread Xinli shang
Thanks for working on this! Overall this is a good initiative to move away
from Hadoop. I just left some comments on the doc.

On Tue, Jun 29, 2021 at 2:02 AM Xu, Cheng A wrote:

>
> https://docs.google.com/document/d/1Ki14QAj1TP8u-SXk-PiKsdVskDAH-IVGWulYTiK54SM/edit
> Doc is uploaded on Google Doc FYI.
>
> Thanks
> Cheng Xu
>
> -Original Message-
> From: Gabor Szadovszky 
> Sent: Friday, June 25, 2021 4:05 PM
> To: Parquet Dev 
> Subject: Re: Decouple parquet-mr compression API from hadoop compression
> API
>
> Hi XinDong,
>
> Could you please put it up on Google Docs so anyone in the community can
> comment on it?
>
> Cheers,
> Gabor
>
> On Fri, Jun 25, 2021 at 9:47 AM Dong, Xin wrote:
>
> >  Hi, Gabor and Xinli,
> >
> >  We worked out a proposal for a parquet-mr pluggable compression
> > framework. It includes the work to decouple the parquet-mr compression
> > API from the Hadoop compression API. Please take a look; any comments
> > are welcome.
> >
> > Thanks,
> > XinDong
> >
> > -Original Message-
> > From: Dong, Xin
> > Sent: Thursday, June 3, 2021 4:02 PM
> > To: dev@parquet.apache.org
> > Subject: RE: Decouple parquet-mr compression API from hadoop
> > compression API
> >
> > Hi Gabor,
> >
> > Thank you very much for the quick reply. We are interested in
> > separating parquet-mr compression API from Hadoop and will try to make
> > a proposal for the new API. Will let you know when we are ready and
> > your comments will be appreciated.
> >
> > Thanks,
> > Xin Dong
> >
> > -Original Message-
> > From: Gabor Szadovszky 
> > Sent: Thursday, June 3, 2021 3:49 PM
> > To: Parquet Dev 
> > Subject: Re: Decouple parquet-mr compression API from hadoop
> > compression API
> >
> > Hi Xin Dong,
> >
> > There are a couple of open Jiras related to this, such as PARQUET-1812
> > <https://issues.apache.org/jira/browse/PARQUET-1812> about using the
> > airlift implementation of the codecs, and your own Jiras about the
> > provider-aware codecs. I strongly agree on having compression codecs
> > that are independent of Hadoop. We would also need to ship our own
> > compression codecs if we want to achieve the core features support
> > (PARQUET-1950).
> > Feel free to create a Jira about this. Also, if you want to invest in
> > it, I'm happy to review the related PRs.
> >
> > Cheers,
> > Gabor
> >
> > On Thu, Jun 3, 2021 at 4:18 AM Dong, Xin wrote:
> >
> > > Hi, All,
> > > > Currently the parquet-mr compression logic uses the Hadoop compression
> > > > API, which makes parquet-mr compression highly coupled with Hadoop.
> > > > Does the community have any plan to decouple those two APIs? To make
> > > > things easier, maybe we can just use an API similar to the Hadoop
> > > > compression APIs that belongs to the parquet-mr namespace, and simply
> > > > change the current codecs to implement the new parquet-mr API. Any
> > > > thoughts?
> > > Thanks,
> > > Xin Dong
> > >
> > >
> >
>
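
For readers following the thread, the idea in the original question (an API shaped like Hadoop's compression API but living in the parquet-mr namespace, with codecs implementing it) could look roughly like the sketch below. This is not the proposal from the linked doc: ParquetCompressionCodec, JdkGzipCodec and CodecRoundTrip are hypothetical names, and the interface only loosely mirrors the stream-wrapping style of Hadoop's CompressionCodec. The example codec uses java.util.zip from the JDK, so nothing here depends on Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical parquet-mr-owned codec interface; the name and shape are assumptions. */
interface ParquetCompressionCodec {
    /** Wraps a raw output stream so that written bytes are compressed. */
    OutputStream createOutputStream(OutputStream out) throws IOException;

    /** Wraps a compressed input stream so that read bytes are decompressed. */
    InputStream createInputStream(InputStream in) throws IOException;
}

/** Example codec backed only by the JDK; no Hadoop classes are referenced. */
final class JdkGzipCodec implements ParquetCompressionCodec {
    @Override
    public OutputStream createOutputStream(OutputStream out) throws IOException {
        return new GZIPOutputStream(out);
    }

    @Override
    public InputStream createInputStream(InputStream in) throws IOException {
        return new GZIPInputStream(in);
    }
}

/** Caller code written against the interface, so codecs can be swapped freely. */
final class CodecRoundTrip {
    static byte[] compress(ParquetCompressionCodec codec, byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapping stream flushes the gzip trailer into the sink.
        try (OutputStream out = codec.createOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decompress(ParquetCompressionCodec codec, byte[] compressed) {
        try (InputStream in = codec.createInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes(); // requires Java 9 or later
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an interface like this, an existing Hadoop-backed codec could be kept behind an adapter that implements the same two methods, so callers would not need to change when the Hadoop dependency is removed.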


-- 
Xinli Shang