Re: [DISCUSS] Parquet data masking/anonymization

2020-08-10 Gidon Gershinsky
Hi Micah, Yep, we've been asking ourselves the same question; this is one of the reasons we are taking this slowly. The general answer is that we want to help users avoid having to implement the masking mechanism (and the privacy leakage analysis tools) on their own. The idea is to create a common set

Re: [DISCUSS] Parquet data masking/anonymization

2020-08-07 Micah Kornfield
Hi Gidon, Was there prior discussion on this on the mailing list? I left a comment on the document, but it isn't clear to me why this particular use case needs to be part of the core Parquet library. Are there motivating use cases that wouldn't be served by an external library or at the application level?

[DISCUSS] Parquet data masking/anonymization

2020-08-04 Gidon Gershinsky
Hi all, Now that the encryption mechanism is mostly complete, we are starting a long-term project on a new security feature on top of encryption. Called "data obfuscation", it combines masking and anonymization of sensitive data. https://issues.apache.org/jira/browse/PARQUET-1376 On the one