[GitHub] [parquet-mr] ggershinsky commented on pull request #615: PARQUET-1373: Encryption key tools

2020-07-01 Thread GitBox


ggershinsky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-652301855


   @gszadovszky the latest review round is addressed by 0355f65



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [parquet-mr] ggershinsky commented on pull request #615: PARQUET-1373: Encryption key tools

2020-06-25 Thread GitBox


ggershinsky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-649588181


   @gszadovszky the 055bb39 is the squash of all commits that address the 
latest review round



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [parquet-mr] ggershinsky commented on pull request #615: PARQUET-1373: Encryption key tools

2020-06-15 Thread GitBox


ggershinsky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-644145175


   > > ConcurrentMap has a segment synchronization for write operations, and 
allows for synchronization-free read operations; this makes it faster than 
HasMap with synchronized methods.
   > 
   > Yes, I know how `ConcurrentHashMap` works. What I wanted to say that you 
are using synchronization as well. As you already use a `ConcurrentMap` you 
might implement these synchronized code parts by using the methods of 
`ConcurrentMap`. I've put some examples that might work.
   
   Sounds good, and thank you for the examples! We've already applied your code 
(not pushed yet), it indeed allowed to remove the explicit synchronization, 
making the cache implementation cleaner and faster.
   
   > 
   > Please, check why Travis fails.
   > 
   
   Sorry, should have mentioned that it will take a few more commits to fully 
address this round of the comments (and fix the unitests). Once all commits are 
in, I will squash them to simplify the review, and will post a comment here.
   
   > Another point of view came up about handling sensitive data in memory. 
Java does not clean memory after garbage collecting objects. It means that 
sensitive data must be manually cleaned after used otherwise it might get 
compromised by another java application in the same jvm or even by another 
process after the jvm exists. Because of the same reason `String` objects shall 
never contain sensitive information as the `char[]` behind the object might not 
get garbage collected after the `String` object itself gets dropped.
   > I did not find any particular bad practice in the code or any examples of 
the listed situations just wanted to highlight that we shall think about this 
as well.
   
   Yep, keeping secret data in Java strings is a notorious problem. I think the 
general consensus is not to rely on gc or explicit byte wiping - but to 
remember that these Java processes must run in a trusted environment anyway, 
simply because they work with confidential information, ranging from the 
encryption keys to the sensitive data itself. Micro-managing the memory with 
confidential information is always hard, and is basically impossible with Java. 
It goes beyond Parquet. One example - the KMS Client implementations send 
secret tokens and fetch explicit encryption keys, using a custom HTTP library. 
There is no guarantee this library doesn't use strings (most likely, it does). 
Another example - the secret tokens are passed as a Hadoop property from Spark 
or another framework; this is likely to be implemented with strings. Moreover, 
the tokens are built in an access control system, then sent to a user, then 
sent to a Spark driver, then sent to Spark workers (or other framework 
components) - there is no way to control this, except to rely on HTTPS for the 
transport security, and on running framework drivers/workers in a trusted 
environment for the memory security.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [parquet-mr] ggershinsky commented on pull request #615: PARQUET-1373: Encryption key tools

2020-06-14 Thread GitBox


ggershinsky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-643729252


   > There are a couple of concurrent code parts. We shall test somehow that 
the code is thread-safe. I know, it is not easy but if you execute the related 
code paths with several parallel threads (e.g. encrypting/decrypting several 
files in multiple threads) we might find some issues.
   
   
   we'll make the PR test to run multiple threads, writing a number of files in 
parallel (same for reading).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [parquet-mr] ggershinsky commented on pull request #615: PARQUET-1373: Encryption key tools

2020-06-04 Thread GitBox


ggershinsky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-638737727


   > @ggershinsky, is it still in draft state?
   
   @gszadovszky The pr is completed, ready for a review



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [parquet-mr] ggershinsky commented on pull request #615: PARQUET-1373: Encryption key tools

2020-06-04 Thread GitBox


ggershinsky commented on pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#issuecomment-638677887


   the implementation is complete. with the recent addition of a unitest, the 
draft state can be switched off. I'll rebase it first, to make sure the Travis 
checks pass.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org