[jira] [Commented] (PARQUET-2071) Encryption translation tool
[ https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403039#comment-17403039 ]

Gabor Szadovszky commented on PARQUET-2071:
-------------------------------------------

[~sha...@uber.com], sure, I am fine with having the "universal tool" and the required refactoring handled under the separate Jira.

> Encryption translation tool
> ---------------------------
>
>                 Key: PARQUET-2071
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2071
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>
> When translating existing data to an encrypted state, we could develop a tool
> like TransCompression that translates the data at the page level, without
> reading it into records and rewriting them. This will speed up the process a lot.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (PARQUET-2071) Encryption translation tool
[ https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402670#comment-17402670 ]

Xinli Shang commented on PARQUET-2071:
--------------------------------------

I just drafted the tool and had [~gershinsky] take an early look (thanks, Gidon!). It is working now, and I compared it with a regular tool (I simply wrote a tool that reads each record and writes it back immediately). The result is promising: it is 20X faster than the regular tool.

[~gszadovszky], are you open to merging this tool first and then refactoring all the existing similar tools into the universal tool? If so, I will open a PR shortly.
[jira] [Commented] (PARQUET-2071) Encryption translation tool
[ https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394098#comment-17394098 ]

Xinli Shang commented on PARQUET-2071:
--------------------------------------

Thanks, Gabor and Gidon! I think the 'universal tool' is a good idea, and we can use it for different use cases. I opened https://issues.apache.org/jira/browse/PARQUET-2075 for it.
[jira] [Commented] (PARQUET-2071) Encryption translation tool
[ https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393982#comment-17393982 ]

Gidon Gershinsky commented on PARQUET-2071:
-------------------------------------------

A very useful tool. I'll be glad to review the PR.
[jira] [Commented] (PARQUET-2071) Encryption translation tool
[ https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393788#comment-17393788 ]

Gabor Szadovszky commented on PARQUET-2071:
-------------------------------------------

I think it is a great idea to skip unnecessary deserialization/serialization steps in such cases. Meanwhile, we already have some tools with a similar approach, like trans-compression or prune columns.

What do you think of implementing a more universal tool where you can configure the projection schema and the configuration of the target file? The tool can then decide which level of deserialization/serialization is required. For example, for trans-compression you need to decompress the pages, while for encryption you don't.
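
The suggestion in the comment above, that a universal tool looks at the target configuration and picks the shallowest level of unpacking it can get away with, might be sketched roughly as follows. This is only an illustration under stated assumptions: `RewritePlanSketch`, `PageHandling`, `TargetConfig`, `plan`, and `columnsToCopy` are hypothetical names, not part of parquet-mr, and the rule that a changed codec is the only reason to decompress a page is a simplification taken from the two examples in the comment.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these types exist in parquet-mr.
final class RewritePlanSketch {

    /** How far each page must be unpacked before it is written to the target. */
    enum PageHandling {
        COPY_COMPRESSED, // carry the compressed page bytes over as they are
        DECOMPRESS       // decompress the page, then recompress with the target codec
    }

    /** Assumed shape of the target-file configuration. */
    static final class TargetConfig {
        final Set<String> projectedColumns; // columns kept in the target file
        final String codec;                 // target compression codec name
        final boolean encrypt;              // whether the target file is encrypted

        TargetConfig(Set<String> projectedColumns, String codec, boolean encrypt) {
            this.projectedColumns = projectedColumns;
            this.codec = codec;
            this.encrypt = encrypt;
        }
    }

    /** Picks the page handling level from the source codec and the target config. */
    static PageHandling plan(String sourceCodec, TargetConfig target) {
        // target.encrypt is deliberately ignored: per the comment above,
        // encryption does not require decompressing the pages.
        return sourceCodec.equals(target.codec)
                ? PageHandling.COPY_COMPRESSED
                : PageHandling.DECOMPRESS;
    }

    /** Applies the projection: keeps only the source columns the target asks for. */
    static List<String> columnsToCopy(List<String> sourceColumns, TargetConfig target) {
        return sourceColumns.stream()
                .filter(target.projectedColumns::contains)
                .collect(Collectors.toList());
    }
}
```

With this shape, an encryption-only rewrite (same codec, `encrypt` set) and a column prune both resolve to `COPY_COMPRESSED`, while a codec change resolves to `DECOMPRESS`. A real tool would probably need further levels and per-column decisions.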