[GitHub] [parquet-mr] gszadovszky commented on pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-14 Thread GitBox


gszadovszky commented on PR #1014:
URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1382754916

   > * I'd prefer creating a new JIRA for this refactor to be a prerequisite. 
Merging multiple files to a single one with customized pruning, encryption, and 
codec is also in my mind and will be supported later. I will create separate 
JIRAs as sub-tasks of PARQUET-2075 and work on them progressively.
   
   Perfect! :)
   
   > * Putting the original `created_by` into `key_value_metadata` is a good 
idea. However, it is tricky if a file has been rewritten for several times. 
What about adding a key named `original_created_by` to `key_value_metadata` and 
concatenating all old `created_by`s to it?
   
   It sounds good to me. Maybe have the latest one at the beginning and use the 
separator `'\n'`?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-mr] gszadovszky commented on pull request #1014: PARQUET-2075: Implement unified file rewriter

2023-01-10 Thread GitBox


gszadovszky commented on PR #1014:
URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1377706622

   > @gszadovszky I Just want to check if you have time to have a look. @wgtmac 
just be nice to take over the work that we discussed earlier to have an 
aggregated rewriter.
   
   @shangxinli, I'll try to take a look this week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org