gszadovszky commented on PR #1014:
URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1382754916
> * I'd prefer creating a new JIRA for this refactor to be a prerequisite.
> Merging multiple files to a single one with customized pruning, encryption, and
> codec is also in my mind and will be supported later. I will create separate
> JIRAs as sub-tasks of PARQUET-2075 and work on them progressively.
Perfect! :)
> * Putting the original `created_by` into `key_value_metadata` is a good
> idea. However, it is tricky if a file has been rewritten several times.
> What about adding a key named `original_created_by` to `key_value_metadata` and
> concatenating all old `created_by`s to it?
It sounds good to me. Maybe have the latest one at the beginning and use the
separator `'\n'`?
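
To make the ordering concrete, here is a minimal sketch of what I have in mind. It uses a plain `Map` rather than any parquet-mr class; the class name `OriginalCreatedBy`, the method `record`, and the version strings are hypothetical and only for illustration. The key `original_created_by` is the one proposed above and is not an existing constant.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of parquet-mr.
final class OriginalCreatedBy {
    // Key name proposed in this thread; assumed, not an existing constant.
    static final String KEY = "original_created_by";

    /**
     * Returns a copy of the metadata in which the created_by of the file being
     * rewritten is prepended to original_created_by: latest first, '\n' separated.
     */
    static Map<String, String> record(Map<String, String> keyValueMetadata,
                                      String previousCreatedBy) {
        Map<String, String> updated = new LinkedHashMap<>(keyValueMetadata);
        String history = updated.get(KEY);
        if (history == null || history.isEmpty()) {
            // First rewrite: the history is just the one previous writer.
            updated.put(KEY, previousCreatedBy);
        } else {
            // Later rewrites: newest entry goes in front of the older ones.
            updated.put(KEY, previousCreatedBy + "\n" + history);
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        // Illustrative created_by strings only.
        meta = record(meta, "parquet-mr version 1.10.0");
        meta = record(meta, "parquet-mr version 1.12.3");
        System.out.println(meta.get(KEY));
    }
}
```

With the latest entry first, a reader only needs the text up to the first `'\n'` to find the most recent previous writer.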