I definitely can't see a benefit to using 30MB row groups over just creating 30MB
parquet files.
I would add that stats indexes are at the file level, so that argues in favor of
using row group size = file size.
The only context where it would help is when clustering is set up and targets 1GB
files with 128MB row groups.
Splitting a parquet file into 5 row groups gives the same benefit as creating 5
parquet files with 1 row group each.
Also, the latter allows more parallelism for writes.
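As a minimal sketch (Spark + Hudi in Java, not from this thread), this is roughly
what "row group size = file size" could look like at write time; the exact config
keys and values are assumptions from memory and may differ per Hudi version:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class RowGroupEqualsFileSize {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("row-group-sizing-sketch")
            .getOrCreate();

        // hypothetical input path
        Dataset<Row> df = spark.read().parquet("s3a://bucket/input/");

        long targetBytes = 128L * 1024 * 1024; // 128MB target base file size

        df.write()
          .format("hudi")
          // assumed config keys: cap the base file size and make the Parquet
          // row group (block) size match it, so each file holds one row group
          .option("hoodie.parquet.max.file.size", String.valueOf(targetBytes))
          .option("hoodie.parquet.block.size", String.valueOf(targetBytes))
          // illustrative table settings only
          .option("hoodie.table.name", "demo_table")
          .option("hoodie.datasource.write.recordkey.field", "uuid")
          .mode(SaveMode.Append)
          .save("s3a://bucket/hudi/demo_table");
      }
    }

With one row group per file, the file-level stats index already carries the same
pruning information as row-group stats would, and five such files can be written
by five tasks in parallel.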
Am I missing something?
On July 20, 2023 12:38:54 PM UTC, sagar sumit wrote:
Good questions! The idea is to be able to skip row groups based on the index.
But, if we have to do a full snapshot load, then our wrapper should actually
be doing batch GET on S3. Why incur 5x more calls?
As for the update, I think this is in the context of COW. So, the footer will be
recomputed.
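(For illustration only, a minimal parquet-hadoop sketch of what row-group skipping
from footer statistics looks like; the file path and the surrounding scaffolding
are assumptions, not the wrapper discussed above:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.column.statistics.Statistics;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.metadata.BlockMetaData;
    import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
    import org.apache.parquet.hadoop.util.HadoopInputFile;

    public class RowGroupSkipSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // hypothetical base file of a Hudi table
        Path file = new Path("s3a://bucket/hudi/demo_table/part-0001.parquet");

        try (ParquetFileReader reader =
                 ParquetFileReader.open(HadoopInputFile.fromPath(file, conf))) {
          int i = 0;
          for (BlockMetaData rowGroup : reader.getFooter().getBlocks()) {
            System.out.println("row group " + (i++) + " rows=" + rowGroup.getRowCount());
            for (ColumnChunkMetaData col : rowGroup.getColumns()) {
              Statistics<?> stats = col.getStatistics();
              if (stats != null && stats.hasNonNullValue()) {
                // a reader compares these per-row-group min/max bounds against
                // the query predicate and skips row groups that cannot match
                System.out.println("  " + col.getPath().toDotString()
                    + " min=" + stats.genericGetMin()
                    + " max=" + stats.genericGetMax());
              }
            }
          }
        }
      }
    }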
Hi,
Multiple independent initiatives for fast copy-on-write have emerged
(correct me if I am wrong):
1. https://github.com/apache/hudi/blob/f1afb1bf04abdc94a26d61dc302f36ec2bbeb15b/rfc/rfc-68/rfc-68.md
2. https://www.uber.com/en-FR/blog/fast-copy-on-write-within-apache-parquet/
The idea is to