Re: Improved MOR spark reader

2023-07-24 Thread Nicolas Paris
>Jon is working on new Hudi Spark integration relying on a new
>implementation of the ParquetFileFormat
Sounds good, thanks for the pointer
On July 24, 2023 5:54:55 AM UTC, Y Ethan Guo wrote:
>Hi Nicolas,
>
>Thanks for bringing up the discussion. Spark's MOR snapshot relation
>provides

Re: Improved MOR spark reader

2023-07-23 Thread Y Ethan Guo
Hi Nicolas, Thanks for bringing up the discussion. Spark's MOR snapshot relation provides different readers for different kinds of splits, such as base-file-only splits and regular splits with both base and log files.
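To illustrate the per-split reader selection described above, here is a minimal Python sketch. It is not Hudi's actual code: `FileSplit`, `pick_reader`, and the reader labels are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileSplit:
    """Hypothetical stand-in for one MOR split: a base file plus optional log files."""
    base_file: str
    log_files: List[str] = field(default_factory=list)


def pick_reader(split: FileSplit) -> str:
    """Return a label for the reader a split would need (labels are illustrative)."""
    if not split.log_files:
        # Base-file-only split: nothing to merge, a plain parquet read is enough.
        return "base_file_only_reader"
    # Regular split: base file records must be merged with the log records.
    return "merging_reader"


splits = [
    FileSplit("f1.parquet"),
    FileSplit("f2.parquet", ["f2.log.1"]),
]
readers = [pick_reader(s) for s in splits]
```

The decision here is made per split, so a single partition can mix both kinds of readers.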

Re: Improved MOR spark reader

2023-07-22 Thread Nicolas Paris
Just to clarify: the read paths described here concern RT (real-time) views only; they are not related to RO (read-optimized) views.
On July 22, 2023 8:14:09 PM UTC, Nicolas Paris wrote:
>I have been playing with the StarRocks MOR Hudi reader recently and it does
>amazing work: it has two read paths:
>
>1. For partitions with log

Improved MOR spark reader

2023-07-22 Thread Nicolas Paris
I have been playing with the StarRocks MOR Hudi reader recently and it does amazing work: it has two read paths:
1. For partitions with log files, use the merging logic
2. For partitions with only parquet files, use the COW read logic
As you know, the first path is slow because it has merging
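The two read paths listed above can be sketched roughly as follows. This is a simplified Python illustration, not StarRocks or Hudi code: `is_log_file`, `plan_read_paths`, and the path labels are hypothetical, and detecting log files by a ".log." marker in the file name is an assumption made for the example.

```python
from typing import Dict, List


def is_log_file(name: str) -> bool:
    # Assumption for this sketch: log files carry ".log." in their file name.
    return ".log." in name


def plan_read_paths(files_by_partition: Dict[str, List[str]]) -> Dict[str, str]:
    """Choose one read path per partition (labels are illustrative only)."""
    plan = {}
    for partition, files in files_by_partition.items():
        if any(is_log_file(f) for f in files):
            # Path 1: at least one log file, so records must be merged.
            plan[partition] = "merge"
        else:
            # Path 2: only parquet base files, so read them like a COW table.
            plan[partition] = "cow"
    return plan


plan = plan_read_paths({
    "dt=2023-07-21": ["a.parquet", "b.parquet"],
    "dt=2023-07-22": ["c.parquet", ".c_001.log.1_0-1-0"],
})
```

With a layout like this, only the partition that still has log files pays the merging cost; the other one takes the cheaper parquet-only path.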