[DISCUSS] Introduce incremental processing API in Hudi

2020-08-30 Thread vino yang
Hi everyone, For a long time, people in the big data field have hoped that the tools they use could make fuller use of the processing and analysis capabilities of big data. At present, from an API perspective, Hudi mostly provides APIs related to data ingestion, and relies on various big data

Hudi Writer vs Spark Parquet Writer - Sync

2020-08-30 Thread Kizhakkel Jose, Felix
Hello All, Hive has the bucketBy feature, and Spark is going to add Hive-style bucketBy support for data sources; once it’s implemented, it’s going to benefit read performance considerably. Since Hudi takes a different path when writing Parquet data, are we planning to add

[ANNOUNCE] Hudi Community Weekly Update(2020-08-23 ~ 2020-08-30)

2020-08-30 Thread leesf
Dear community, Happy to share the Hudi community weekly update for 2020-08-23 ~ 2020-08-30, with updates on discussions, features, and bug fixes. === Discussion [Release] Hudi 0.6.0 has been released; it contains many features and bug fixes [1]

DevX, Test infra Rgdn

2020-08-30 Thread Sivabalan
As Hudi matures as a project, we need to get our devX and test infra rock solid. Availability of test utils and base classes for ease of writing more tests, stable integration tests, ease of debuggability, micro benchmarks, performance test infra, automating checkstyle formatting, nightly snapshot