Re: [DISCUSS] Refactor hudi-client module for better support of multiple engines

2021-09-15 Thread Raymond Xu
+1, that's a great improvement. On Wed, Sep 15, 2021 at 10:40 AM Sivabalan wrote: > ++1. Definitely helps Hudi scale and makes it more maintainable. Thanks > for driving this effort. Most devs show interest in major features and > don't like to spend time on such foundational work. But as the

Re: [DISCUSS] Refactor hudi-client module for better support of multiple engines

2021-09-15 Thread Sivabalan
++1. Definitely helps Hudi scale and makes it more maintainable. Thanks for driving this effort. Most devs show interest in major features and don't like to spend time on such foundational work. But as the project scales, this foundational work will have higher returns in the long run. On

Re: [DISCUSS] Refactor hudi-client module for better support of multiple engines

2021-09-15 Thread Vinoth Chandar
Another +1, the HoodieData abstraction will go a long way in reducing LoC. Happy to work with you to see this through! I really encourage top contributors to the Flink and Java clients to actively review all PRs as well, given there are subtle differences everywhere. This will help us smoothly

Re: [DISCUSS] Refactor hudi-client module for better support of multiple engines

2021-09-15 Thread vino yang
Hi Ethan, Big +1 for the proposal. Actually, we have discussed this topic before [1]. Will review your refactor PR later. Best, Vino [1]: https://lists.apache.org/thread.html/r71d96d285c735b1611920fb3e7224c9ce6fd53d09bf0e8f144f4fcbd%40%3Cdev.hudi.apache.org%3E Y Ethan Guo wrote on Wed, Sep 15, 2021

[DISCUSS] Refactor hudi-client module for better support of multiple engines

2021-09-15 Thread Y Ethan Guo
Hi all, the hudi-client module has core Hudi abstractions and client logic for different engines like Spark, Flink, and Java. While a previous effort (HUDI-538 [1]) has decoupled the integration with Spark, there is still considerable code duplication across different engines for almost the same logic due to