Hi Gary,
We can pass the constructed timeline and filesystem view into the IOHandle.
I think it makes sense given how Flink does things.
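A minimal sketch of what that injection could look like; all names below are
illustrative stand-ins, not the actual Hudi API:

    // Hypothetical: the caller builds the timeline and file system view once
    // and injects them, so the handle does no construction of its own.
    interface Timeline { }          // stands in for the table's active timeline
    interface FileSystemView { }    // stands in for the table's file system view

    class InjectedIOHandle {
      private final Timeline timeline;
      private final FileSystemView fsView;

      InjectedIOHandle(Timeline timeline, FileSystemView fsView) {
        this.timeline = timeline;   // pre-built, just stored
        this.fsView = fsView;       // pre-built, just stored
      }
    }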
Thanks
Vinoth
On Fri, Sep 24, 2021 at 2:04 AM Gary Li wrote:
Hi Vinoth,
Currently, each executor of Flink has a timeline server, I believe. Do you
think we can avoid passing the timeline and filesystem view into the
IOHandle? I mean, one IOHandle handles the IO of one file group, and it
doesn't need to know the timeline and filesystem view of the table,
Thanks for the explanation. I get the streaming aspect better now, especially
in Flink land. The timeline server and remote file system view are the
defaults. Assuming it's an RPC call to the timeline server that takes 10-100
ms, I am not sure how much room there is for optimization in the loading of the
file
Hi Vinoth,
IMO the IOHandle should be as lightweight as possible, especially when we
want to do streaming and near-real-time updates (possibly real-time in the
future?). Constructing the timeline and filesystem view inside the handle
is time-consuming. In some cases, some handles only write a few
Hi Gary,
So in effect, you want to pull all the timeline filtering out of the handles
and pass a plan, i.e. what file slice to work on, to the handle?
That does sound cleaner, but we need to introduce this additional layer.
The timeline and filesystem view do live within the table today, I believe.
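A rough sketch of that plan layer, with made-up names (only the file slice is
a real Hudi concept here):

    // Illustrative only: the table does the timeline filtering and resolves
    // a small "plan" describing exactly what the handle should work on.
    class FileSliceRef {                  // stand-in for Hudi's FileSlice
      final String partitionPath;
      final String fileId;
      FileSliceRef(String partitionPath, String fileId) {
        this.partitionPath = partitionPath;
        this.fileId = fileId;
      }
    }

    class HandlePlan {
      final FileSliceRef sliceToWorkOn;   // resolved by the table, not the handle
      HandlePlan(FileSliceRef sliceToWorkOn) {
        this.sliceToWorkOn = sliceToWorkOn;
      }
    }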
Hi Vinoth,
Thanks for your response. For HoodieIOHandle, IMO we could define the scope
of the Handle during initialization, so we don't need to care about the
timeline and table view when actually writing the data. Is that possible? A
HoodieTable could have many Handles writing data at the
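To illustrate the scoping idea (hypothetical names again, reusing HandlePlan
and FileSliceRef from the sketch above): the table resolves each handle's
scope once at creation time, and can hand out many such handles.

    class ScopedHandle {
      private final HandlePlan plan;      // fully resolved before any write
      ScopedHandle(HandlePlan plan) { this.plan = plan; }
    }

    class ScopedTableSketch {
      // The table consults its own timeline and file system view here, once,
      // so the handle never needs either while writing.
      ScopedHandle createHandle(String partitionPath, String fileId) {
        HandlePlan plan = new HandlePlan(new FileSliceRef(partitionPath, fileId));
        return new ScopedHandle(plan);
      }
    }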
Hi Gary,
Thanks for the detailed response. Let me add my take on it.
>>HoodieFlinkMergeOnReadTable.upsert(List) to use the
AppendHandle.write(HoodieRecord) directly,
I have the same issue on JavaClient, for the Kafka Connect implementation.
I have an idea of how we can implement this. Will
Huge +1. Recently I have been working on making the Flink writer run in a
streaming fashion and found that the List interface limits the
streaming power of Flink. By switching from
HoodieFlinkMergeOnReadTable.upsert(List) to using
AppendHandle.write(HoodieRecord) directly, the throughput was almost
doubled
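Roughly, the difference between the two styles (the table, handle, and record
types below are hypothetical stand-ins, not the real API surface):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    interface Record { }
    interface Table { void upsert(List<Record> records); }
    interface AppendHandle { void write(Record record); void close(); }

    class StreamingVsBatch {
      // Batch style: buffer the whole input, then hand the list to upsert();
      // nothing is written until the list is complete.
      static void batch(Table table, Iterator<Record> records) {
        List<Record> buffer = new ArrayList<>();
        while (records.hasNext()) {
          buffer.add(records.next());
        }
        table.upsert(buffer);
      }

      // Streaming style: each record flows straight into the handle as it
      // arrives, with no intermediate buffering.
      static void streaming(AppendHandle handle, Iterator<Record> records) {
        while (records.hasNext()) {
          handle.write(records.next());
        }
        handle.close();
      }
    }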
+1 that's a great improvement.
On Wed, Sep 15, 2021 at 10:40 AM Sivabalan wrote:
++1. This definitely helps Hudi scale and makes it more maintainable. Thanks
for driving this effort. Mostly, devs show interest in major features and
don't like to spend time on such foundational work. But as the project
scales, this foundational work will have higher returns in the long run.
On
Another +1. The HoodieData abstraction will go a long way in reducing LoC.
Happy to work with you to see this through! I really encourage top
contributors to the Flink and Java clients to
actively review all PRs as well, given there are subtle differences everywhere.
This will help us smoothly
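For readers new to the idea, a sketch of an engine-agnostic abstraction in the
spirit of HoodieData (method names here are assumptions, not the real API):
shared client logic is written once against it, and each engine supplies a
backing, e.g. an RDD for Spark or a plain List for the Java client.

    import java.util.List;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    abstract class EngineData<T> {
      abstract <O> EngineData<O> map(Function<T, O> fn);
      abstract List<T> collectAsList();
    }

    // A List-backed implementation, as a Java client might use.
    class ListData<T> extends EngineData<T> {
      private final List<T> data;
      ListData(List<T> data) { this.data = data; }

      @Override
      <O> EngineData<O> map(Function<T, O> fn) {
        return new ListData<>(data.stream().map(fn).collect(Collectors.toList()));
      }

      @Override
      List<T> collectAsList() { return data; }
    }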
Hi Ethan,
Big +1 for the proposal.
Actually, we have discussed this topic before [1].
Will review your refactor PR later.
Best,
Vino
[1]:
https://lists.apache.org/thread.html/r71d96d285c735b1611920fb3e7224c9ce6fd53d09bf0e8f144f4fcbd%40%3Cdev.hudi.apache.org%3E
Y Ethan Guo, on Wed, Sep 15, 2021,
Hi all,
The hudi-client module has the core Hudi abstractions and client logic for
different engines like Spark, Flink, and Java. While a previous effort
(HUDI-538 [1]) decoupled the Spark integration, there is quite
some code duplication across different engines for almost the same logic
due to