+1
I was thinking that we add a new utility and NOT extend DeltaStreamer by adding a Sink interface, for the following reasons:
- It would make DeltaStreamer look like a generic Source => Sink ETL tool, which is not something we intend to support in Hudi. There are plenty of good tools for that out there.
Hi,
Does your implementation read offset ranges from the Kafka partitions, meaning we can create multiple Spark input partitions per Kafka partition?
If so, +1 for the overall goals here.
How does this affect ordering? Can you think about how/if Hudi write operations can handle potentially out-of-order records?
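For readers following the offset-range discussion: the idea is that one hot Kafka partition's offset range can be cut into several contiguous sub-ranges, each becoming its own Spark input partition. The sketch below is illustrative only; the class and record names are hypothetical and not the actual Hudi or Spark API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one Kafka partition's offset range [from, to)
// into several near-equal contiguous sub-ranges, so a single hot Kafka
// partition does not bottleneck a Spark stage.
public class OffsetRangeSplitter {

    // A simple [from, to) offset range for one Kafka partition.
    record Range(int kafkaPartition, long from, long to) {}

    // Split a range into at most `splits` contiguous sub-ranges.
    static List<Range> split(Range r, int splits) {
        List<Range> out = new ArrayList<>();
        long total = r.to() - r.from();
        if (total <= 0 || splits <= 1) {
            out.add(r);
            return out;
        }
        long base = total / splits;
        long rem = total % splits;
        long start = r.from();
        for (int i = 0; i < splits; i++) {
            long size = base + (i < rem ? 1 : 0); // spread the remainder
            if (size == 0) break;                 // more splits than messages
            out.add(new Range(r.kafkaPartition(), start, start + size));
            start += size;
        }
        return out;
    }

    public static void main(String[] args) {
        // One Kafka partition with offsets [100, 110), split 3 ways.
        for (Range r : split(new Range(0, 100, 110), 3)) {
            System.out.println(r.kafkaPartition() + ":" + r.from() + "-" + r.to());
        }
    }
}
```

Note that each sub-range preserves offset order internally, but records from different sub-ranges of the same Kafka partition may be processed concurrently, which is exactly why the out-of-order question below matters.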
Hi Vinoth,
I am aligned with the first reason that you mentioned. Better to have a
separate tool to take care of this.
On Mon, Apr 3, 2023 at 9:01 PM Vinoth Chandar wrote:
> +1
>
> I was thinking that we add a new utility and NOT extend DeltaStreamer by
> adding a Sink interface, for the
Hi,
Yes, we can create multiple Spark input partitions per Kafka partition.
I think the write operations can handle the potentially out-of-order events, because before writing we preCombine the incoming events using the source-ordering field, and we also call combineAndGetUpdateValue when merging an incoming record with the stored record.
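To make the precombine argument concrete: within a batch, keeping the event with the highest ordering value per key means arrival order does not matter. The snippet below is a minimal self-contained illustration of that semantics; it is not the real HoodieRecordPayload API, and all names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of precombine semantics on a source-ordering field:
// per key, the event with the highest ordering value wins, so out-of-order
// arrival within a batch produces the same result as in-order arrival.
public class PrecombineSketch {

    record Event(String key, long orderingField, String value) {}

    // Analogue of preCombine: keep the later of two events for a key.
    static Event preCombine(Event a, Event b) {
        return a.orderingField() >= b.orderingField() ? a : b;
    }

    public static void main(String[] args) {
        // Out-of-order batch: the ts=2 update arrives before the ts=1 one.
        Event[] batch = {
            new Event("id1", 2, "newer"),
            new Event("id1", 1, "older"),
        };
        Map<String, Event> deduped = new HashMap<>();
        for (Event e : batch) {
            deduped.merge(e.key(), e, PrecombineSketch::preCombine);
        }
        // "newer" wins even though it arrived first in the stream.
        System.out.println(deduped.get("id1").value());
    }
}
```

The same comparison applied between an incoming record and the stored record (the combineAndGetUpdateValue step mentioned above) extends this guarantee across batches.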
Should we stop SparkContext?
李杰 (Li Jie)
leedd1...@163.com
Replied Message
From: lee
Date: 4/3/2023 11:09
To: Sivabalan
Cc: dev@hudi.apache.org
Subject: Re: When using the HoodieDeltaStreamer, is there a corresponding parameter that can control the number of