Re: [DISCUSS] Hudi Reverse Streamer

2023-04-03 Thread Vinoth Chandar
+1. I was thinking that we add a new utility and NOT extend DeltaStreamer by adding a Sink interface, for the following reasons: it would make DeltaStreamer look like a generic Source => Sink ETL tool, which is not something we intend to support in Hudi. There are plenty of good tools for that out there.

Re: [DISCUSS] split source of kafka partition by count

2023-04-03 Thread Vinoth Chandar
Hi, does your implementation read offset ranges from Kafka partitions, which would mean we can create multiple Spark input partitions per Kafka partition? If so, +1 for the overall goals here. How does this affect ordering? Can you think about how/if Hudi write operations can handle potentially
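The idea under discussion, splitting one Kafka partition's offset range into several Spark input partitions by event count, can be sketched as below. This is a minimal illustration, not the proposed implementation: `OffsetRange`, `split_by_count`, and `max_events_per_split` are hypothetical names, with `OffsetRange` only loosely modeled on Spark's Kafka offset ranges (half-open `[from, until)`).

```python
from typing import List, NamedTuple


class OffsetRange(NamedTuple):
    """Half-open offset range [from_offset, until_offset) for one Kafka partition.

    Hypothetical stand-in, loosely modeled on Spark's Kafka OffsetRange.
    """
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def size(self) -> int:
        # Number of offsets (events) covered by this range.
        return self.until_offset - self.from_offset


def split_by_count(ranges: List[OffsetRange], max_events_per_split: int) -> List[OffsetRange]:
    """Split each range so that no resulting range holds more than max_events_per_split offsets.

    Each output range would back one Spark input partition, so a single busy
    Kafka partition can be read by several Spark tasks in parallel.
    """
    if max_events_per_split <= 0:
        raise ValueError("max_events_per_split must be positive")
    splits: List[OffsetRange] = []
    for r in ranges:
        start = r.from_offset
        # Walk the range in fixed-size chunks; the last chunk may be smaller.
        while start < r.until_offset:
            end = min(start + max_events_per_split, r.until_offset)
            splits.append(OffsetRange(r.topic, r.partition, start, end))
            start = end
    return splits
```

For example, `split_by_count([OffsetRange("events", 0, 100, 350)], 100)` yields the ranges `[100, 200)`, `[200, 300)` and `[300, 350)`. Because those chunks are processed by different tasks, events from the same Kafka partition no longer arrive at the writer in strict offset order, which is why the ordering question above matters.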

Re: [DISCUSS] Hudi Reverse Streamer

2023-04-03 Thread Pratyaksh Sharma
Hi Vinoth, I am aligned with the first reason that you mentioned. It is better to have a separate tool take care of this.
On Mon, Apr 3, 2023 at 9:01 PM Vinoth Chandar wrote:
> +1
>
> I was thinking that we add a new utility and NOT extend DeltaStreamer by
> adding a Sink interface, for the

Re: Re: [DISCUSS] split source of kafka partition by count

2023-04-03 Thread 孔维
Hi, yes, we can create multiple Spark input partitions per Kafka partition. I think the write operations can handle potentially out-of-order events, because before writing we need to preCombine the incoming events using the source-ordering-field, and we also need to combineAndGetUpdateValue
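The argument that ordering-field-based merging tolerates out-of-order events can be illustrated with a simplified sketch. This is not Hudi's payload code: `pre_combine`, `combine_with_stored`, and the `id`/`ts` field names are hypothetical, and the tie-breaking rule (later-seen record wins on equal ordering values) is a choice made for the sketch only. The point is that the surviving record is decided by the ordering value, not by arrival order.

```python
from typing import Dict, Iterable, List, Optional


def pre_combine(records: Iterable[dict], key_field: str, ordering_field: str) -> List[dict]:
    """Keep one record per key: the one with the largest ordering value.

    Simplified stand-in for a preCombine step; on equal ordering values the
    record seen later wins (an assumption of this sketch).
    """
    latest: Dict[object, dict] = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        if current is None or rec[ordering_field] >= current[ordering_field]:
            latest[key] = rec
    return list(latest.values())


def combine_with_stored(stored: Optional[dict], incoming: dict, ordering_field: str) -> dict:
    """Merge an incoming record against the stored one.

    Simplified stand-in for combining with the existing value: an event with a
    smaller ordering value must not overwrite a newer stored record.
    """
    if stored is not None and stored[ordering_field] > incoming[ordering_field]:
        return stored
    return incoming
```

For instance, if a batch delivers `{"id": "a", "ts": 3}` before `{"id": "a", "ts": 1}` because the two events were read by different Spark tasks, `pre_combine(batch, "id", "ts")` still keeps the `ts = 3` record, and `combine_with_stored` then refuses to let an older incoming event replace a newer stored one.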

Re: When using the HoodieDeltaStreamer, is there a corresponding parameter that controls the number of cycles? For example, after 5 cycles, stop ingesting data

2023-04-03 Thread lee
Should we stop the SparkContext?

李杰 (leedd1...@163.com)

Replied Message
From: lee
Date: 4/3/2023 11:09
To: Sivabalan
Cc: dev@hudi.apache.org
Subject: Re: When using the HoodieDeltaStreamer, is there a corresponding parameter that can control the number of