Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-19 Thread weijie guo
Hi All, Thanks for all the feedback. If there are no more comments, I would like to start the vote thread. Thanks again! Best regards, Weijie Xintong Song wrote on Tue, Feb 20, 2024 at 14:17: > Thanks for the updates. LGTM. > > Best, > > Xintong > > > > On Mon, Feb 19, 2024 at 10:51 AM weijie guo >

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-19 Thread Xintong Song
Thanks for the updates. LGTM. Best, Xintong On Mon, Feb 19, 2024 at 10:51 AM weijie guo wrote: > Thanks for the reply, Xintong. > > Based on your comments, I made the following changes to this FLIP: > > 1. Renaming `TwoInputStreamProcessFunction` and >

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-18 Thread weijie guo
Thanks for the reply, Xintong. Based on your comments, I made the following changes to this FLIP: 1. Renaming `TwoInputStreamProcessFunction` and `BroadcastTwoInputStreamProcessFunction` to `TwoInputNonBroadcastStreamProcessFunction` and `TwoInputBroadcastStreamProcessFunction`, respectively.
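
For readers skimming the thread, the split behind the two renamed function types can be pictured with a small sketch. This is only an illustration under stated assumptions: the interfaces below are simplified, hypothetical stand-ins that borrow the names from this message, and the method names, the `Consumer`-based output, and the `PrefixEnricher` example are invented here; they are not the signatures defined in the FLIP or in Flink. The sketch shows the point raised earlier in the discussion that a broadcast record reaches every partition, while a non-broadcast record reaches exactly one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BroadcastSketch {

    /** Hypothetical simplified shape: both inputs are partitioned and both may emit. */
    interface TwoInputNonBroadcastStreamProcessFunction<IN1, IN2, OUT> {
        void processRecordFromFirstInput(IN1 record, Consumer<OUT> output);
        void processRecordFromSecondInput(IN2 record, Consumer<OUT> output);
    }

    /** Hypothetical simplified shape: in this sketch the broadcast side only updates local state. */
    interface TwoInputBroadcastStreamProcessFunction<IN1, IN2, OUT> {
        void processRecordFromNonBroadcastInput(IN1 record, Consumer<OUT> output);
        void processRecordFromBroadcastInput(IN2 record);
    }

    /** Illustrative function: prepends the most recently broadcast prefix to each word. */
    static class PrefixEnricher
            implements TwoInputBroadcastStreamProcessFunction<String, String, String> {
        private String prefix = "";

        @Override
        public void processRecordFromNonBroadcastInput(String record, Consumer<String> output) {
            output.accept(prefix + record);
        }

        @Override
        public void processRecordFromBroadcastInput(String record) {
            prefix = record;
        }
    }

    /** Simulates two parallel instances by hand; no Flink runtime is involved. */
    public static List<String> run() {
        int parallelism = 2;
        List<PrefixEnricher> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new PrefixEnricher());
        }
        List<String> out = new ArrayList<>();

        // A broadcast record is delivered to every partition.
        for (PrefixEnricher instance : instances) {
            instance.processRecordFromBroadcastInput("v1:");
        }

        // A non-broadcast record is delivered to exactly one partition (round-robin here).
        String[] words = {"a", "b", "c"};
        for (int i = 0; i < words.length; i++) {
            instances.get(i % parallelism).processRecordFromNonBroadcastInput(words[i], out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Keeping the broadcast side free of a per-record output in this sketch mirrors the limitation discussed in the thread; whether and how the actual API allows output from the broadcast side is left to the FLIP itself.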

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-04 Thread Xintong Song
Thanks for updating the FLIP, Weijie. I think separating the TwoInputProcessFunction according to whether the input stream contains BroadcastStream makes sense. I have a few more comments. 1. I'd suggest the names `TwoInputNonBroadcastStreamProcessFunction` and

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-02-03 Thread weijie guo
Hi Xuannan and Xintong, Good point! After further consideration, I feel that we should make the Broadcast + NonKeyed/Keyed process function different from the normal TwoInputProcessFunction, because a record from the broadcast input indeed corresponds to all partitions, while the record from the

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-31 Thread Xintong Song
OK, I see your point. I think the demand for updating states and emitting outputs upon receiving a broadcast record makes sense. However, the way `KeyedBroadcastProcessFunction` supports this may not be optimal. E.g., if `Collector#collect` is called in `processBroadcastElement` but outside of

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-31 Thread Xuannan Su
Hi Weijie and Xintong, Thanks for the reply! Please see my comments below. > Does this mean if we want to support (KeyedStream, BroadcastStream) -> > (KeyedStream), we must make sure that no data can be output upon processing > records from the input BroadcastStream? That's probably a

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-29 Thread weijie guo
Hi Xintong, Thanks for your reply. > Does this mean if we want to support (KeyedStream, BroadcastStream) -> (KeyedStream), we must make sure that no data can be output upon processing records from the input BroadcastStream? That's probably a reasonable limitation. I think so, this is the

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-29 Thread Xintong Song
Just trying to understand. > Is there a particular reason we do not support a > `TwoInputProcessFunction` to combine a KeyedStream with a > BroadcastStream to result in a KeyedStream? There seems to be a valid > use case where a KeyedStream is enriched with a BroadcastStream and > returns a

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-29 Thread weijie guo
Hi Xuannan, Thank you for your attention. > In the partitioning section, it says that "broadcast can only be used as a side-input of other Inputs." Could you clarify what is meant by "side-input"? If I understand correctly, it refers to one of the inputs of the `TwoInputStreamProcessFunction`. If

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-29 Thread weijie guo
Hi Yunfeng, Thank you for your attention. > 1. Will we provide any API to support choosing which input to consume between the two inputs of TwoInputStreamProcessFunction? It would be helpful in online machine learning cases, where a process function needs to receive the first machine learning

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-29 Thread weijie guo
Hi Wencong, Thank you for your attention. > Q1. Other DataStream types are converted into Non-Keyed DataStreams by using a "shuffle" operation to convert Input into output. Does this "shuffle" include the various repartition operations (rebalance/rescale/shuffle) from DataStream V1? Yes, the

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-29 Thread Xuannan Su
Hi Weijie, Thank you for driving the design of the new DataStream API. I have a few questions regarding the FLIP: 1. In the partitioning section, it says that "broadcast can only be used as a side-input of other Inputs." Could you clarify what is meant by "side-input"? If I understand correctly,

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-25 Thread Yunfeng Zhou
Hi Weijie, Thanks for raising discussions about the new DataStream API. I have a few questions about the content of the FLIP. 1. Will we provide any API to support choosing which input to consume between the two inputs of TwoInputStreamProcessFunction? It would be helpful in online machine

Re: [DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2024-01-24 Thread Wencong Liu
Hi Weijie, Regarding FLIP-409, I have the following questions: Q1. Other DataStream types are converted into Non-Keyed DataStreams by using a "shuffle" operation to convert Input into output. Does this "shuffle" include the various repartition operations (rebalance/rescale/shuffle) from

[DISCUSS] FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction

2023-12-25 Thread weijie guo
Hi devs, I'd like to start a discussion about FLIP-409: DataStream V2 Building Blocks: DataStream, Partitioning and ProcessFunction [1]. As the first sub-FLIP for DataStream API V2, we'd like to discuss and try to answer some of the most fundamental questions in stream processing: 1. What