Re: Side Inputs vs. Connected Streams

2016-10-06 Thread Maximilian Michels
> Will the above ensure that dsSide is always available before ds3 elements 
> arrive on the connected stream. Am I correct in assuming that ds2 changes 
> will continue to be broadcast to ds3 (with no ordering guarantees between ds3 
> and dsSide, ofcourse).

Broadcasting is just a partition strategy. If you broadcast a stream,
all records part of the stream will be replicated to all partitions.
This is in contrast with a keyBy which would spread records among
partitions. So yes, in the example `DataStream ds4 =
ds3.connect(dsSide.broadcast());`, you can't assume dsSide is made
available first. The elements will arrive as they are sent over the
wire.

On Tue, Oct 4, 2016 at 11:45 AM, Till Rohrmann  wrote:
> Hi Sameer,
>
> the semantics of side inputs are not fully fledged out yet, as far as I
> know.
>
> The design doc says that one first waits for some data on the side input
> before starting processing the main input. Therefore, I would assume that
> you have some temporal ordering of the side input and main input compared to
> connected streams.
>
> But don't take this as guaranteed since this feature is still under
> development and the semantics are not fully decided yet.
>
> Cheers,
> Till
>
> On Mon, Oct 3, 2016 at 2:31 PM, Sameer W  wrote:
>>
>> Hi,
>>
>> I read the Side Inputs design document. How does it compare to using
>> ConnectedStreams with respect to handling the ordering of streams
>> transparently?
>>
>> One of the challenges I have with ConnectedStreams is I need to buffer
>> main input if the rules stream has not arrived yet. Does this automatically
>> go away with Side Inputs? Will the call to  String sideValue =
>>
>>getRuntimeContext().getSideInput(filterString);
>>
>> block if the side input is not available yet? And is the reverse also
>> true?
>>
>> Alternatively, if my rules are not large in number and I want to broadcast
>> them to all nodes is the below equivalent to using SideInputs where side
>> inputs are broadcast to all nodes and ensure that the side input is
>> evaluated before the main input:
>>
>> DataStream ds4 = ds3.connect(dsSide.broadcast());
>>
>> Will the above ensure that dsSide is always available before ds3 elements
>> arrive on the connected stream. Am I correct in assuming that ds2 changes
>> will continue to be broadcast to ds3 (with no ordering guarantees between
>> ds3 and dsSide, ofcourse).
>>
>>
>> Thanks,
>> Sameer
>>
>>
>>
>


Re: Side Inputs vs. Connected Streams

2016-10-04 Thread Till Rohrmann
Hi Sameer,

the semantics of side inputs are not fully fledged out yet, as far as I
know.

The design doc says that one first waits for some data on the side input
before starting processing the main input. Therefore, I would assume that
you have some temporal ordering of the side input and main input compared
to connected streams.

But don't take this as guaranteed since this feature is still under
development and the semantics are not fully decided yet.

Cheers,
Till

On Mon, Oct 3, 2016 at 2:31 PM, Sameer W  wrote:

> Hi,
>
> I read the Side Inputs
> 
> design document. How does it compare to using ConnectedStreams with respect
> to handling the ordering of streams transparently?
>
> One of the challenges I have with ConnectedStreams is I need to buffer
> main input if the rules stream has not arrived yet. Does this automatically
> go away with Side Inputs? Will the call to  String sideValue =
>
>getRuntimeContext().getSideInput(filterString);
> block if the side input is not available yet? And is the reverse also true?
>
> Alternatively, if my rules are not large in number and I want to broadcast
> them to all nodes is the below equivalent to using SideInputs where side
> inputs are broadcast to all nodes and ensure that the side input is
> evaluated before the main input:
>
> DataStream ds4 = ds3.connect(dsSide.broadcast());
>
> Will the above ensure that dsSide is always available before ds3 elements
> arrive on the connected stream. Am I correct in assuming that ds2 changes
> will continue to be broadcast to ds3 (with no ordering guarantees between
> ds3 and dsSide, ofcourse).
>
>
> Thanks,
> Sameer
>
>
>
>


Side Inputs vs. Connected Streams

2016-10-03 Thread Sameer W
Hi,

I read the Side Inputs

design document. How does it compare to using ConnectedStreams with respect
to handling the ordering of streams transparently?

One of the challenges I have with ConnectedStreams is I need to buffer main
input if the rules stream has not arrived yet. Does this automatically go
away with Side Inputs? Will the call to  String sideValue =

   getRuntimeContext().getSideInput(filterString);
block if the side input is not available yet? And is the reverse also true?

Alternatively, if my rules are not large in number and I want to broadcast
them to all nodes is the below equivalent to using SideInputs where side
inputs are broadcast to all nodes and ensure that the side input is
evaluated before the main input:

DataStream ds4 = ds3.connect(dsSide.broadcast());

Will the above ensure that dsSide is always available before ds3 elements
arrive on the connected stream. Am I correct in assuming that ds2 changes
will continue to be broadcast to ds3 (with no ordering guarantees between
ds3 and dsSide, ofcourse).


Thanks,
Sameer