Re: Multi-stream question

2018-04-12 Thread Fabian Hueske
Hi,

Ken's approach of having a joint data type and unioning the streams is
good. This will work seamlessly with checkpoints. Timo (in CC) used the
same approach to implement a prototype of a multi-way join.

A Tuple won't work though because the Tuple serializer does not support
null fields. You can use a Row or implement a custom, Either-like type.

Best, Fabian


TechnoMage  schrieb am Sa., 7. Apr. 2018, 17:25:

> Thanks for the Tuple suggestion, I may use that.  I was asking about
> building a custom operator (just an idea).  I have since decided I can
> decompose the problem into pairs of streams and emit a stream to the next
> CoFlatMap to get the result I need.  Now to see if the idea works ...
>
> Michael
>
> > On Apr 7, 2018, at 1:10 PM, Ken Krugler 
> wrote:
> >
> > Hi Michael,
> >
> > There isn’t an operator that takes three (or more) streams, AFAIK.
> >
> > There is a CoFlatMapFunction that takes two different streams in, which
> could be used for some types of joins.
> >
> > Streaming joins are (typically) windowed (bounded), by
> time/count/something, so if you can maintain the required windowed state in
> a ProcessFunction then you can implement whatever custom logic is required
> for your join case.
> >
> > And for creating a unioned stream of multiple data types, one easy way
> is via (e.g.) Tuple3, where only one of the three
> fields is non-null for each tuple.
> >
> > -- Ken
> >
> > PS - I think the u...@flink.apache.org 
> list is probably a better forum for this question.
> >
> >> On Apr 7, 2018, at 10:47 AM, TechnoMage  wrote:
> >>
> >> In my case I have more elaborate logic to select data from the
> streams.  They are not all the same logical type, though I may be able to
> represent them as the same Java type.  My main question is whether it is
> technically feasible to have a single operator that takes multiple streams
> as input.  For example Operator(stream1, stream2, stream3) and produces an
> output stream.  Can the checkpointing and other logic accomodate this if I
> write sufficient custom code in the operator?
> >>
> >> Michael
> >>
> >>> On Apr 7, 2018, at 10:42 AM, Ken Krugler 
> wrote:
> >>>
> >>> When you say “join” are you talking about a real join (so one or more
> fields can be used as a joining key), or some other operation?
> >>>
> >>> For more than two streams, you can do cascading window joins via
> multiple join()s that reduce your source streams down to a single stream.
> >>>
> >>> If the fields are the same across these streams, then a union()
> followed by say a ProcessFunction that implements your joining logic could
> work.
> >>>
> >>> Or you can convert all the streams to a common tuple format that
> consists of a unions the fields, so you can do a union() and then follow
> that with whatever logic is needed to actually do the join.
> >>>
> >>> Though I’m sure there are more elegant approaches :)
> >>>
> >>> — Ken
> >>>
> >>>
> >>>
>  On Apr 6, 2018, at 5:04 PM, Michael Latta 
> wrote:
> 
>  I would like to “join” several streams (>3) in a custom operator. Is
> this feasible in Flink?
> 
> 
>  Michael
> >>>
> >>> 
> >>> http://about.me/kkrugler
> >>> +1 530-210-6378
> >>>
> >>
> >
> > 
> > http://about.me/kkrugler
> > +1 530-210-6378
> >
>
>


Re: Multi-stream question

2018-04-07 Thread TechnoMage
Thanks for the Tuple suggestion, I may use that.  I was asking about building a 
custom operator (just an idea).  I have since decided I can decompose the 
problem into pairs of streams and emit a stream to the next CoFlatMap to get 
the result I need.  Now to see if the idea works ...

Michael

> On Apr 7, 2018, at 1:10 PM, Ken Krugler  wrote:
> 
> Hi Michael,
> 
> There isn’t an operator that takes three (or more) streams, AFAIK.
> 
> There is a CoFlatMapFunction that takes two different streams in, which could 
> be used for some types of joins.
> 
> Streaming joins are (typically) windowed (bounded), by time/count/something, 
> so if you can maintain the required windowed state in a ProcessFunction then 
> you can implement whatever custom logic is required for your join case.
> 
> And for creating a unioned stream of multiple data types, one easy way is via 
> (e.g.) Tuple3, where only one of the three fields is 
> non-null for each tuple.
> 
> -- Ken
> 
> PS - I think the u...@flink.apache.org  list is 
> probably a better forum for this question.
> 
>> On Apr 7, 2018, at 10:47 AM, TechnoMage  wrote:
>> 
>> In my case I have more elaborate logic to select data from the streams.  
>> They are not all the same logical type, though I may be able to represent 
>> them as the same Java type.  My main question is whether it is technically 
>> feasible to have a single operator that takes multiple streams as input.  
>> For example Operator(stream1, stream2, stream3) and produces an output 
>> stream.  Can the checkpointing and other logic accomodate this if I write 
>> sufficient custom code in the operator?
>> 
>> Michael
>> 
>>> On Apr 7, 2018, at 10:42 AM, Ken Krugler  
>>> wrote:
>>> 
>>> When you say “join” are you talking about a real join (so one or more 
>>> fields can be used as a joining key), or some other operation?
>>> 
>>> For more than two streams, you can do cascading window joins via multiple 
>>> join()s that reduce your source streams down to a single stream.
>>> 
>>> If the fields are the same across these streams, then a union() followed by 
>>> say a ProcessFunction that implements your joining logic could work.
>>> 
>>> Or you can convert all the streams to a common tuple format that consists 
>>> of a unions the fields, so you can do a union() and then follow that with 
>>> whatever logic is needed to actually do the join.
>>> 
>>> Though I’m sure there are more elegant approaches :)
>>> 
>>> — Ken
>>> 
>>> 
>>> 
 On Apr 6, 2018, at 5:04 PM, Michael Latta  wrote:
 
 I would like to “join” several streams (>3) in a custom operator. Is this 
 feasible in Flink?
 
 
 Michael
>>> 
>>> 
>>> http://about.me/kkrugler
>>> +1 530-210-6378
>>> 
>> 
> 
> 
> http://about.me/kkrugler
> +1 530-210-6378
> 



Re: Multi-stream question

2018-04-07 Thread Ken Krugler
Hi Michael,

There isn’t an operator that takes three (or more) streams, AFAIK.

There is a CoFlatMapFunction that takes two different streams in, which could 
be used for some types of joins.

Streaming joins are (typically) windowed (bounded), by time/count/something, so 
if you can maintain the required windowed state in a ProcessFunction then you 
can implement whatever custom logic is required for your join case.

And for creating a unioned stream of multiple data types, one easy way is via 
(e.g.) Tuple3, where only one of the three fields is 
non-null for each tuple.

-- Ken

PS - I think the u...@flink.apache.org  list is 
probably a better forum for this question.

> On Apr 7, 2018, at 10:47 AM, TechnoMage  wrote:
> 
> In my case I have more elaborate logic to select data from the streams.  They 
> are not all the same logical type, though I may be able to represent them as 
> the same Java type.  My main question is whether it is technically feasible 
> to have a single operator that takes multiple streams as input.  For example 
> Operator(stream1, stream2, stream3) and produces an output stream.  Can the 
> checkpointing and other logic accomodate this if I write sufficient custom 
> code in the operator?
> 
> Michael
> 
>> On Apr 7, 2018, at 10:42 AM, Ken Krugler  wrote:
>> 
>> When you say “join” are you talking about a real join (so one or more fields 
>> can be used as a joining key), or some other operation?
>> 
>> For more than two streams, you can do cascading window joins via multiple 
>> join()s that reduce your source streams down to a single stream.
>> 
>> If the fields are the same across these streams, then a union() followed by 
>> say a ProcessFunction that implements your joining logic could work.
>> 
>> Or you can convert all the streams to a common tuple format that consists of 
>> a unions the fields, so you can do a union() and then follow that with 
>> whatever logic is needed to actually do the join.
>> 
>> Though I’m sure there are more elegant approaches :)
>> 
>> — Ken
>> 
>> 
>> 
>>> On Apr 6, 2018, at 5:04 PM, Michael Latta  wrote:
>>> 
>>> I would like to “join” several streams (>3) in a custom operator. Is this 
>>> feasible in Flink?
>>> 
>>> 
>>> Michael
>> 
>> 
>> http://about.me/kkrugler
>> +1 530-210-6378
>> 
> 


http://about.me/kkrugler
+1 530-210-6378



Re: Multi-stream question

2018-04-07 Thread TechnoMage
In my case I have more elaborate logic to select data from the streams.  They 
are not all the same logical type, though I may be able to represent them as 
the same Java type.  My main question is whether it is technically feasible to 
have a single operator that takes multiple streams as input.  For example 
Operator(stream1, stream2, stream3) and produces an output stream.  Can the 
checkpointing and other logic accomodate this if I write sufficient custom code 
in the operator?

Michael

> On Apr 7, 2018, at 10:42 AM, Ken Krugler  wrote:
> 
> When you say “join” are you talking about a real join (so one or more fields 
> can be used as a joining key), or some other operation?
> 
> For more than two streams, you can do cascading window joins via multiple 
> join()s that reduce your source streams down to a single stream.
> 
> If the fields are the same across these streams, then a union() followed by 
> say a ProcessFunction that implements your joining logic could work.
> 
> Or you can convert all the streams to a common tuple format that consists of 
> a unions the fields, so you can do a union() and then follow that with 
> whatever logic is needed to actually do the join.
> 
> Though I’m sure there are more elegant approaches :)
> 
> — Ken
> 
> 
> 
>> On Apr 6, 2018, at 5:04 PM, Michael Latta  wrote:
>> 
>> I would like to “join” several streams (>3) in a custom operator. Is this 
>> feasible in Flink?
>> 
>> 
>> Michael
> 
> 
> http://about.me/kkrugler
> +1 530-210-6378
>