Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-14 Thread Piotr Nowojski
Hi,

Sorry for the late response, somehow I wasn’t notified about your e-mail.

> 
> So you mean that an implementation in the DataStream API that cuts corners
> would generally be shorter than a Table Join. I thought that using Tables
> would be more intuitive and shorter, hence my initial question :)

It depends on what you are trying to do. Take a look at the
TemporalRowtimeJoin and/or TemporalProcessTimeJoin classes in Flink to judge
the complexity of writing your own version of them vs using the Table API just
for that.

> 
> Regarding all the limitations with Table API that you mentioned, is there
> any summary page in Flink docs for that?

I don’t recall such a summary :( Maybe someone else knows of one?

Piotrek




Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Averell
Thank you Piotr for the thorough answer.

So you mean that an implementation in the DataStream API that cuts corners
would generally be shorter than a Table Join. I thought that using Tables
would be more intuitive and shorter, hence my initial question :)

Regarding all the limitations with Table API that you mentioned, is there
any summary page in Flink docs for that?

Thanks and regards,
Averell




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Piotr Nowojski
Hi Averell,

I will be referring to your original two options: 1 (duplicating stream_C) and 
2 (multiplexing stream_A and stream_B).

Both of them could be expressed using a Temporal Table Join. You could
multiplex stream_A and stream_B in the Table API, temporal table join them with
stream_C and then demultiplex them in the DataStream API.
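
As a rough illustration of the multiplex / join / demultiplex idea, here is a
minimal sketch in plain Python. It is not Flink code and does not use the Table
API. The record types `Event` and `Update` and the function
`enrich_multiplexed` are hypothetical names made up for this example, and
dropping events that have no matching stream_C row is an assumption of the
sketch.

```python
from collections import namedtuple

# Hypothetical record shapes, used only for this illustration.
Event = namedtuple("Event", ["source", "key", "payload"])  # source is "A" or "B"
Update = namedtuple("Update", ["key", "value"])            # a row from stream_C


def enrich_multiplexed(interleaved):
    """Join tagged A/B events against the latest stream_C value per key.

    `interleaved` holds Event and Update objects in arrival order, which
    mimics a processing-time join: each event sees whatever version of
    stream_C has arrived so far.
    """
    latest = {}                # one shared state for stream_C
    out = {"A": [], "B": []}   # demultiplexed outputs, selected by the tag
    for item in interleaved:
        if isinstance(item, Update):
            latest[item.key] = item.value
        elif item.key in latest:
            out[item.source].append((item.key, item.payload, latest[item.key]))
        # Assumption: events without a matching stream_C row are dropped.
    return out["A"], out["B"]


arrivals = [
    Update("k1", "v1"),
    Event("A", "k1", "a-1"),
    Event("B", "k1", "b-1"),
    Update("k1", "v2"),
    Event("A", "k1", "a-2"),
    Event("B", "k2", "b-2"),   # no stream_C row for k2 yet
]
enriched_a, enriched_b = enrich_multiplexed(arrivals)
```

The point of the sketch is that the stream_C state exists only once, and the
`source` tag is all that is needed to split the result back into two streams.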

Resource usage/consumption would be more or less the same, but it depends on
what you are comparing it to. Temporal Table Joins in the Table API have little
to no overhead when using processing time. When using event time, the logic is
much more complicated: how to handle out-of-order data, and when to emit the
data (on watermark, as the Table API’s implementation does? ASAP?). I could
imagine different implementations cutting some corners here and there, but if
you would like to implement the same set of features that Temporal Table Join
provides in the DataStream API, you would end up with roughly the same code (if
you end up with something better, please contribute it! :) ). Please check the
implementation details of
org.apache.flink.table.runtime.join.TemporalRowtimeJoin and
org.apache.flink.table.runtime.join.TemporalProcessTimeJoin.
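
To give a feeling for why the event time case is more involved, here is a toy
sketch in plain Python. It is not how TemporalRowtimeJoin is implemented, and
the class `EventTimeTemporalJoin` and its methods are hypothetical names. It
assumes the "emit on watermark" choice: events are buffered until a watermark
passes them, then joined with the newest version that is not newer than the
event.

```python
class EventTimeTemporalJoin:
    """Toy event-time temporal join: one probe stream, one versioned stream."""

    def __init__(self):
        self.versions = {}   # key -> list of (version_time, value), sorted by time
        self.pending = []    # buffered probe events: (event_time, key, payload)

    def on_version(self, version_time, key, value):
        history = self.versions.setdefault(key, [])
        history.append((version_time, value))
        history.sort(key=lambda v: v[0])   # versions may arrive out of order

    def on_event(self, event_time, key, payload):
        # Nothing is emitted yet: an older version may still be on its way.
        self.pending.append((event_time, key, payload))

    def on_watermark(self, watermark):
        """Emit, in event-time order, all buffered events at or before the watermark."""
        ready = sorted((e for e in self.pending if e[0] <= watermark),
                       key=lambda e: e[0])
        self.pending = [e for e in self.pending if e[0] > watermark]
        results = []
        for event_time, key, payload in ready:
            valid = [v for t, v in self.versions.get(key, []) if t <= event_time]
            if valid:                      # unmatched events are dropped
                results.append((event_time, key, payload, valid[-1]))
        return results


join = EventTimeTemporalJoin()
join.on_version(1, "k1", "v1")
join.on_event(7, "k1", "e7")     # arrives before the version valid at t=6
join.on_event(3, "k1", "e3")     # out of order with respect to e7
join.on_version(6, "k1", "v2")
first = join.on_watermark(4)
second = join.on_watermark(10)
```

Even this toy version needs buffering and sorting, and it never prunes old
versions or buffered events, which a real operator would have to do to keep
its state bounded.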

Having said that, you have to decide for yourself whether it’s better to
implement the Temporal Join on your own in the DataStream API or to go through
the hassle of converting your DataStream to Tables and back again. I would
guess the conversion is not worth it: if you are already working in a
DataStream API environment, using the Table API will have some limitations,
like possible data conversion or the fact that you are losing control over the
state of your operator. The Table API doesn’t support keeping the state of the
job/query when upgrading Flink versions or when you modify your Table API job
graph/query, while the DataStream API supports both.

Piotrek




CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Averell
Hi,

Back to my story about enriching two different streams with data from one
(slow stream) using Flink's low-level functions like CoProcessFunction
(mentioned in this thread:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoFlatMapFunction-with-more-than-two-input-streams-td22320.html)

Now I see that Flink Table also supports doing something similar with
Temporal Table [1]. With this, I would only need to convert my enrichment
stream to be a Temporal table, and the two other streams into two unbounded
tables.

In terms of performance and resource usage, would this way of
implementation (using Flink Table) be better than option no. 1 mentioned
in my other thread: creating two different (though similar)
CoProcessFunctions, maintaining two state tables (for the enrichment
stream, one in each function)?
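
For illustration, option no. 1 can be pictured roughly as follows. This is a
plain Python sketch, not Flink code: the class `Enricher` is a hypothetical
stand-in for one CoProcessFunction, and its two methods stand in for the two
inputs of such a function.

```python
class Enricher:
    """Stand-in for one CoProcessFunction with its own copy of the slow stream."""

    def __init__(self):
        self.state = {}   # key -> latest enrichment value

    def on_slow(self, key, value):
        # Input from the (duplicated) enrichment stream.
        self.state[key] = value

    def on_fast(self, key, payload):
        # Input from one of the two streams to be enriched.
        if key in self.state:
            return (key, payload, self.state[key])
        return None       # no enrichment data seen for this key yet


enricher_a, enricher_b = Enricher(), Enricher()
for key, value in [("k1", "v1"), ("k2", "v2")]:
    # The enrichment stream is duplicated, so every update is stored twice.
    enricher_a.on_slow(key, value)
    enricher_b.on_slow(key, value)

total_entries = len(enricher_a.state) + len(enricher_b.state)
result_a = enricher_a.on_fast("k1", "a-1")
result_b = enricher_b.on_fast("k3", "b-1")
```

Here two enrichment rows end up as four state entries, which is the resource
cost the question is about.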

Thanks and best regards,
Averell 

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/temporal_tables.html


