[ Adding the list back in, as this clarifies my question ] On Tue, Feb 20, 2018 at 3:42 PM, Darshan Singh <darshan.m...@gmail.com> wrote:
> I am no expert in Flink but I will try my best. Issue you mentioned will > be with all streaming systems even with Kafka KTable I use them a lot for > similar sort of requirements. > > In Kafka you have KTable on Telemetry with 3 records and join with say > Scores which could be KTable or Kstrem and you start your streaming query > as mentioned above it will give just 1 row as expected. However, if there > is a new value for the same key with timestamp greater than previous max > will be added to the Telemetry it will output the new value as well and > that is main idea about the streaming anyway you want to see the changed > value. So once you started streaming you will get whatever is the outcome > of your > Darshan, Thanks for the reply. I've already implemented this job using Kafka Streams, so I am aware of how KTables behaves. I would have helped if I had included some sample data in my post, so here it is. If you have this data coming into Telemetry: ts, item, score, source 0, item1, 1, source1 1, item1, 1, source1 2, item1, 1, source1 And this comes into Scores: ts, item, score 3, item1, 3 Flink will output 3 records from the queries I mentioned: (3, item1, 3, source1) (3, item1, 3, source1) (3, item1, 3, source1) In contrast, if you run the query in Kafka Stream configuring Telemetry as a KTable keyed by (item, source), the output will be a single record. In Telemetry record for key (item1, source1) at time 1 will overwrite the record at time 0, and the record at time 2 will overwrite the one at time 1. By the time the record at time 3 comes in via Scores, it will be joined only with the record from time 2 in Telemetry. Yes, it is possible for the Kafka Streams query to output multiple records if the records from the different streams are not time aligned, as Kafka Streams only guarantees a best effort aligning the streams. But in the common case the output will be a single record. I think in fllink you can do the same, from your telemeter stream/table you > can create the LatestTelemetry table using similar sql(I am sure it should > give you latest timestamp's data) as you did with the RDBMS and then join > with scores table. You should get similar results to KTable or any other > streaming system. > Not sure if you missed it, but I actually executed the query to define the LatestTelemetry table in Flink using that query and joined against it. The output was the same three records.