Hi Seth,

Some clarifications to point out:

Quick follow up question. Is there some way to notify a TimestampAssigner that 
is consuming from an idle source? 

In Flink, idle sources would emit a special idleness marker event that notifies 
downstream time-based operators to not wait for its watermark.
This would avoid the need for the TimestampAssigner to generate watermarks just 
for the sake of letting downstream operators to advance their watermark in the 
event of idle sources.
However, there is 2 cases for idle sources and only one of them is handled at 
the moment: 1) the source subtask simply has no Kafka partitions to read from, 
or 2) the Kafka partitions do not have any records.
Only case 1) is handled, as of Flink 1.3+.

I think you are correct. This stream is consumed from Kafka and the number of 
partitions is much less than the parallelism of the program so there would be 
many partitions that never forward watermarks greater than Long.Min_Value.

In this case, Flink consumer subtasks that do not have Kafka partitions would 
mark themselves as idle and emit the special idleness marker.
Therefore, the expected behavior is that downstream time-based operators will 
not wait on these idle sources, even if they don’t produce watermarks.

Best,
Gordon

On 14 December 2017 at 6:08:20 PM, Fabian Hueske (fhue...@gmail.com) wrote:

Hi Seth,

that's not possible with the current interface.
There have been some discussions about how to address issues of idle sources 
(or partitions).
Aljoscha (in CC) should know more about that.

Best, Fabian

2017-12-13 18:13 GMT+01:00 Seth Wiesman <swies...@mediamath.com>:
Quick follow up question. Is there some way to notify a TimestampAssigner that 
is consuming from an idle source?

 



Seth Wiesman | Software Engineer, Data


4 World Trade Center, 46th Floor, New York, NY 10007


 

 

From: Seth Wiesman <swies...@mediamath.com>
Date: Wednesday, December 13, 2017 at 12:04 PM
To: Timo Walther <twal...@apache.org>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: Watermark in broadcast

 

Hi Timo,

 

I think you are correct. This stream is consumed from Kafka and the number of 
partitions is much less than the parallelism of the program so there would be 
many partitions that never forward watermarks greater than Long.Min_Value.

 

Thank you for the quick response.  

 



Seth Wiesman | Software Engineer, Data


4 World Trade Center, 46th Floor, New York, NY 10007


 

 

From: Timo Walther <twal...@apache.org>
Date: Wednesday, December 13, 2017 at 11:46 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Watermark in broadcast

 

Hi Seth,

are you sure that all partitions of the broadcasted stream send a watermark? 
processWatermark is only called if a minimum watermark arrived from all 
partitions.

Regards,
Timo

Am 12/13/17 um 5:10 PM schrieb Seth Wiesman:

Hi,

 

How are watermarks propagated during a broadcast partition? I have a 
TwoInputStreamTransformation that takes a broadcast stream as one of its 
inputs. Both streams are assigned timestamps and watermarks before being 
connected however I only ever see watermarks from my non-broadcast stream. Is 
this expected behavior? Currently I have overridden processWatermark1 to 
unconditionally call processWatermark but that does not seem like an ideal 
solution.

 

Thank you,



Seth Wiesman | Software Engineer, Data


4 World Trade Center, 46th Floor, New York, NY 10007


 

 


Reply via email to