RE: need flink support framework for dependency injection

2024-03-27 Thread Schwalbe Matthias
Hi Ruibin,

Our code [1] targets a very old Flink version (1.8); for the current development 
my employer hasn’t decided (yet?) to contribute it to the public.
That old code does not yet contain the abstractions for setting up state 
primitives, so let me sketch it here:

  *   Derive a specific implementation per operator from 
SetupDualUnboundedBoundedState
  *   All state-primitive setup is then implemented in the respective open() 
function
  *   Derive the operator and the other components (savepoint reader/writer) from 
this state-setup class/trait
  *   For convenience there is a boundedMode field that tells the operator 
whether it runs in bounded or streaming mode (as the time semantics are similar 
yet different)
  *   This is one example where we ‘patched’ the non-public runtime 
implementation (mentioned in that other mail); therefore it needs to be 
maintained Flink version by Flink version

Feel free to query details …

Sincere greetings

Thias


[1] https://github.com/VisecaCard/flink-commons
[2] common savepoint setup:
import org.apache.flink.api.common.RuntimeExecutionMode
import org.apache.flink.api.common.functions.RuntimeContext
import org.apache.flink.configuration.ExecutionOptions
import org.apache.flink.streaming.api.operators.StreamingRuntimeContext

/** Marker trait for Flink functions/operators that can run in both Bounded (BATCH) and Unbounded (PIPELINED) mode,
  * and for auxiliary functions for savepoint priming and reading.
  *
  * @note Derive a specific trait/mixin for each respective Flink streaming function/operator that initializes
  *       state primitives. Mix that trait into auxiliary functions for savepoint priming and reading, to have a common
  *       state initialization.
  * @note Call [[ch.viseca.flink.operators.state.SetupDualUnboundedBoundedState#open(org.apache.flink.api.common.functions.RuntimeContext)]]
  *       in order to initialize this field.
  */
trait SetupDualUnboundedBoundedState extends Serializable {

  /** Determines at runtime whether the job DAG is running in Bounded (BATCH) or Unbounded (PIPELINED) mode.
    *
    * @note Call [[ch.viseca.flink.operators.state.SetupDualUnboundedBoundedState#open(org.apache.flink.api.common.functions.RuntimeContext)]]
    *       in order to initialize this field.
    */
  @transient var boundedMode = false

  /** Opens the respective function/operator for initialization of state primitives. */
  def open(rtc: RuntimeContext): Unit = {
    boundedMode =
      rtc match {
        case src: StreamingRuntimeContext =>
          src.getTaskManagerRuntimeInfo.getConfiguration
            .get[RuntimeExecutionMode](ExecutionOptions.RUNTIME_MODE) == RuntimeExecutionMode.BATCH
        case _ => false
      }
  }
}




From: Ruibin Xing 
Sent: Wednesday, March 27, 2024 10:41 AM
To: Schwalbe Matthias 
Cc: Marco Villalobos ; Ganesh Walse 
; user@flink.apache.org
Subject: Re: need flink support framework for dependency injection

Hi Thias,

Could you share your approach to job setup using Spring, if that's possible? We 
also use Spring Boot for DI in jobs, primarily relying on profiles. I'm 
particularly interested in how you use the same job structures for different 
scenarios, such as reading savepoints. Thank you very much.

On Wed, Mar 27, 2024 at 3:12 PM Schwalbe Matthias 
<matthias.schwa...@viseca.ch> wrote:
Hi Ganesh,

I tend to agree with Marco. However, your 'feature request' is very loose and 
leaves much room for misunderstanding.

There are at least two scenarios for DI integration:
- DI for job setup:
  - we use spring for job setup, which
- lets us use the same job structure for (at least) 4 scenarios: 
streaming job, batch job for savepoint priming, savepoint reading, 
transformation for complex schema changes -> savepoint writing
- we also appreciate a very convenient integration of a layered 
configuration by means of spring profiles
- we can easily replace e.g. sources and sinks for test/local 
develop/debug scenarios
- however this can also easily be done without DI
- our approach is public (Apache 2.0 license), if interested
- DI for Flink itself would probably be counterproductive for a number of reasons 
(some guesswork here)
- from what I see, the Flink code base is separated into two clearly 
distinct parts: the public API and the non-public implementation
- the Flink community takes great effort to guarantee backward 
compatibility of the public API, which also allows replacing the underlying 
implementation
- the private API mostly uses the Service-Locator pattern (sort of), 
also to make it harder to introduce arbitrary changes to the architecture, 
which would be hard to cover by the backward-compatibility guarantees
- if need be, in most cases you can change the non-public implementation
- by implementing a set of replacement classes (quite tedious) 
and wire them in, but
- that forces you to re-integrate for every new version of 
Flink (even more tedious )
- we've done so in select cases that were not of interest for the 
general public,
- alternatively, if your ext

RE: need flink support framework for dependency injection

2024-03-27 Thread Schwalbe Matthias
Hi Ganesh,

I tend to agree with Marco. However, your 'feature request' is very loose and 
leaves much room for misunderstanding.

There are at least two scenarios for DI integration:
- DI for job setup:
  - we use spring for job setup, which 
- lets us use the same job structure for (at least) 4 scenarios: 
streaming job, batch job for savepoint priming, savepoint reading, 
transformation for complex schema changes -> savepoint writing
- we also appreciate a very convenient integration of a layered 
configuration by means of spring profiles
- we can easily replace e.g. sources and sinks for test/local 
development/debug scenarios (see the sketch after this list)
- however this can also easily be done without DI
- our approach is public (Apache 2.0 license), if interested
- DI for Flink itself would probably be counterproductive for a number of reasons 
(some guesswork here)
- from what I see, the Flink code base is separated into two clearly 
distinct parts: the public API and the non-public implementation
- the Flink community takes great effort to guarantee backward 
compatibility of the public API, which also allows replacing the underlying 
implementation
- the private API mostly uses the Service-Locator pattern (sort of), 
also to make it harder to introduce arbitrary changes to the architecture, 
which would be hard to cover by the backward-compatibility guarantees
- if need be, in most cases you can change the non-public implementation
- by implementing a set of replacement classes (quite tedious) 
and wire them in, but 
- that forces you to re-integrate for every new version of 
Flink (even more tedious )
- we've done so in select cases that were not of interest for the 
general public,
- alternatively, if your extension use case is of public interest, it is better 
to make a proposal for a change and negotiate with the community whether and 
how to implement it
- we've also done so (recent example: [1])
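Regarding the job-setup DI above (replacing sources/sinks per scenario, referenced from the list): our setup is not public for the current development, so the following is only a minimal, hypothetical sketch of the general idea with Spring profiles — class names, topics and brokers are invented, not taken from our code base.

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

/** Hypothetical sketch: the active Spring profile decides which source bean the job assembly gets injected. */
@Configuration
public class SourceBeans {

  @Bean
  @Profile("streaming")                       // production streaming scenario
  public Source<String, ?, ?> eventSource() {
    return KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("events")
        .setGroupId("my-job")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();
  }

  @Bean
  @Profile("local-debug")                     // local development/debug scenario
  public Source<String, ?, ?> sampleSource() {
    return KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics("events-sample")
        .setGroupId("my-job-debug")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();
  }
}

The job assembly then injects a Source<String, ?, ?> and calls env.fromSource(...) regardless of the active profile; a layered property file per profile covers the configuration part mentioned above.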

WDYT? What is your case for DI? ...

Sincere greetings

Thias

[1] https://issues.apache.org/jira/browse/FLINK-26585




-Original Message-
From: Marco Villalobos  
Sent: Tuesday, March 26, 2024 11:40 PM
To: Ganesh Walse 
Cc: user@flink.apache.org
Subject: Re: need flink support framework for dependency injection

Hi Ganesh,

I disagree. I don’t think Flink needs a dependency injection framework. I have 
implemented many complex jobs without one. Can you please articulate why you 
think it needs a dependency injection framework, along with some use cases that 
will show its benefit?

I would rather see more features related to stream programming, 
data-governance, file based table formats, or ML.


> On Mar 26, 2024, at 2:27 PM, Ganesh Walse  wrote:
> 
> 
> 
Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und beinhaltet 
unter Umständen vertrauliche Mitteilungen. Da die Vertraulichkeit von 
e-Mail-Nachrichten nicht gewährleistet werden kann, übernehmen wir keine 
Haftung für die Gewährung der Vertraulichkeit und Unversehrtheit dieser 
Mitteilung. Bei irrtümlicher Zustellung bitten wir Sie um Benachrichtigung per 
e-Mail und um Löschung dieser Nachricht sowie eventueller Anhänge. Jegliche 
unberechtigte Verwendung oder Verbreitung dieser Informationen ist streng 
verboten.

This message is intended only for the named recipient and may contain 
confidential or privileged information. As the confidentiality of email 
communication cannot be guaranteed, we do not accept any responsibility for the 
confidentiality and the intactness of this message. If you have received it in 
error, please advise the sender by return e-mail and delete this message and 
any attachments. Any unauthorised use or dissemination of this information is 
strictly prohibited.


RE: Not all the task slots are used. Are we missing a setting somewhere?

2024-02-23 Thread Schwalbe Matthias
Thought it would be something like that 

Jean-Marc, in the future please ‘reply all’ with your answer so that the 
community can see it as well 

Welcome to the community anyway

Thias



From: Jean-Marc Paulin 
Sent: Friday, February 23, 2024 1:14 PM
To: Schwalbe Matthias 
Subject: Re: Not all the task slots are used. Are we missing a setting 
somewhere?

Ah, found our mistake…

Yes, we do set the parallelism as part of the options we pass when we submit the 
job in the first place.

Sorry for the bother, Thias.

JM


From: Schwalbe Matthias <matthias.schwa...@viseca.ch>
Sent: Friday, February 23, 2024 10:21
To: Jean-Marc Paulin <j...@uk.ibm.com>; user@flink.apache.org
Subject: [EXTERNAL] RE: Not all the task slots are used. Are we missing a 
setting somewhere?



Hi Jean-Marc,



In the absence of more context, did you adjust the parallelism of your job 
accordingly?



Thias



From: Jean-Marc Paulin <j...@uk.ibm.com>
Sent: Friday, February 23, 2024 11:06 AM
To: user@flink.apache.org
Subject: Q: Not all the task slots are used. Are we missing a setting somewhere?



Hi,



We used to run with 3 task managers with numberOfTaskSlots = 2. So all together 
we had 6 task slots and our application used them all. Trying to increase 
throughput, we increased the number of task managers to 6. So now we have 12 
task slots all together. However our application still only uses 6 task slots, 
so we have 6 that are unused. Is there a setting I am missing somewhere ?



Thanks



JM



RE: Not all the task slots are used. Are we missing a setting somewhere?

2024-02-23 Thread Schwalbe Matthias
Hi Jean-Marc,

In the absence of more context, did you adjust the parallelism of your job 
accordingly?
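
For illustration (a hedged sketch, not from this thread): roughly speaking, the job-level parallelism bounds the number of slots the job occupies, independently of how many slots the cluster offers, e.g.:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(12);   // or pass it at submission time, e.g. `flink run -p 12 ...`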

Thias

From: Jean-Marc Paulin 
Sent: Friday, February 23, 2024 11:06 AM
To: user@flink.apache.org
Subject: Q: Not all the task slots are used. Are we missing a setting somewhere?

Hi,

We used to run with 3 task managers with numberOfTaskSlots = 2. So all together 
we had 6 task slots and our application used them all. Trying to increase 
throughput, we increased the number of task managers to 6. So now we have 12 
task slots all together. However our application still only uses 6 task slots, 
so we have 6 that are unused. Is there a setting I am missing somewhere ?

Thanks

JM


RE: Preparing keyed state before snapshot

2024-02-21 Thread Schwalbe Matthias
Good morning all,

Let me loop myself in …


  1.  Another, even more convenient, way to enable a cache is to actually 
configure/assign RocksDB to use more off-heap memory for its cache; you might 
also consider enabling bloom filters (it all depends on how large your key space 
is: thousands/millions/billions/…).
Within the technological limits, RocksDB is hard to top; if keeping all data in 
memory is no option, this is the path I usually follow.

  2.  The other question, on how to control the current key from within 
snapshotState(): you can acquire a pointer to the underlying state backend, e.g. 
from within open(), then get hold of a pointer to the specific state primitive, 
and set the current key directly (a sketch follows after this list).
In order to find out how to do that, put a breakpoint in the debugger and walk up 
a couple of call-stack frames, and/or walk into the value setters and model it 
after how it is done there.
Mind, though, to restore the current key if you happen to change it to another 
key.
Doing this e.g. in initializeState() is time-insensitive, because it happens 
outside the ‘hot’ code paths.

  3.  If the number of elements to store is small, you can store them in operator 
state and initialize your local structure from it in initializeState(); you 
probably would want to keep the data in serialized form in operator state since, 
as you mentioned, serialization would be expensive.
  4.  There is another API (which I don’t remember the name of) that allows you 
to store operator state as a BLOB directly, if that would be a doable option for 
you.
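
For point 2, a hedged sketch of one way to get at setCurrentKey() — this is not the code from our code base; the operator subclass, the key iteration and the state update are placeholders:

import org.apache.flink.runtime.state.StateSnapshotContext;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.operators.KeyedProcessOperator;

import java.util.Collections;

/** Hypothetical operator that flushes in-memory per-key data into keyed state right before the snapshot. */
public class PrimingOperator<K, IN, OUT> extends KeyedProcessOperator<K, IN, OUT> {

  public PrimingOperator(KeyedProcessFunction<K, IN, OUT> function) {
    super(function);
  }

  @Override
  public void snapshotState(StateSnapshotContext context) throws Exception {
    Object previousKey = getCurrentKey();
    try {
      for (K key : keysToFlush()) {   // hypothetical: keys whose in-memory "state" must be written
        setCurrentKey(key);           // scopes subsequent keyed-state accesses to this key
        // ... write the per-key value into the keyed state primitive here ...
      }
    } finally {
      setCurrentKey(previousKey);     // restore the current key, as advised above
    }
    super.snapshotState(context);     // let the regular snapshot run afterwards
  }

  private Iterable<K> keysToFlush() {
    return Collections.emptyList();   // placeholder
  }
}

The operator would then be wired in via stream.transform(...) instead of stream.process(...).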

Sincere greetings

Thias




From: Zakelly Lan 
Sent: Wednesday, February 21, 2024 8:04 AM
To: Lorenzo Nicora 
Cc: Flink User Group 
Subject: Re: Preparing keyed state before snapshot

Hi Lorenzo,

I think the most convenient way is to modify the code of the state backend, 
adding a k-v cache as you want.

Otherwise IIUC, there's no public interface to get keyContext. But well, you 
may try something hacky. You may use the passed-in `Context` instance in 
processElement, and leverage java reflection to get the KeyedProcessOperator 
instance, where you can perform setCurrentKey().


Best,
Zakelly

On Wed, Feb 21, 2024 at 1:05 AM Lorenzo Nicora 
<lorenzo.nic...@gmail.com> wrote:
Thanks Zakelly,

I'd need to do something similar, with a map containing my non-serializable 
"state", similar to the kvCache in FastTop1Function.

But I am not sure I understand how I can set the keyed state for a specific 
key in snapshotState().
FastTop1Function seems to rely on keyContext set via setKeyContext(). This 
method is not part of the API. I see it's set specifically for 
AbstractTopNFunction in StreamExecRank.
How can I do something similar without modifying the Flink runtime?

Lorenzo


On Sun, 18 Feb 2024 at 03:42, Zakelly Lan 
<zakelly@gmail.com> wrote:
Hi Lorenzo,

It is not recommended to do this with the keyed state. However there is an 
example in flink code (FastTop1Function#snapshotState) [1] of setting keys when 
snapshotState().

Hope this helps.

[1] 
https://github.com/apache/flink/blob/050503c65f5c5c18bb573748ccbf5aecce4ec1a5/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/rank/FastTop1Function.java#L165

Best,
Zakelly

On Sat, Feb 17, 2024 at 1:48 AM Lorenzo Nicora 
<lorenzo.nic...@gmail.com> wrote:
Hi Thias

I considered CheckpointedFunction.
In snapshotState() I would have to update the state of each key, extracting the 
in-memory "state" of each key and putting it into the state with 
state.update(...).
This must happen per key, but snapshotState() has no visibility of the keys. And 
I have no way of selectively accessing the state of a specific key to update it.
Unless I am missing something.

Thanks
Lorenzo


On Fri, 16 Feb 2024 at 07:21, Schwalbe Matthias 
<matthias.schwa...@viseca.ch> wrote:
Good morning Lorenzo,

You may want to implement the 
org.apache.flink.streaming.api.checkpoint.CheckpointedFunction interface in 
your KeyedProcessFunction.
Btw., by the time initializeState(…) is called, the state backend is fully 
initialized and can be read and written to (which is not the case when the 
open(…) function is called).
In initializeState(…) you also get access to state of a different operator key.
snapshotState(…) is called as part of each checkpoint in order to store 
data.

Sincere greetings

Thias

From: Lorenzo Nicora <lorenzo.nic...@gmail.com>
Sent: Thursday, February 15, 2024 7:50 PM
To: Flink User Group <user@flink.apache.org>
Subject: Preparing keyed state before snapshot

Hello everyone,

I have a convoluted problem.

I am implementing a KeyedProcessFunction that keeps some non-serializable 
"state" in memory, in a transient Map (key = stream key, value = the 
non-serializable "state").

I can extract a serializable representation to p

RE: Preparing keyed state before snapshot

2024-02-15 Thread Schwalbe Matthias
Good morning Lorenzo,

You may want to implement the 
org.apache.flink.streaming.api.checkpoint.CheckpointedFunction interface in 
your KeyedProcessFunction.
Btw., by the time initializeState(…) is called, the state backend is fully 
initialized and can be read and written to (which is not the case when the 
open(…) function is called).
In initializeState(…) you also get access to state of a different operator key.
snapshotState(…) is called as part of each checkpoint in order to store 
data.

Sincere greetings

Thias

From: Lorenzo Nicora 
Sent: Thursday, February 15, 2024 7:50 PM
To: Flink User Group 
Subject: Preparing keyed state before snapshot

Hello everyone,

I have a convoluted problem.

I am implementing a KeyedProcessFunction that keeps some non-serializable 
"state" in memory, in a transient Map (key = stream key, value = the 
non-serializable "state").

I can extract a serializable representation to put in Flink state, and I can 
load my in-memory "state" from the Flink state. But these operations are 
expensive.

Initializing the in-memory "state" is relatively easy. I do it lazily, in 
processElement(), on the first record for the key.

The problem is saving the in-memory "state" to Flink state.
I need to do it only before the state snapshot. But KeyedProcessFunction has no 
entrypoint called before the state snapshot.
I cannot use CheckpointedFunction.snapshotState(), because it does not work for 
keyed state.

Any suggestions?

Note that I cannot use operator state nor a broadcast state.
Processing is keyed. Every processed record modifies the in-memory "state" of 
that key. If the job rescale, the state of the key must follow the partition.


Regards
Lorenzo


RE: Idleness not working if watermark alignment is used

2024-02-06 Thread Schwalbe Matthias
Hi Alexis,

Yes, I guess so, while I am not utterly acquainted with that part of the code.
Apparently the SourceCoordinator cannot come up with a proper watermark time 
if watermarking is turned off (idle mode of the stream), and then it derives the 
watermark time from the remaining non-idle sources.
It’s consistent with how the idling state of data streams is designed.
However, the point remains that one needs to compensate for 
.withIdleness(…) if correctness is any consideration.
Using .withIdleness(…) is IMHO only justified in rare cases where the implications 
are fully understood.

If a source is not configured with .withIdleness(…) and becomes factually idle, 
all window aggregations or stateful stream joins stall until that source 
becomes active again (= added latency).

Thias

From: Alexis Sarda-Espinosa 
Sent: Tuesday, February 6, 2024 9:48 AM
To: Schwalbe Matthias 
Cc: user 
Subject: Re: Idleness not working if watermark alignment is used

Hi Matthias,

thanks for looking at this. Would you then say this comment in the source code 
is not really valid?
https://github.com/apache/flink/blob/release-1.18/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L181

That's where the log I was looking at is created.

Regards,
Alexis.

On Tue, Feb 6, 2024 at 08:54, Schwalbe Matthias 
<matthias.schwa...@viseca.ch> wrote:
Good morning Alexis,

withIdleness(…) is easily misunderstood; it actually means that the thus-configured 
stream is exempt from watermark processing after 5 seconds (in your case).
Hence watermark alignment is also turned off for the stream until a new event 
arrives.

.withIdleness(…) is good for situations where you prefer low latency over 
correctness (causality with respect to time order).
Downstream operators can choose a manual implementation of watermark behavior 
in order to compensate for the missing watermarks.

IMHO, because I see so many people make the same mistake, I would rather rename 
.withIdleness(…) to something like .idleWatermarkExemption(…) to make it more 
obvious.

Hope this helps


Thias



From: Alexis Sarda-Espinosa <sarda.espin...@gmail.com>
Sent: Monday, February 5, 2024 6:04 PM
To: user <user@flink.apache.org>
Subject: Re: Idleness not working if watermark alignment is used

Ah and I forgot to mention, this is with Flink 1.18.1

On Mon, Feb 5, 2024 at 18:00, Alexis Sarda-Espinosa 
<sarda.espin...@gmail.com> wrote:
Hello,

I have 2 Kafka sources that are configured with a watermark strategy 
instantiated like this:

WatermarkStrategy.forBoundedOutOfOrderness(maxAllowedWatermarkDrift)
.withIdleness(idleTimeout) // 5 seconds currently
.withWatermarkAlignment(alignmentGroup, maxAllowedWatermarkDrift, 
Duration.ofSeconds(1L))

The alignment group is the same for both, but each one consumes from a 
different topic. During a test, I ensured that one of the topics didn't receive 
any messages, but when I check the logs I see multiple entries like this:

Distributing maxAllowedWatermark=1707149933770 of group=dispatcher to 
subTaskIds=[0] for source Source: GenericChangeMessageDeserializer.

where maxAllowedWatermark grows all the time.

Maybe my understanding is wrong, but I think this means the source is never 
marked as idle even though it didn't receive any new messages in the Kafka 
topic?

Regards,
Alexis.


RE: Idleness not working if watermark alignment is used

2024-02-05 Thread Schwalbe Matthias
Good morning Alexis,

withIdleness(…) is easily misunderstood; it actually means that the thus-configured 
stream is exempt from watermark processing after 5 seconds (in your case).
Hence watermark alignment is also turned off for the stream until a new event 
arrives.

.withIdleness(…) is good for situations where you prefer low latency over 
correctness (causality with respect to time order).
Downstream operators can choose a manual implementation of watermark behavior 
in order to compensate for the missing watermarks.

IMHO, because I see so many people make the same mistake, I would rather rename 
.withIdleness(…) to something like .idleWatermarkExemption(…) to make it more 
obvious.

Hope this helps


Thias



From: Alexis Sarda-Espinosa 
Sent: Monday, February 5, 2024 6:04 PM
To: user 
Subject: Re: Idleness not working if watermark alignment is used

Ah and I forgot to mention, this is with Flink 1.18.1

On Mon, Feb 5, 2024 at 18:00, Alexis Sarda-Espinosa 
<sarda.espin...@gmail.com> wrote:
Hello,

I have 2 Kafka sources that are configured with a watermark strategy 
instantiated like this:

WatermarkStrategy.forBoundedOutOfOrderness(maxAllowedWatermarkDrift)
.withIdleness(idleTimeout) // 5 seconds currently
.withWatermarkAlignment(alignmentGroup, maxAllowedWatermarkDrift, 
Duration.ofSeconds(1L))

The alignment group is the same for both, but each one consumes from a 
different topic. During a test, I ensured that one of the topics didn't receive 
any messages, but when I check the logs I see multiple entries like this:

Distributing maxAllowedWatermark=1707149933770 of group=dispatcher to 
subTaskIds=[0] for source Source: GenericChangeMessageDeserializer.

where maxAllowedWatermark grows all the time.

Maybe my understanding is wrong, but I think this means the source is never 
marked as idle even though it didn't receive any new messages in the Kafka 
topic?

Regards,
Alexis.



RE: Issue with Flink Job when Reading Data from Kafka and Executing SQL Query (q77 TPC-DS)

2024-01-03 Thread Schwalbe Matthias
Hi Vladimir,

I might be mistaken; here are my observations:


  *   List res = 
CollectionUtil.iteratorToList(result.execute().collect()); will block until the 
job is finished
  *   However, we have an unbounded streaming job, which will not finish until 
you cancel it
  *   If you just want to print results, the print sink will do
  *   You might want to directly iterate on the iterator returned by 
result.execute().collect()
  *   And make sure to close/dispose of the iterator once done
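
A minimal sketch of the last two points (assuming the `result` Table from the job code quoted further down): collect() returns a CloseableIterator, so try-with-resources prints the rows and disposes of the iterator (closing it should also release the collect job):

import org.apache.flink.types.Row;
import org.apache.flink.util.CloseableIterator;

try (CloseableIterator<Row> it = result.execute().collect()) {
    while (it.hasNext()) {
        System.out.println(it.next());
    }
}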

Sincere greetings
Thias

From: Alexey Novakov via user 
Sent: Tuesday, January 2, 2024 12:05 PM
To: Вова Фролов 
Cc: user@flink.apache.org
Subject: Re: Issue with Flink Job when Reading Data from Kafka and Executing 
SQL Query (q77 TPC-DS)

Hi Vladimir,

As I see it, your SQL query is reading data from the Kafka topic and pulling all 
of it to the client side. The "*.collect" method is quite network/memory 
intensive. You probably do not want that.

If you want to debug/print the ingested data via SQL, I would recommend the 
"print" connector: 
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/print/
This means you could INSERT from a SELECT into the print table.
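
A hedged sketch of that suggestion (reusing `tEnv` and two columns of the store_sales_kafka table from the code quoted further down; the column choice is just for illustration):

// declare a print sink table ...
tEnv.executeSql(
    "CREATE TEMPORARY TABLE debug_print (" +
    "  ss_sold_date_sk INTEGER," +
    "  ss_net_profit DECIMAL(7, 2)" +
    ") WITH ('connector' = 'print')");

// ... and INSERT from a SELECT; rows show up in the task manager logs, nothing is pulled to the client
tEnv.executeSql(
    "INSERT INTO debug_print SELECT ss_sold_date_sk, ss_net_profit FROM store_sales_kafka");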

Also, could it be that Flink becomes silent because it has read all data from 
the Kafka topic and just waits for any new record to be inserted to the topic?
Although I would not expect those Node - 1 disconnected messages in such a 
scenario.

Alexey

On Tue, Dec 19, 2023 at 10:40 AM Вова Фролов 
<vovafrolov1...@gmail.com> wrote:
Hello Flink Community,
I am writing to you with an issue I have encountered while using Apache Flink 
version 1.17.1. In my Flink job, I am using Kafka version 3.6.0 to ingest data 
from TPC-DS (current tpcds100 target size tpcds1), and then I am executing 
SQL queries, specifically the q77 query, on the data in Kafka.
Versions of Components in Use:

•Apache Flink: 1.17.1

•Kafka: 3.6.0
Kafka Settings: (kafka cluster consists of 9 topics and each has: 512 
partitions, replication factor 3)

•num.network.threads=12

•num.io.threads=10

•socket.send.buffer.bytes=2097152

•socket.request.max.bytes=1073741824
Flink Job Code:
Creating tables with Kafka connector:
public static final String CREATE_STORE_SALES =
    "CREATE TEMPORARY TABLE store_sales_kafka(\n" +
    "  ss_sold_date_sk INTEGER,\n" +
    // here are 21 columns
    "  ss_net_profit DECIMAL(7, 2)\n" +
    ") WITH (\n" +
    "  'connector' = 'kafka',\n" +
    "  'key.format' = 'avro',\n" +
    "  'key.fields' = 'ss_item_sk;ss_ticket_number',\n" +
    "  'properties.group.id' = 'store_sales_group',\n" +
    "  'scan.startup.mode' = 'earliest-offset',\n" +
    "  'properties.bootstrap.servers' = 'xyz1:9092, xyz2:9092, xyz3:9092, xyz4:9092, xyz5:9092',\n" +
    "  'topic' = 'storeSales100',\n" +
    "  'value.format' = 'avro',\n" +
    "  'value.fields-include' = 'EXCEPT_KEY'\n" +
    ");";

Q77 with Flink

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

tEnv.executeSql(Tpcds100.CREATE_STORE_SALES);

Table result = tEnv.sqlQuery(Tpcds100.Q77_WITH_KAFKA);
List<Row> res = CollectionUtil.iteratorToList(result.execute().collect());
for (Row row : res) {
    System.out.println(row);
}


Flink Job Configuration:
I tried several configurations, but here are the main ones:

1. slots per TaskManager 10, parallelism 100;

2. slots per TaskManager 5, parallelism 50;

3. slots per TaskManager 15, parallelism 375;
The result is always about the same
Logs and Errors:
The logs from my Flink Job do not contain any errors.
Description of the Issue:
The Flink Job runs smoothly for approximately 5 minutes, during which 
data processing and transformations occur as expected. However, after this 
initial period, the Flink Job seems to enter a state where no further changes 
or updates are observed in the processed data. In the logs I see a message that 
is written every 5 minutes:
“2023-12-18 INFO  org.apache.kafka.clients.NetworkClient [] - [AdminClient 
clientId=store_group-enumerator-admin-client] Node -1 disconnected”
It's worth noting that, despite the lack of errors in the logs, the 
Flink Job essentially becomes unresponsive or ceases to make progress, 
resulting in a stagnation of data processing.
This behavior is consistent across different configurations tested, 
including variations in the number of slots per TaskManager and parallelism.
While the logs do not indicate any errors, they do not provide insights 
into the reason behind the observed data processing stagnation.

Cluster consists of 5 machines and each has:

•2 CPU x86-64 20 cores, 40 threads, 2200 MHz base frequency, 3600 MHz 
max turbo frequency. 40 cores, 80 threads total on 

RE: Updating existing state with state processor API

2023-10-31 Thread Schwalbe Matthias
Hi Alexis,

Sorry for the late answer … got carried away with other tasks 

I hope I get this right, as there is a mixture of concepts in my mind with 
respect to the old and the new savepoint API. I’ll try to answer for the new API.

  *   If you want to patch an existing savepoint, you load it into a 
SavepointWriter [1]; this will basically copy the existing savepoint. You then:
  *   can remove and/or add state for operators [2] [3]
  *   then write the savepoint to the new location [4]
  *   the old API had the restriction that you had to change at least one 
operator state in order to be able to write the savepoint out to a new 
location; I don’t believe this restriction applies to the new API
  *   If you want to patch an existing state (e.g. for changing an incompatible 
schema),
  *   you need to load/bootstrap this state by means of SavepointReader [5] and 
some of the readXXXState functions
  *   then remove the existing state from the previous savepoint [2] and add the 
new state [3] with the bootstrap transformation obtained above
  *   if you want a state to be empty, it suffices to remove the existing 
state [2], if it existed (a minimal sketch of these steps follows below)
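
A minimal sketch of the first flow, stitched together from the footnoted methods [1]–[4] — the uid, paths, the Account POJO and the bootstrapper are invented, and exact overloads differ between Flink versions:

import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.state.api.OperatorIdentifier;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.SavepointWriter;
import org.apache.flink.state.api.StateBootstrapTransformation;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PatchSavepoint {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // replacement data for the patched operator (could just as well come from SavepointReader [5])
    DataStream<Account> accounts = env.fromElements(new Account("a", 10L), new Account("b", 20L));

    StateBootstrapTransformation<Account> bootstrap =
        OperatorTransformation.bootstrapWith(accounts)
            .keyBy(new KeySelector<Account, String>() {
              @Override public String getKey(Account a) { return a.id; }
            })
            .transform(new AccountBootstrapper());

    SavepointWriter.fromExistingSavepoint("s3://bucket/savepoints/old")     // [1] copies the old savepoint
        .removeOperator(OperatorIdentifier.forUid("accounts"))              // [2]
        .withOperator(OperatorIdentifier.forUid("accounts"), bootstrap)     // [3]
        .write("s3://bucket/savepoints/new");                               // [4]

    env.execute("patch savepoint");
  }

  /** Simple POJO used as bootstrap input. */
  public static class Account {
    public String id;
    public long balance;
    public Account() {}
    public Account(String id, long balance) { this.id = id; this.balance = balance; }
  }

  /** Writes one keyed ValueState entry per input element. */
  public static class AccountBootstrapper extends KeyedStateBootstrapFunction<String, Account> {
    private final ValueStateDescriptor<Long> desc = new ValueStateDescriptor<>("balance", Long.class);

    @Override
    public void processElement(Account value, Context ctx) throws Exception {
      getRuntimeContext().getState(desc).update(value.balance);
    }
  }
}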

I hope this helps, … happy to hear someone correct me, if mistaken


Sincere regards

Thias


[1] 
org.apache.flink.state.api.SavepointWriter#fromExistingSavepoint(java.lang.String)
[2] 
org.apache.flink.state.api.SavepointWriter#removeOperator(org.apache.flink.state.api.OperatorIdentifier)
[3] 
org.apache.flink.state.api.SavepointWriter#withOperator(org.apache.flink.state.api.OperatorIdentifier,
 org.apache.flink.state.api.StateBootstrapTransformation)
[4] org.apache.flink.state.api.SavepointWriter#write
[5] 
org.apache.flink.state.api.SavepointReader#read(org.apache.flink.streaming.api.environment.StreamExecutionEnvironment,
 java.lang.String, org.apache.flink.runtime.state.StateBackend)

From: Alexis Sarda-Espinosa 
Sent: Friday, October 27, 2023 4:29 PM
To: Schwalbe Matthias 
Cc: user 
Subject: Re: Updating existing state with state processor API

Hi Matthias,

Thanks for the response. I guess the specific question would be, if I work with 
an existing savepoint and pass an empty DataStream to 
OperatorTransformation#bootstrapWith, will the new savepoint end up with an 
empty state for the modified operator, or will it maintain the existing state 
because nothing was changed?

Regards,
Alexis.

On Fri, Oct 27, 2023 at 08:40, Schwalbe Matthias 
<matthias.schwa...@viseca.ch> wrote:
Good morning Alexis,

Something like this we do all the time:
Read an existing savepoint, copy over the operator states (keyed/non-keyed) that 
are not to be changed, and process/patch the remaining ones by transforming 
and bootstrapping them into new state.

I could share more details for more specific questions, if you like 

Regards

Thias

PS: I’m currently working on this ticket in order to get some glitches removed: 
FLINK-26585<https://issues.apache.org/jira/browse/FLINK-26585>


From: Alexis Sarda-Espinosa <sarda.espin...@gmail.com>
Sent: Thursday, October 26, 2023 4:01 PM
To: user <user@flink.apache.org>
Subject: Updating existing state with state processor API

Hello,

The documentation of the state processor API has some examples to modify an 
existing savepoint by defining a StateBootstrapTransformation. In all cases, 
the entrypoint is OperatorTransformation#bootstrapWith, which expects a 
DataStream. If I pass an empty DataStream to bootstrapWith and then apply the 
resulting transformation to an existing savepoint, will the transformation 
still receive data from the existing state?

If the aforementioned is incorrect, I imagine I could instantiate a 
SavepointReader and create a DataStream of the existing state with it, which I 
could then pass to the bootstrapWith method directly or after "unioning" it 
with additional state. Would this work?

Regards,
Alexis.


RE: Updating existing state with state processor API

2023-10-27 Thread Schwalbe Matthias
Good morning Alexis,

Something like this we do all the time:
Read an existing savepoint, copy over the operator states (keyed/non-keyed) that 
are not to be changed, and process/patch the remaining ones by transforming 
and bootstrapping them into new state.

I could share more details for more specific questions, if you like 

Regards

Thias

PS: I’m currently working on this ticket in order to get some glitches removed: 
FLINK-26585


From: Alexis Sarda-Espinosa 
Sent: Thursday, October 26, 2023 4:01 PM
To: user 
Subject: Updating existing state with state processor API

Hello,

The documentation of the state processor API has some examples to modify an 
existing savepoint by defining a StateBootstrapTransformation. In all cases, 
the entrypoint is OperatorTransformation#bootstrapWith, which expects a 
DataStream. If I pass an empty DataStream to bootstrapWith and then apply the 
resulting transformation to an existing savepoint, will the transformation 
still receive data from the existing state?

If the aforementioned is incorrect, I imagine I could instantiate a 
SavepointReader and create a DataStream of the existing state with it, which I 
could then pass to the bootstrapWith method directly or after "unioning" it 
with additional state. Would this work?

Regards,
Alexis.



RE: Window aggregation on two joined table

2023-09-21 Thread Schwalbe Matthias
… well yes and no:

  *   If the second table is a small table used for enrichment, you can also 
mark it as a broadcast table, but I don’t know how to do that with the Table API
  *   If the second table has significant data and significant updates, then you 
need to configure watermarking/event-time semantics on the second table as well
  *   The logic is this:
     *   Your join operator only generates output windows once the event time 
passes the end of the time window
     *   The event time/watermark time of your join operator is the minimum 
watermark time of all inputs
     *   Because your second table does not emit watermarks, its watermark time 
remains at Long.MinValue, hence the operator time also stays there
  *   Another way to make progress, in case your second table does not 
update watermarks/data often enough, is to mark the source with an idle watermark 
generator, in which case it is rendered as ‘timeless’ and does not prevent time 
progress in your join operator
     *   Again, not sure how to configure this (a hedged sketch follows below)
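
For the last sub-point, a hedged sketch (not verified against this exact job, assuming the `tEnv` from your snippets): the Table API exposes an idle-timeout option that marks a source as idle after the configured period without data — with the same correctness trade-off as .withIdleness(…) on the DataStream API:

// mark table sources idle after 5 seconds without data, so they no longer hold back the watermark
tEnv.getConfig().getConfiguration()
    .setString("table.exec.source.idle-timeout", "5 s");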

Ancora cari saluti

Thias





From: Eugenio Marotti 
Sent: Thursday, September 21, 2023 2:35 PM
To: Schwalbe Matthias 
Cc: user@flink.apache.org
Subject: Re: Window aggregation on two joined table

Hi Matthias,

No, the second table doesn’t have an event time and a watermark specified. In 
order for the window to work, do I need a watermark on the second table as well?

Thanks
Eugenio


On Sep 21, 2023, at 13:45, Schwalbe Matthias 
<matthias.schwa...@viseca.ch> wrote:

Ciao Eugenio,

I might be mistaken, but did you specify the event time for the second table 
like you did for the first table (watermark(….))?
I am not so acquainted with the Table API (doing more straight DataStream API 
work), but I assume this join and windowing should be by event time.

What do you think?

Cari saluti

Thias


From: Eugenio Marotti <ing.eugenio.maro...@gmail.com>
Sent: Thursday, September 21, 2023 8:56 AM
To: user@flink.apache.org
Subject: Window aggregation on two joined table

Hi,

I’m trying to execute a window aggregation on two joined table from two Kafka 
topics (upsert fashion), but I get no output. Here’s the code I’m using:

This is the first table from Kafka with an event time watermark on ‘data_fine’ 
attribute:


final TableDescriptor phasesDurationsTableDescriptor = 
TableDescriptor.forConnector("upsert-kafka")
   .schema(Schema.newBuilder()
 .column("id_fascicolo", DataTypes.BIGINT().notNull())
 .column("nrg", DataTypes.STRING())
 .column("giudice", DataTypes.STRING())
 .column("oggetto", DataTypes.STRING())
 .column("codice_oggetto", DataTypes.STRING())
 .column("ufficio", DataTypes.STRING())
 .column("sezione", DataTypes.STRING())
 .column("fase_completata", DataTypes.BOOLEAN())
 .column("fase", DataTypes.STRING().notNull())
 .column("durata", DataTypes.BIGINT())
 .column("data_inizio", DataTypes.TIMESTAMP_LTZ(3))
 .column("data_fine", DataTypes.TIMESTAMP_LTZ(3))
 .watermark("data_inizio", "data_inizio - INTERVAL '1' SECOND")
 .primaryKey("id_fascicolo", "fase")
 .build())
   .option(KafkaConnectorOptions.TOPIC, 
List.of("sicid.processor.phases-durations"))
   .option(KafkaConnectorOptions.PROPS_BOOTSTRAP_SERVERS, KAFKA_HOST)
   .option(KafkaConnectorOptions.KEY_FORMAT, "json")
   .option(KafkaConnectorOptions.VALUE_FORMAT, "json")
   .build();
tEnv.createTable("PhasesDurations_Kafka", phasesDurationsTableDescriptor);
Table phasesDurationsTable = tEnv.from("PhasesDurations_Kafka”);

Here’s the second table:

final TableDescriptor averageJudgeByPhaseReportTableDescriptor = 
TableDescriptor.forConnector("upsert-kafka")
   .schema(Schema.newBuilder()
 .column("giudice", DataTypes.STRING().notNull())
 .column("fase", DataTypes.STRING().notNull())
 .column("media_mobile", DataTypes.BIGINT())
 .primaryKey("giudice", "fase")
 .build())
   .option(KafkaConnectorOptions.TOPIC, 
List.of("sicid.processor.average-judge-by-phase-report"))
   .option(KafkaConnectorOptions.PROPS_BOOTSTRAP_SERVERS, KAFKA_HOST)
   .option(KafkaConnectorOptions.KEY_FORMAT, "json")
   .option(KafkaConnectorOptions.VALUE_FORMAT, "json")
   .option(KafkaConnectorOptions.PROPS_GROUP_ID, 
"average-judge-by-phase-report")
   .build();
tEnv.createTable("AverageJudgeByPhaseReport_Kafka", 
averageJudgeByPhaseReportTableDescriptor);
Table 

RE: Window aggregation on two joined table

2023-09-21 Thread Schwalbe Matthias
Ciao Eugenio,

I might be mistaken, but did you specify the event time for the second table 
like you did for the first table (watermark(….))?
I am not so acquainted with the Table API (doing more straight DataStream API 
work), but I assume this join and windowing should be by event time.

What do you think?

Cari saluti

Thias


From: Eugenio Marotti 
Sent: Thursday, September 21, 2023 8:56 AM
To: user@flink.apache.org
Subject: Window aggregation on two joined table

Hi,

I’m trying to execute a window aggregation on two joined table from two Kafka 
topics (upsert fashion), but I get no output. Here’s the code I’m using:

This is the first table from Kafka with an event time watermark on ‘data_fine’ 
attribute:


final TableDescriptor phasesDurationsTableDescriptor = 
TableDescriptor.forConnector("upsert-kafka")
   .schema(Schema.newBuilder()
 .column("id_fascicolo", DataTypes.BIGINT().notNull())
 .column("nrg", DataTypes.STRING())
 .column("giudice", DataTypes.STRING())
 .column("oggetto", DataTypes.STRING())
 .column("codice_oggetto", DataTypes.STRING())
 .column("ufficio", DataTypes.STRING())
 .column("sezione", DataTypes.STRING())
 .column("fase_completata", DataTypes.BOOLEAN())
 .column("fase", DataTypes.STRING().notNull())
 .column("durata", DataTypes.BIGINT())
 .column("data_inizio", DataTypes.TIMESTAMP_LTZ(3))
 .column("data_fine", DataTypes.TIMESTAMP_LTZ(3))
 .watermark("data_inizio", "data_inizio - INTERVAL '1' SECOND")
 .primaryKey("id_fascicolo", "fase")
 .build())
   .option(KafkaConnectorOptions.TOPIC, 
List.of("sicid.processor.phases-durations"))
   .option(KafkaConnectorOptions.PROPS_BOOTSTRAP_SERVERS, KAFKA_HOST)
   .option(KafkaConnectorOptions.KEY_FORMAT, "json")
   .option(KafkaConnectorOptions.VALUE_FORMAT, "json")
   .build();
tEnv.createTable("PhasesDurations_Kafka", phasesDurationsTableDescriptor);
Table phasesDurationsTable = tEnv.from("PhasesDurations_Kafka”);

Here’s the second table:

final TableDescriptor averageJudgeByPhaseReportTableDescriptor = 
TableDescriptor.forConnector("upsert-kafka")
   .schema(Schema.newBuilder()
 .column("giudice", DataTypes.STRING().notNull())
 .column("fase", DataTypes.STRING().notNull())
 .column("media_mobile", DataTypes.BIGINT())
 .primaryKey("giudice", "fase")
 .build())
   .option(KafkaConnectorOptions.TOPIC, 
List.of("sicid.processor.average-judge-by-phase-report"))
   .option(KafkaConnectorOptions.PROPS_BOOTSTRAP_SERVERS, KAFKA_HOST)
   .option(KafkaConnectorOptions.KEY_FORMAT, "json")
   .option(KafkaConnectorOptions.VALUE_FORMAT, "json")
   .option(KafkaConnectorOptions.PROPS_GROUP_ID, 
"average-judge-by-phase-report")
   .build();
tEnv.createTable("AverageJudgeByPhaseReport_Kafka", 
averageJudgeByPhaseReportTableDescriptor);
Table averageJudgeByPhaseReportTable = 
tEnv.from("AverageJudgeByPhaseReport_Kafka");

Table renamedAverageJudgeByPhaseReportTable = averageJudgeByPhaseReportTable
   .select(
 $("giudice").as("giudice_media"),
 $("fase").as("fase_media"),
 $("media_mobile")
   );



And here’s the code I’m experimenting with:

phasesDurationsTable
   .join(renamedAverageJudgeByPhaseReportTable)
   .where($("giudice").isEqual($("giudice_media")))
   .window(Tumble.over(lit(30).days()).on($("data_inizio")).as("w"))
   .groupBy(
 $("giudice"),
 $("w")
   )
   .select(
 $("giudice")
   )
   .execute().print();



Am I doing something wrong?


RE: Checkpoint jitter?

2023-09-13 Thread Schwalbe Matthias

Hi Mátyás,

Checkpoints are meant to be atomic in nature, i.e. everything is checkpointed at 
more or less the same time.
What you can do in newer Flink versions is enable the changelog feature 
(see [1]), which spreads the actual I/O for writing checkpoint files over a longer 
period and keeps an additional changelog file with the running updates.
What you get is a little more overall I/O but a quite flat I/O rate.
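
A minimal sketch of enabling it (the keys are taken from the linked documentation [1]; the path is illustrative — please verify the exact option names against your Flink version):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration conf = new Configuration();
conf.setString("state.backend.changelog.enabled", "true");        // turn on the changelog state backend
conf.setString("state.backend.changelog.storage", "filesystem");  // where the change log is written
conf.setString("dstl.dfs.base-path", "s3://bucket/changelog");    // illustrative durable location
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);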


Hope this helps

Thias


[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/state/state_backends/#enabling-changelog


From: Őrhidi Mátyás 
Sent: Wednesday, September 13, 2023 2:47 PM
To: Gyula Fóra 
Cc: Hangxiang Yu ; user@flink.apache.org
Subject: Re: Checkpoint jitter?

Correct, thanks for the clarification Gyula!

On Wed, Sep 13, 2023 at 1:39 AM Gyula Fóra 
<gyula.f...@gmail.com> wrote:
No, I think what he means is to trigger the checkpoint at slightly different 
times at the different sources so the different parts of the pipeline would not 
checkpoint at the same time.

Gyula

On Wed, Sep 13, 2023 at 10:32 AM Hangxiang Yu 
<master...@gmail.com> wrote:
Hi, Matyas.
Do you mean something like adjusting checkpoint intervals dynamically or 
frequency of uploading files according to the pressure of the durable storage ?

On Wed, Sep 13, 2023 at 9:12 AM Őrhidi Mátyás 
<matyas.orh...@gmail.com> wrote:
Hey folks,

Is it possible to add some sort of jitter to the checkpointing logic for 
massively parallel jobs to mitigate the burst impact on the durable storage 
when a checkpoint is triggered?

Thanks,
Matyas


--
Best,
Hangxiang.


RE: kafka duplicate messages

2023-09-07 Thread Schwalbe Matthias
Hi Nick,

Short (and somewhat superficial answer):

  *   (assuming your producer supports exactly-once mode (e.g. Kafka))
  *   Duplicates should only ever appear when your job restarts after a hiccup
  *   However, if your job is properly configured (checkpointing/Kafka 
transactions), everything should be fine, provided
     *   the consumer of your Kafka topic is in read-committed mode;
     *   in that case you should only see the events produced every checkpoint cycle
  *   If the consumer of your produced topic is in read-uncommitted mode, it 
will indeed see duplicates and needs to implement deduplication/idempotence 
manually
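
A hedged sketch of the read-committed consumer side (plain Kafka client; brokers, topic and group are invented):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "downstream-app");
props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// only read records of committed transactions, i.e. what Flink committed on its last checkpoint
props.setProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("output-topic"));
    consumer.poll(Duration.ofSeconds(1)).forEach(r -> System.out.println(r.value()));
}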

Hope this helps clarify the matter

Sincere greetings

Thias


From: nick toker 
Sent: Donnerstag, 7. September 2023 13:36
To: user 
Subject: kafka duplicate messages

Hi

I am configured with exactly-once.
I see that the Flink producer sends duplicate messages (sometimes a few copies)

that are consumed later only once by the other application.

How can I avoid duplications?

regards,
nick


RE: using CheckpointedFunction on a keyed state

2023-09-07 Thread Schwalbe Matthias
Hi Krzysztof again,

Just for clarity … your sample code [1] tries to count the number of events per 
key.
Assuming this is your intention?

Anyway, your previous implementation initialized the keyed state keyCounterState 
in the open() function, which is the right place to do this; 
you just wouldn’t want to store values in the state from within the open() 
function.

initializeState() and snapshotState() are mainly used to initialize operator 
state, not keyed state … refer to the relevant documentation and the minimal 
sketch below.
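
A minimal sketch of where each kind of state gets set up (class, state names and 
types are placeholders):

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class KeyCounter extends KeyedProcessFunction<String, String, Long>
        implements CheckpointedFunction {

    // keyed state: one value per key, handle obtained in open()
    private transient ValueState<Long> keyCounterState;
    // operator (non-keyed) state: one list per sub-task, obtained in initializeState()
    private transient ListState<Long> someOperatorState;

    @Override
    public void open(Configuration parameters) {
        keyCounterState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("keyCounter", Long.class));
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        someOperatorState = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("someOperatorState", Long.class));
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        // keyed state is snapshotted automatically, nothing to do here
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        // the keyed state handle is automatically scoped to ctx.getCurrentKey()
        Long current = keyCounterState.value();          // null on first access per key
        keyCounterState.update(current == null ? 1L : current + 1L);
        out.collect(keyCounterState.value());
    }
}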


Thias


From: Krzysztof Chmielewski 
Sent: Donnerstag, 7. September 2023 09:59
To: user 
Subject: using CheckpointedFunction on a keyed state

Hi,
I have a toy Flink job [1] where I have a KeyedProcessFunction implementation 
[2] that also implements the CheckpointedFunction. My stream definition has 
.keyBy(...) call as you can see in [1].

However when I'm trying to run this toy job I'm getting an exception from 
CheckpointedFunction::initializeState method that says:
"Keyed state can only be used on a 'keyed stream', i.e., after a 'keyBy()' 
operation."

I got the impression from the docs that CheckpointedFunction can be used on a 
keyed stream and that CheckpointedFunction::initializeState is for initializing the 
state object.
Are my assumptions wrong? Is initializeState only for setting an initial value of 
a state per key, and must the state object be initialized in the open() method?

Thanks,
Krzysztof
[1] 
https://github.com/kristoffSC/FlinkSimpleStreamingJob/blob/CheckpointedFunction_issueKeyedStream/src/main/java/org/example/DataStreamJob.java

[2] 
https://github.com/kristoffSC/FlinkSimpleStreamingJob/blob/CheckpointedFunction_issueKeyedStream/src/main/java/org/example/KeyCounter.java


RE: updating keyed state in open method.

2023-09-07 Thread Schwalbe Matthias
Hi Krzysztof,

You cannot access keyed state in open().
Keyed state has a value per key.
In theory you would have to initialize it for every possible key, which is quite 
impractical.
However, you don’t need to initialize the state: the initial value per key defaults 
to the default value of the type (null for objects).
Just drop the initializer [1]

Hope this helps

Thias


[1] 
https://github.com/kristoffSC/FlinkSimpleStreamingJob/blob/033f74c427553fbfe0aaffe7d2af4382c09734ad/src/main/java/org/example/KeyCounter.java#L26




From: Krzysztof Chmielewski 
Sent: Donnerstag, 7. September 2023 09:38
To: user 
Subject: updating keyed state in open method.

Hi,
I'm having a problem with my toy flink job where I would like to access a 
ValueState of a keyed stream. The Job setup can be found here [1], it is fairly 
simple
env
    .addSource(new CheckpointCountingSource(100, 60))
    .keyBy(value -> value)
    .process(new KeyCounter())
    .addSink(new ConsoleSink());

As you can see I'm using a keyBy and KeyCounter extends 
KeyedProcessFunction.
It seems that keyed state cannot be updated from the RichFunction::open() method. Is 
that intended?

When I ran this example I have an exception that says:

Caused by: java.lang.NullPointerException: No key set. This method should not 
be called outside of a keyed context.
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:76)
at 
org.apache.flink.runtime.state.heap.StateTable.checkKeyNamespacePreconditions(StateTable.java:270)
at org.apache.flink.runtime.state.heap.StateTable.get(StateTable.java:260)
at org.apache.flink.runtime.state.heap.StateTable.get(StateTable.java:143)
at 
org.apache.flink.runtime.state.heap.HeapValueState.value(HeapValueState.java:72)
at org.example.KeyCounter.open(KeyCounter.java:26)


[1] 
https://github.com/kristoffSC/FlinkSimpleStreamingJob/blob/KeyBayIssue/src/main/java/org/example/DataStreamJob.java


RE: Rate Limit / Throttle Data to Send

2023-08-30 Thread Schwalbe Matthias
Hi Patricia,

What you are trying to implement can be achieved out-of-the-box by windowing.

I assume these packets of 100 events are not per key but global.
In that case use non-keyed windowing [1] with a count trigger (100) [3], and maybe 
add a processing-time trigger if it takes too long to collect all 100 
events; then create the output with a process window function [2] (a minimal sketch 
follows below).
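
A minimal sketch of the count-window variant (event type and sink are placeholders; 
note that a non-keyed window runs with parallelism 1, and the processing-time 
fallback mentioned above would need a custom trigger on top):

events                                           // DataStream<Event>, placeholder
    .countWindowAll(100)                         // GlobalWindows with a count trigger of 100
    .process(new ProcessAllWindowFunction<Event, List<Event>, GlobalWindow>() {
        @Override
        public void process(Context context, Iterable<Event> elements, Collector<List<Event>> out) {
            List<Event> batch = new ArrayList<>();
            elements.forEach(batch::add);
            out.collect(batch);                  // one list of 100 events per firing
        }
    })
    .addSink(new VendorSink());                  // placeholder sink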

I hope this helps

Thias


[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/windows/#keyed-vs-non-keyed-windows
[2] 
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/windows/#processwindowfunction
[3] 
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/windows/#built-in-and-custom-triggers


From: patricia lee 
Sent: Wednesday, August 30, 2023 6:54 AM
To: user@flink.apache.org
Subject: Rate Limit / Throttle Data to Send

Hi,

I have a requirement that I need to send data to a third party with a limit 
number of elements with flow below.

kafkasource
mapToVendorPojo
processfunction
sinkToVendor

My implementation is I continuously add the elements to my list state
ListState in ProcessFunction and once it reaches 100 in size I emit 
the data and start collecting data again to another set of 100.

if (rateConfig == Iterables.size(appEventState.get()) {
List holder = new ArrayList();
appEventState.get().forEach(e -> holder.add(e));
collector.collect(holder);
appEventState.clear()
}

The problem I am getting is, "if " condition above never gets matched. Because 
the appEventState size is always 0 or 1 only. The rateConfig is set to 20.

What am I missing?

Thanks,
Patricia



RE: Checkpoint/savepoint _metadata

2023-08-29 Thread Schwalbe Matthias
Hi Frederic,

I’ve once (upon a time ) had a similar situation when we changed from Flink 
1.8 to Flink 1.13 … It took me a long time to figure out.
Some hints where to start to look:

  *   _metadata file is used for
 *   Job manager state
 *   Smallish keyed state (in order to avoid too many small state files)
 *   Operator state (non-keyed)
  *   Does the operator that is getting blocked in initialization use operator 
state?
     *   Look for some condition that might cause it to grow
     *   In my case back then, a minor condition caused the operator state 
to be duplicated per operator parallelism when loading from a savepoint, which 
caused exponential growth per savepoint cycle
  *   You can obtain a local copy of this savepoint and try to load it by means 
of the state-processor-api (a minimal sketch follows below)
     *   Breaking into the debugger, at some point the _metadata file gets 
loaded and allows you to determine which state actually ran away and what 
might have caused the duplication
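
A minimal sketch of such an offline inspection (Flink 1.16 state processor API; 
path, operator uid, state name and element type are placeholders, and the exact 
reader signatures should be double-checked against the release in use):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// load a local copy of the savepoint
SavepointReader savepoint = SavepointReader.read(
        env, "file:///tmp/savepoint-123456", new HashMapStateBackend());

// read the (non-keyed) operator state of the suspect operator
DataStream<MyState> operatorState = savepoint.readListState(
        "suspect-operator-uid", "my-list-state", TypeInformation.of(MyState.class));

// count the entries to see whether the state balloons from savepoint to savepoint
long count = 0L;
try (CloseableIterator<MyState> it = operatorState.executeAndCollect()) {
    while (it.hasNext()) { it.next(); count++; }
}
System.out.println("entries in my-list-state: " + count);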

I hope this helps

Thias



From: Frederic Leger 
Sent: Monday, August 28, 2023 12:30 PM
To: user@flink.apache.org
Subject: Checkpoint/savepoint _metadata

Hi team,

We use flink 1.16.0 with openjdk-11-jre mainly to run streaming jobs.
We do checkpoints with 2 min interval and savepoint when deploying new job 
version.
We also use rocksdb state backend for most of them.

We had a streaming job running for long without any issue and during a new 
deployment we could not launch it anymore, it was getting stuck on CREATING on 
one task, then was failing and restarting and so on.
In this Flink job, we handle a large data stream using key-based grouping. 
Inside a processFunction, we use MapState[Long, String] as our state storage, 
which keeps data with associated time limits (TTL) of 30 days.

The most relevant error we got from the logs was :

2023-08-02 12:40:52,186 ERROR akka.remote.EndpointWriter
   [] - Transient association error (association remains live)
akka.remote.OversizedPayloadException: Discarding oversized payload sent to 
Actor[akka.tcp://flink@10.1.1.5:31336/user/rpc/taskmanager_0#1435767402]:
 max allowed size 10485760 bytes, actual size of encoded class 
org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation was 42582878 bytes.

The solution to solve this issue was to increase akka.framesize from default 
(10MB) to 50MB
akka.framesize: 52428800b

After 16h of uptime, we wanted to move back the job to its initial cluster as 
it was running fine since then, but after the savepoint done, we could not 
launch it back and got this error :

2023-08-03 08:49:06,474 ERROR akka.remote.EndpointWriter
   [] - Transient association error (association remains live)
akka.remote.OversizedPayloadException: Discarding oversized payload sent to 
Actor[akka.tcp://flink@10.1.1.5:31358/user/rpc/taskmanager_0#1492669447]:
 max allowed size 52428800 bytes, actual size of encoded class 
org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation was 679594586 bytes.

After some research it seems to be related to _metadata file written when 
checkpointing/savepointing and this file has grown up amazingly in the past 16h 
from 50MB to more than 600MB if we compare the first ERROR and the last one.

Since then we were unable to launch back the job.

Increasing akka.framesize from 50MB to 1GB permit to avoid the above errors, 
but one task was remaining in CREATING state until failure.
We started to get java.lang.OutOfMemoryError: Java heap space on the 
jobmanager, then timeout between the taskmanagers and jobmanager.
The heap size set to avoid the OOM on the jobmanager was from 2GB to 20GB.
Increasing timeouts lead to other errors, like java.lang.OutOfMemoryError: Java 
heap space on the taskmanagers and so on to finally timeout and fail.

2023-08-03 10:28:32,191 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - The heartbeat 
of ResourceManager with id be6b26eb0a0a54e636c9fbfc5f9815f3 timed out.
2023-08-03 10:28:32,191 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Close 
ResourceManager connection be6b26eb0a0a54e636c9fbfc5f9815f3.
2023-08-03 10:28:32,191 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Connecting to 
ResourceManager 
akka.tcp://flink@10.1.1.3:46899/user/rpc/resourcemanager_0(a6fef33bff489d7e860c1017d2a34f50).
2023-08-03 10:28:38,411 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - The heartbeat 
of JobManager with id 39d49002792d881da6a5e7266c8ee58b timed out.
2023-08-03 10:28:38,412 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Close 
JobManager connection for 

RE: [E] RE: Recommendations on using multithreading in flink map functions in java

2023-08-18 Thread Schwalbe Matthias
… mirrored back to user list …

One additional thing you can do is to not split into 10 additional tasks but:

  *   Fan-out your original event into 10 copies (original key, 
1-of-10-algorithm-key, event),
  *   key by the combined key (original key, algorithm key)
  *   have a single operator chain that internally switches by algorithm key
  *   then collect by event id to enrich a final result
  *   much like mentioned in [1]
This made all the difference for us, with orders of magnitude better overall 
latency and backpressure, because we avoided multiple layers of parallelism (job 
parallelism * algorithm parallelism). A minimal sketch follows below.
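
A minimal sketch of that fan-out/combined-key flow, assuming events are 
Tuple2<String, Double> (original key, value) and a hypothetical 
runAlgorithm(algorithmId, value) standing in for the 10 calculations:

private static final int ALGORITHM_COUNT = 10;

DataStream<Tuple3<String, Integer, Double>> partialResults = events
    // 1) fan out: one copy per algorithm, tagged with the algorithm id
    .flatMap((Tuple2<String, Double> e, Collector<Tuple3<String, Integer, Double>> out) -> {
        for (int algo = 0; algo < ALGORITHM_COUNT; algo++) {
            out.collect(Tuple3.of(e.f0, algo, e.f1));
        }
    })
    .returns(Types.TUPLE(Types.STRING, Types.INT, Types.DOUBLE))
    // 2) a single shuffle on the combined key (original key, algorithm id)
    .keyBy(t -> t.f0 + "#" + t.f1)
    // 3) one chained operator that switches internally on the algorithm id
    .map(t -> Tuple3.of(t.f0, t.f1, runAlgorithm(t.f1, t.f2)))
    .returns(Types.TUPLE(Types.STRING, Types.INT, Types.DOUBLE));
// 4) downstream: key by the original key (t.f0) again and collect the 10
//    partial results per event into the final enriched record

// hypothetical placeholder for the 10 independent calculations
static double runAlgorithm(int algorithmId, double value) { return value; }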

Thias

[1] Master Thesis, Dominik Bünzli, University of Zurich, 2021: 
https://www.merlin.uzh.ch/contributionDocument/download/14168


From: Vignesh Kumar Kathiresan 
Sent: Thursday, August 17, 2023 10:27 PM
To: Schwalbe Matthias 
Cc: liu ron ; dominik.buen...@swisscom.com
Subject: Re: [E] RE: Recommendations on using multithreading in flink map 
functions in java

Hello Thias,

Thanks for the explanation. The objective to achieve e2e 100 ms latency was to 
establish the latency vs ser/deser + I/O  tradeoff. 1 sec(when you execute all 
the 10 algorithms in sequence) vs ~100 ms(when you execute them in parallel).

My takeaway is that in streaming frameworks when you want to move your e2e 
latency towards the 100 ms end of the latency spectrum

  *   Separate the 10 algorithms as different tasks (unchain) so that they are 
executed in different threads
  *   Fan out the element(branch out) and send them to the each algorithm 
task(separate task)
  *   Incur the serialize/deserialize cost and try to avoid a network shuffle 
as much as possible (by having the same parallelism in all the 11 operators, so 
that they run in different threads but on the same worker)
  *   Combine the results using some stateful process function finally.


On Wed, Aug 16, 2023 at 12:01 AM Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi Ron,

What you say is pretty much similar to what I’ve written , the difference is 
the focus:


  *   When you use a concurrency library, things are not necessarily running in 
parallel, they serialize/schedule the execution of tasks to the available CPU 
cores, and need synchronization
  *   i.e. you end up with total latency bigger than the 100ms (even if you’ve 
got 10 dedicated CPU cores, because of synchronization)
  *   the whole matter is affected by Amdahls law [1]

Back to your most prominent question: “What I want to know is how to achieve 
low latency per element processing”:
Strictly speaking, the only way to achieve the 100ms overall latency is to have 
10 dedicated CPU cores that don’t do anything else and avoid synchronization at 
any cost and events must have a minimum distance of 100ms .
This is not possible, but with a couple of nifty tricks you can come very close 
to it.
However another aspect is, that this practically only scales up to the maximum 
feasible number of CPU cores (512 core e.g.) in a system beyond which you 
cannot avoid serialization and synchronization.

The way I understand the design of Flink is:

  *   That the focus is on throughput with decent latency values
  *   Flink jobs can be scaled linearly in a wide range of parallelism
  *   i.e. within that range Flink does not run into the effects of Amdahls 
law, because it avoids synchronization among tasks
  *   This comes with a price: serialization efforts, and I/O cost
  *   A Flink (sub-)task is basically a message queue

 *   where incoming events sit in a buffer and are processed one after the 
other (=latency),
 *   buffering incurs serialization (latency),
 *   outgoing messages for non-chained operators are also serialized and 
buffered (latency)
 *   before they get sent out to a downstream (sub-)task (configurable size 
and time triggers on buffer (latency))

  *   the difference that makes the difference is that all these forms of 
latency are linear to the number of events, i.e. the effects of Amdahls law 
don’t kick in

Independent of the runtime (Flink or non-Flink) it is good to use only a single 
means/granularity of parallelism/concurrency.
This way we avoid a lot of synchronization cost and avoid one level stealing 
resources from other levels in an unpredictable way (= latency/jitter)

The solution that I proposed in my previous way does exactly this:

  *   it unifies the parallelism used for sharding (per key group parallelism) 
with the parallelism for the 10 calculation algorithms
  *   it scales linearly, avoids backpressure given enough resources, and has a 
decent overall latency (although not the 100ms)

In order to minimize serialization cost you could consider s

RE: Recommendations on using multithreading in flink map functions in java

2023-08-15 Thread Schwalbe Matthias

Hi Vignesh,

In addition to what Ron has said, there are a number of options to consider, 
depending on the nature of your calculations:

Given that your main focus seems to be latency:

  *   As Ron has said, Flink manages parallelism in a coarse grained way that 
is optimized for spending as little time as possible in synchronization, and 
takes away the need to manually synchronize
  *   If you spawn your own threads (it’s possible) you need to manage 
synchronization yourself and this can add considerably to latency and create 
back-pressure
     *   You would combine two implementations of parallelism and probably end 
up with threads stealing CPU resources from each other
  *   When planning for horizontal scalability in Flink, plan CPU resources so 
they can manage the workload
Fanout and collection pattern in general is a good idea, given that in order to 
allow for horizontal scaling you need to have at least one network shuffle 
anyway

  *   Make sure you use the best serializer possible for your data (see the POJO 
sketch below).
  *   Out of the box, the Pojo serializer is hard to top; a hand-coded serializer 
might help (keep this for later in your dev process)
  *   If you can arrange your problem so that operators can be chained into a 
single chain you can avoid serialization within the chain
  *   In Flink, if you union() or connect() multiple streams, chaining is 
interrupted and this adds considerably to latency
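
A minimal sketch of a data type that stays on Flink's PojoSerializer instead of 
falling back to Kryo (public class, public no-argument constructor, public fields 
or getters/setters; the field names here are placeholders):

public class Quote {
    public String instrumentId;
    public long timestampMillis;
    public double bid;
    public double ask;

    public Quote() {}   // required: public no-argument constructor
}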
There is a neat trick to combine the parallelism-per-key-group and the 
parallelism-per-algorithm into a single implementation and end up with single 
chains with little de-/serialization except for the fanout

  *   One of my students has devised this scheme in his masters thesis (see [1] 
chapter 4.4.1 pp. 69)
  *   With his implementation we reduced back-pressure and latency 
significantly, by some orders of magnitude

I hope this helps, feels free to discuss details 

Thias



[1] Master Thesis, Dominik Bünzli, University of Zurich, 2021: 
https://www.merlin.uzh.ch/contributionDocument/download/14168



From: liu ron 
Sent: Dienstag, 15. August 2023 03:54
To: Vignesh Kumar Kathiresan 
Cc: user@flink.apache.org
Subject: Re: Recommendations on using multithreading in flink map functions in 
java

Hi, Vignesh

Flink is a distributed parallel computing framework, each MapFunction is 
actually a separate thread. If you want more threads to process the data, you 
can increase the parallelism of the MapFunction without having to use multiple 
threads in a single MapFunction, which in itself violates the original design 
intent of Flink.

Best,
Ron

Vignesh Kumar Kathiresan via user 
mailto:user@flink.apache.org>> wrote on Tue, Aug 15, 2023 at 03:59:
Hello All,

Problem statement
For a given element, I have to perform multiple(lets say N) operations on it. 
All the N operations are independent of each other. And for achieving lowest 
latency, I want to do them concurrently. I want to understand what's the best 
way to perform it in flink?.

I understand flink achieves huge parallelism across elements. But is it 
anti-pattern to do parallel processing in a map func at single element level? I 
do not see anything on the internet for using multithreading inside a map 
function.

I can always fan out with multiple copies of the same element and send them to 
different operators. But it incurs at the least a serialize/deserialize cost 
and may also incur network shuffle. Trying to see if a multithreaded approach 
is better.

Thanks,
Vignesh


RE: Join two streams

2023-06-29 Thread Schwalbe Matthias
Hi Ivan,

The source of your problem is quite simple:
- If you do windowing by event time, all the sources need to emit watermarks.
- Watermarks are the logical clock used for event-time timing.
- You could either use processing-time windows, or adjust the watermark strategy of 
your sources accordingly (a minimal sketch follows below).
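
A minimal sketch of the second option, assuming your sensor POJO (SensorReading is 
a placeholder name) exposes getTimestamp() in epoch milliseconds:

WatermarkStrategy<SensorReading> strategy = WatermarkStrategy
    .<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(5))
    .withTimestampAssigner((event, recordTs) -> event.getTimestamp());

// instead of WatermarkStrategy.noWatermarks()
DataStream<SensorReading> t1Stream = env.fromSource(T1_Source, strategy, "T1 Stream");
DataStream<SensorReading> t2Stream = env.fromSource(T2_Source, strategy, "T2 Stream");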

... didn't check other potential sources of troubles in your code

Hope this helps

Thias


-Original Message-
From: Иван Борисов  
Sent: Freitag, 30. Juni 2023 05:45
To: user@flink.apache.org
Subject: Join two streams

Hello,
Please help me, I can't join two streams. In the joined stream I get zero 
messages and can't understand why.

Kafka Topics:
1st stream
topic1: {'data': {'temp':25.2, 'sensore_name': 'T1', 'timestamp':
123123131}, 'compare_with': 'T2'}
2nd stream
topic2: {'data': {'temp':28, 'sensore_name': 'T2', 'timestamp':
53543543}, 'compare_with': 'T1'}


DataStream T1_Stream = env.fromSource(
    T1_Source,
    WatermarkStrategy.noWatermarks(),
    "T1 Stream");

DataStream T2_Stream = env.fromSource(
    T2_Source,
    WatermarkStrategy.noWatermarks(),
    "T2 Stream");

DataStream comparisonStream = T1_Stream
    .join(T2_Stream)
    .where(T1 -> T1.getCompare_with())
    .equalTo(T2 -> T2.getSensor_Name())
    .window(TumblingEventTimeWindows.of(Time.seconds(60)))
    .apply((JoinFunction) (T1, T2) -> {
        double firstValue = T1.getTemp();
        double secondValue = T2.getTemp();
        double m = firstValue - secondValue;
        return m;
    });
comparisonStream.writeAsText("/tmp/output_k.txt",
    org.apache.flink.core.fs.FileSystem.WriteMode.OVERWRITE);

And my file is empty!
What am I doing wrong?

--
Yours truly, Ivan Borisov  |  С уважением, Иван Борисов
mob./WhatsApp: 7 913  088 8882
Telegram: @Ivan_S_Borisov
Skype: ivan.s.borisov
e-mail: ivan.s.bori...@gmail.com


RE: Identifying a flink dashboard

2023-06-29 Thread Schwalbe Matthias
Hi Mike,

Let me sketch it:

  *   The trick I use (no idea if it is wise or not  ) is to have 
nginx-ingress set up and then specify a service selecting the nginx…controller 
pods [1]
  *   You don’t need to bind to the node address (see externalIPs), you could 
just as well port-forward this service, but
the ingresses that use the nginx-ingress all relay over that same service, 
each using a different https path
  *   I’ll give an example configuration for flink-kubernetes-operator 
FlinkDeployment [2]
     *   template: is patched with the namespace and job name
     *   unfortunately, annotations: does not support templating (yet?),
     *   i.e. you need to manually replace the path, which must be the same as 
what comes out of template:
     *   put in whatever dashboard title you like (that was your original 
question)
  *   I work on a local VM with microk8s, so specifying that as externalIPs 
allows me to access it, however I also need to register this IP as 
local.ingress in my hosts file, and accept the certificate in the browser …
  *   In your case you could either expose that service with a port forward and 
also get the certificate and DNS business solved
  *   This is the result on my machine:

(inline screenshot of the renamed dashboard title omitted)




Hope that helps

Thias



[1] service-exposing-nginx-ingress-on-node.yaml :
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-microk8s-service
  namespace: ingress
  labels:
app: nginx-ingress
spec:
  ports:
- port: 8095
  targetPort: http
  protocol: TCP
  name: http
- port: 8444
  targetPort: https
  protocol: TCP
  name: https
  selector:
name: nginx-ingress-microk8s
  externalIPs:
- xxx.xxx.xxx.xxx


[2] basic.ingress.yaml :

#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-ingress
  namespace: flink
spec:
  image: flink:1.16
  flinkVersion: v1_16
  ingress:
template: "ingress.local/{{namespace}}/{{name}}(/|$)(.*)"
className: "nginx"
annotations:
  nginx.ingress.kubernetes.io/use-regex: "true"
  nginx.ingress.kubernetes.io/rewrite-target: "/$2"
  nginx.ingress.kubernetes.io/configuration-snippet: |
proxy_set_header Accept-Encoding "";
sub_filter_last_modified off;
sub_filter '' '';
sub_filter 'Apache Flink Web Dashboard' 'flink: 
basic-ingress Dashboard';
  flinkConfiguration:
taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
resource:
  memory: "2048m"
  cpu: 1
  taskManager:
resource:
  memory: "2048m"
  cpu: 1
  job:
jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
parallelism: 2
upgradeMode: stateless

From: Mike Phillips 
Sent: Thursday, June 29, 2023 7:42 AM
To: Schwalbe Matthias ; user@flink.apache.org
Subject: Re: Identifying a flink dashboard

G'day,

The flink and the dashboard are running in k8s and I am not on the same network.
We don't have a VPN into the cluster. (Don't ask)

I am not sure how I would access the dashboard without having a port forward.

On 28/06/2023 14:39, Schwalbe Matthias wrote:
Good Morning Mike,

As a quick fix, sort of, you could use an Ingress on nginx-ingress (instead of 
the port-forward) and
Add a sub_filter rule to patch the HTML response.
I use this to add a  tag to the header and for the Flink-Dashboard I 
experience no glitches.

As to point 3. … you don’t need to expose that Ingress to the internet, but 
only to the node IP, so it becomes visible only within your network, … there is 
a number of ways doing it

I could elaborate a little more, if interested

Hope this helps

Thias


From: Mike Phillips 
<mailto:mike.phill...@intellisense.io>
Sent: Wednesday, June 28, 2023 3:47 AM


G'day Alex,

Thanks!

1 - hmm maybe beyond my capabilities presently
2 - Yuck! :-) Will look at t

RE: Identifying a flink dashboard

2023-06-28 Thread Schwalbe Matthias
Good Morning Mike,

As a quick fix, sort of, you could use an Ingress on nginx-ingress (instead of 
the port-forward) and
Add a sub_filter rule to patch the HTML response.
I use this to add a  tag to the header and for the Flink-Dashboard I 
experience no glitches.

As to point 3. … you don’t need to expose that Ingress to the internet, but 
only to the node IP, so it becomes visible only within your network, … there is 
a number of ways doing it

I could elaborate a little more, if interested

Hope this helps

Thias


From: Mike Phillips 
Sent: Wednesday, June 28, 2023 3:47 AM
To: user@flink.apache.org
Subject: Re: Identifying a flink dashboard

G'day Alex,

Thanks!

1 - hmm maybe beyond my capabilities presently
2 - Yuck! :-) Will look at this
3 - Not possible, the dashboards are not accessible via the internet, so we use 
kube and port forward, URL looks like http://wobbegong:3/ the port changes
4 - I think this requires the dashboard be internet accessible?

On Tue, 27 Jun 2023 at 17:21, Alexander Fedulov 
mailto:alexander.fedu...@gmail.com>> wrote:
Hi Mike,

no, it is currently hard-coded
https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/src/app/app.component.html#L23

Your options are:
1. Contribute a change to make it configurable
2. Use some browser plugin that allows renaming page titles
3. Always use different ports and bookmark the URLs accordingly
4. Use an Ingress in k8s

Best,
Alex

On Tue, 27 Jun 2023 at 05:58, Mike Phillips 
mailto:mike.phill...@intellisense.io>> wrote:
G'day all,

Not sure if this is the correct place but...
We have a number of flink dashboards and it is difficult to know what dashboard 
we are looking at.
Is there a configurable way to change the 'Apache Flink Dashboard' heading on 
the dashboard?
Or some other way of uniquely identifying what dashboard I am currently looking 
at?
Flink is running in k8s and we use kubectl port forwarding to connect to the 
dashboard so we can't ID using the URL

--
--
Kind Regards

Mike


--
--
Kind Regards

Mike


RE: Using pre-registered schemas with avro-confluent-registry format is not possible

2023-05-31 Thread Schwalbe Matthias

Hello Jannik,

Some things to consider (I had a similar problem a couple of years before):

  *   The schemaRegistryClient actually caches schema ids, so it will hit the 
schema registry only once,
  *   The schema registered in schema registry needs to be byte-equal, 
otherwise schema registry considers it to be a new schema (version)
  *   … to my best knowledge writing an existing schema to the schema registry 
does not fail because it is actually not written
 *   Could be that this is not entirely true as we had to replace the whole 
schemaRegistryClient with our own implementation because the existing one could 
not be reconfigured to accept compressed answers from our r/o proxy
  *   if you manage to fill the cache of your schemaRegistryClient with the 
exact schema (e.g. by querying it beforehand) you might never run into the 
problem

Hope this helps … keep us posted 

Thias




From: Schmeier, Jannik 
Sent: Wednesday, May 31, 2023 12:44 PM
To: user@flink.apache.org
Subject: Using pre-registered schemas with avro-confluent-registry format is 
not possible

Hello,

I'm trying to use the avro-confluent-registry format with the Confluent Cloud 
Schema Registry in our company.
Our schemas are managed via Terraform and global write access is denied for all 
Kafka clients in our environments (or at least in production).
Therefore, when using the avro-confluent-registry format I'm getting an error 
when Flink is trying to serialize a row:

java.lang.RuntimeException: Failed to serialize row.
at 
org.apache.flink.formats.avro.AvroRowDataSerializationSchema.serialize(AvroRowDataSerializationSchema.java:90)
 ~[?:?]
at 
org.apache.flink.formats.avro.AvroRowDataSerializationSchema.serialize(AvroRowDataSerializationSchema.java:40)
 ~[?:?]
at 
org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaRecordSerializationSchema.serialize(DynamicKafkaRecordSerializationSchema.java:95)
 ~[?:?]
at 
org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaRecordSerializationSchema.serialize(DynamicKafkaRecordSerializationSchema.java:36)
 ~[?:?]
at 
org.apache.flink.connector.kafka.sink.KafkaWriter.write(KafkaWriter.java:196) 
~[?:?]
at 
org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperator.processElement(SinkWriterOperator.java:158)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.table.runtime.util.StreamRecordCollector.collect(StreamRecordCollector.java:44)
 ~[flink-table-runtime-1.17.0.jar:1.17.0]
at 
org.apache.flink.table.runtime.operators.sink.ConstraintEnforcer.processElement(ConstraintEnforcer.java:247)
 ~[flink-table-runtime-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
 ~[flink-dist-1.17.0.jar:1.17.0]
at StreamExecCalc$2221.processElement_0_0(Unknown Source) ~[?:?]
at 
StreamExecCalc$2221.processElement_0_0_rewriteGroup22_split310(Unknown Source) 
~[?:?]
at 
StreamExecCalc$2221.processElement_0_0_rewriteGroup22(Unknown Source) ~[?:?]
at StreamExecCalc$2221.processElement_split308(Unknown Source) 
~[?:?]
at StreamExecCalc$2221.processElement(Unknown Source) ~[?:?]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 
org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
 ~[flink-dist-1.17.0.jar:1.17.0]
at 

RE: Bootstrapping multiple state within same operator

2023-03-24 Thread Schwalbe Matthias
Hi David,
… coming in late into this discussion

We had a very similar problem and I found a simple way to implement priming 
savepoints with mixed keyed/operator state.
The trick is this:

  *   In your KeyedStateBootstrapFunction also implement CheckpointedFunction
  *   In initializeState() you can initialize the broadcast/operator state primitive (the 
code skeleton below uses getUnionListState, same principle)
  *   Then in the processElement() function I process a tuple of state 
collections for each state primitive, i.e. one event object per key
  *   For the union list state I forge a special key “broadcast”, and only the 
3rd tuple vector contains anything,
 *   (the upstream operator feeding into this bootstrap function makes sure 
only one event with “broadcast” key is generated)

Peruse the code skeleton (scala) if you like (I removed some stuff I’m not 
supposed to disclose):


/** Event type for state updates for savepoint generation. A 4-tuple of
* 1) vector of valid [[CSC]] entries
* 2) a vector of valid [[DAS]] entries
* 3) a vector of valid broadcast FFR.
* 4) a vector of timers for state cleanup */
type DAFFS = (Vector[CSC], Vector[DAS], Vector[FFR], Vector[Long])

/** [[StateUpdate]] type for [[DAFFS]] state along with the [[String]] key 
context. */
type DAFFSHU = StateUpdate[String, DAFFS]

class DAFFOperatorStateBootstrapFunction
  extends KeyedStateBootstrapFunction[String, DAFFSHU]
with CheckpointedFunction {

  override def open(parameters: Configuration): Unit = {
super.open(parameters)
val rtc: RuntimeContext = getRuntimeContext
//keyed state setup:
// cSC = rtc.getListState(new ListStateDescriptor[CSC](...
// dAS = rtc.getListState(new ListStateDescriptor[DAS](...
  }

  override def processElement(value: DAFFSHU, ctx: 
KeyedStateBootstrapFunction[String, DAFFSHU]#Context): Unit = {

val daffs = value.state
val ts = ctx.timerService()

for (csc <- daffs._1) {
  cSC.add(csc)
}
for (das <- daffs._2) {
  dAS.add(das)
}
for (ffr <- daffs._3) {
  fFRState.add(ffr)
}
for (timer <- daffs._4) {
  ts.registerEventTimeTimer(timer)
}

val stop = 0
  }

  @transient var fFRState: ListState[FFR] = null

  override def snapshotState(context: FunctionSnapshotContext): Unit = {
  }

  override def initializeState(context: FunctionInitializationContext): Unit = {
val fFRStateDescriptor = new ListStateDescriptor[FFR]("ffr", ffrTI)
fFRState = 
context.getOperatorStateStore.getUnionListState(fFRStateDescriptor)
  }
}

Hope this helps …

Sincere greetings

Thias


From: David Artiga 
Sent: Wednesday, March 22, 2023 11:31 AM
To: Hang Ruan 
Cc: user@flink.apache.org
Subject: Re: Bootstrapping multiple state within same operator

Not familiar with the implementation but thinking some options:

- composable transformations
- underlying MultiMap
- ...

On Wed, Mar 22, 2023 at 10:24 AM Hang Ruan 
mailto:ruanhang1...@gmail.com>> wrote:
Hi, David,
I also read the code about the `SavepointWriter#withOperator`. The 
transformations are stored in a `Map` whose key is `OperatorID`. I don't come 
up with a way that we could register multi transformations for one operator 
with the provided API.

Maybe we need a new type of  `XXXStateBootstrapFunction` to change more states 
at one time.

Best,
Hang

David Artiga mailto:david.art...@gmail.com>> 
wrote on Wed, Mar 22, 2023 at 15:22:
We are using the state processor API to bootstrap the state of some operators. 
It has been working fine until now, 
when we tried to bootstrap an operator that has both a keyed state and a 
broadcasted state. Seems the API does not provide a convenient method to apply 
multiple transformations on the same uid...

Is there a way to do that and we just missed it? Any insights appreciated.

Cheers,
/David

RE: Is it possible to preserve chaining for multi-input operators?

2023-03-24 Thread Schwalbe Matthias
Hi Viacheslav,

… back from vacation

… you are welcome, glad to hear it worked out 

Thias


From: Viacheslav Chernyshev 
Sent: Thursday, March 16, 2023 5:34 PM
To: user@flink.apache.org
Subject: Re: Is it possible to preserve chaining for multi-input operators?

Hi Matthias,

Just wanted to thank you for the hints! I've successfully developed a 
multi-stream operator that allows doing things like this:

KeyedMultiInputStream.builder(environment, new UserDefinedFunction())
.addKeyedStream(fooSource, fooMapper, UserDefinedFunction::processFoo)
.addKeyedStream(barSource, barMapper, UserDefinedFunction::processBar)
.addBroadcastStream(bazSource, UserDefinedFunction::processBaz)
.build();

Direct connectivity to the sources and optional on-the-fly mappers have 
completely eliminated the performance issues that we had been facing before.

Kind regards,
Viacheslav

From: Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>>
Sent: 28 February 2023 15:50
To: Viacheslav Chernyshev 
mailto:v.chernys...@outlook.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org> 
mailto:user@flink.apache.org>>
Subject: RE: Is it possible to preserve chaining for multi-input operators?


Hi Viacheslav,



Certainly I can …



There is two parts to it,

  *   setting up such MultipleInputStreamOperator, which is documented (sort 
of), but not quite complete

 *   I can prepare some boiler-plate, not today, but in the next days (if 
you are interested)

  *   Second part is about how to put all joins and other operations into a 
single operator implementation (well, you exactly do that  ):

 *   Equi-joins on the key, you can process per Input() implementation and 
state kept from other inputs
 *   Windowing is restricted to a single window key type (a Namespace in 
Flink-speak) for your operator

*   Windowing can be implemented manually and modelled after the 
official Flink windowing operators

 *   Should you absolutely need more than one windowing namespace, then you 
need to become creative with state primitives

  *   You mentioned also broadcast streams, that is in the end you’ll have more 
than 2 input streams, the keyed ones + the broadcast streams

 *   This is where MultipleInputStreamOperator comes into play, because you 
are not restricted to only 2 input streams as in the KeyedCoProcessFunction case
 *   That gives you more freedom to combine data in a single operator 
instead of being forced to split/chain multiple operators



Kind regards



Thias









From: Viacheslav Chernyshev 
mailto:v.chernys...@outlook.com>>
Sent: Tuesday, February 28, 2023 3:42 PM
To: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: Is it possible to preserve chaining for multi-input operators?



Hi Matthias,



Thank you for the reply. You are absolutely right, the first keyBy is 
unavoidable, but after that we fix the parallelism and maintain the same key 
throughout the pipeline.



The MultipleInputStreamOperator approach that you've described looks very 
interesting! Unfortunately, I have never used it before. Would you be able to 
share the details for how to force the chaining with e.g. two input streams?



Kind regards,

Viacheslav

____

From: Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>>
Sent: 28 February 2023 14:12
To: Viacheslav Chernyshev 
mailto:v.chernys...@outlook.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org> 
mailto:user@flink.apache.org>>
Subject: RE: Is it possible to preserve chaining for multi-input operators?





Hi Viacheslav,



These are two very interesting questions…



You have found out about the chaining restriction to single input operators to 
be chained, it does also not help to union() multiple streams into a single 
input, they still count as multiple inputs.



  *   The harder way to go would be to patch the relevant parts of Flink to 
allow chaining with multiple inputs

 *   This is very complicated to get right, especially for the then 
multiple inputs and outputs that need to get façaded
 *   We once did it (successfully) and abandoned the idea because of its 
complexity and maintenance cost

  *   The other way might be to implement all into one 
org.apache.flink.streaming.api.operators.MultipleInputStreamOperator that 
allows to have any (reasonable) number of inputs, keyed, non-keyed, broadcast ; 
mixed …. Let me explain:

 *   From what you say I assume, that after the Kafka source you need to 
.keyBy() the instrument-id anyway, which means a shuffle and 
(de-/)serialization … unavoidable.
 *   However, after that shuffle, the MultipleInputStreamOperator could 
force-chain all your logic as long as it stays to be on the same key/partition 
domain
 *   Integration of broadcast inputs is a no-brainer there
 *   We do these things all t

RE: Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-07 Thread Schwalbe Matthias
Hi Tommy,

While not coming up with a sure solution, I’ve got a number of ideas on how to 
continue and shed some light on the matter:


  *   With respect to diagnostics, have you enabled the flame graph 
(rest.flamegraph.enabled in the cluster config),
     *   It allows you to see the call tree of each task and where time is 
dominantly spent
     *   That usually gives me quite some insight
  *   You mention serialization could be a problem:
     *   Which serializer are you using currently?
     *   I could imagine using one of the (almost) zero-copy types like RowData
        *   I considered this once but didn’t try
     *   Nico published a nice comparison of the choices with regard to serializers [1]
  *   Just for completeness: pipeline.object-reuse can cut down quite a bit on 
GC cost, at the price of more discipline with object mutation and with 
caching unserialized objects in arbitrary data structures (see the snippet below)
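
A minimal snippet for the last point; enabling it programmatically has the same 
effect as pipeline.object-reuse: true in the configuration:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// reuse record objects between chained operators; requires the discipline noted above:
// do not mutate or hold on to input objects beyond the current function call
env.getConfig().enableObjectReuse();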

Hope this helps

Thias




[1] 
https://flink.apache.org/2020/04/15/flink-serialization-tuning-vol.-1-choosing-your-serializer-if-you-can/




From: Tommy May 
Sent: Tuesday, March 7, 2023 3:25 AM
To: David Morávek 
Cc: Ken Krugler ; Flink User List 

Subject: Re: Avoiding data shuffling when reading pre-partitioned data from 
Kafka

Hi Ken & David,

Thanks for following up. I've responded to your questions below.

 If the number of unique keys isn’t huge, I could think of yet another 
helicopter stunt that you could try :)

Unfortunately the number of keys in our case is huge, they're unique per 
handful of events.

If your data are already pre-partitioned and the partitioning matches (hash 
partitioning on the JAVA representation of the key yielded by the KeySelector), 
you can use `reinterpretAsKeyedStream` [1] to skip the shuffle.

That comes with the additional constraints that Ken mentioned, correct? It 
could break immediately in cases if a key comes through on a different 
partition, or if the number of partitions happen to change? I'm concerned about 
that for our use case as we don't have 100% control of the upstream data source.

I feel you'd be blocked by the state access downstream (with RocksDB). Are you 
sure it isn't the case?

Yes, you are right that state access is also a limiting factor and some 
optimizations to limit that have helped quite a bit (both in our implementation 
and in using local SSDs for rocksdb). One other path we looked at is using 
memory-backed volumes for rocksdb, but ran into a limitation that we cannot 
configure Flink's process memory lower than the k8s container memory, leading 
to OOMs. More details at 
https://stackoverflow.com/questions/74118022/flink-pods-ooming-using-memory-backed-volume-with-k8s-operator.

I don't have a dashboard currently to immediately point to data shuffling as 
the primary bottleneck, but I thought it could be a huge optimization if we can 
tell Flink to take advantage of the pre-partitioned datasource, given we're 
shuffling near 1 Gb/sec right now. I can see that the join is causing the 
backpressure on the sources though, and figured that network and state acces 
would be the two primary contributors there. Let me know if you have any good 
debugging tools to narrow in on this more.

Thanks,
Tommy


On Mon, Mar 6, 2023 at 4:42 AM David Morávek 
mailto:d...@apache.org>> wrote:
Using an operator state for a stateful join isn't great because it's meant to 
hold only a minimal state related to the operator (e.g., partition tracking).

If your data are already pre-partitioned and the partitioning matches (hash 
partitioning on the JAVA representation of the key yielded by the KeySelector), 
you can use `reinterpretAsKeyedStream` [1] to skip the shuffle.
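
A minimal sketch of that (experimental) API, with the event type and key selector 
as placeholders; it only holds if the upstream partitioning really matches Flink's 
key-group assignment:

// tell Flink the stream is already partitioned by this key; no shuffle is inserted
KeyedStream<Event, String> keyed = DataStreamUtils.reinterpretAsKeyedStream(
    prePartitionedStream,            // DataStream<Event> straight from the Kafka source
    Event::getKey);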

> What we see is that the join step causes backpressure on the kafka sources 
> and lag slowly starts to accumulate.

I feel you'd be blocked by the state access downstream (with RocksDB). Are you 
sure it isn't the case?

[1] 
https://javadoc.io/static/org.apache.flink/flink-streaming-java/1.16.1/org/apache/flink/streaming/api/datastream/DataStreamUtils.html#reinterpretAsKeyedStream-org.apache.flink.streaming.api.datastream.DataStream-org.apache.flink.api.java.functions.KeySelector-

Best,
D.

On Sun, Mar 5, 2023 at 5:31 AM Ken Krugler 
mailto:kkrugler_li...@transpac.com>> wrote:
Hi Tommy,

To use stateful timers, you need to have a keyed stream, which gets tricky when 
you’re trying to avoid network traffic caused by the keyBy()

If the number of unique keys isn’t huge, I could think of yet another 
helicopter stunt that you could try :)

It’s possible to calculate a composite key, based on the “real” key and a 
synthetic value, that will wind up on in the same slot where you’re doing this 
calculation.

So that would let you create a keyed stream which would have 
serialization/deserialization cost, but wouldn’t actually go through the 
network stack.

Since the composite key generation is deterministic, you can do the same thing 
on 

RE: Is it possible to preserve chaining for multi-input operators?

2023-02-28 Thread Schwalbe Matthias
Hi Viacheslav,

Certainly I can …

There is two parts to it,

  *   setting up such MultipleInputStreamOperator, which is documented (sort 
of), but not quite complete
 *   I can prepare some boiler-plate, not today, but in the next days (if 
you are interested)
  *   Second part is about how to put all joins and other operations into a 
single operator implementation (well, you exactly do that  ):
 *   Equi-joins on the key, you can process per Input() implementation and 
state kept from other inputs
 *   Windowing is restricted to a single window key type (a Namespace in 
Flink-speak) for your operator
*   Windowing can be implemented manually and modelled after the 
official Flink windowing operators
 *   Should you absolutely need more than one windowing namespace, then you 
need to become creative with state primitives
  *   You mentioned also broadcast streams, that is in the end you’ll have more 
than 2 input streams, the keyed ones + the broadcast streams
 *   This is where MultipleInputStreamOperator comes into play, because you 
are not restricted to only 2 input streams as in the KeyedCoProcessFunction case
 *   That gives you more freedom to combine data in a single operator 
instead of being forced to split/chain multiple operators

Kind regards

Thias




From: Viacheslav Chernyshev 
Sent: Tuesday, February 28, 2023 3:42 PM
To: user@flink.apache.org
Subject: Re: Is it possible to preserve chaining for multi-input operators?

Hi Matthias,

Thank you for the reply. You are absolutely right, the first keyBy is 
unavoidable, but after that we fix the parallelism and maintain the same key 
throughout the pipeline.

The MultipleInputStreamOperator approach that you've described looks very 
interesting! Unfortunately, I have never used it before. Would you be able to 
share the details for how to force the chaining with e.g. two input streams?

Kind regards,
Viacheslav

From: Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>>
Sent: 28 February 2023 14:12
To: Viacheslav Chernyshev 
mailto:v.chernys...@outlook.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org> 
mailto:user@flink.apache.org>>
Subject: RE: Is it possible to preserve chaining for multi-input operators?

Hi Viacheslav,

These are two very interesting questions…

You have found out about the restriction that only single-input operators can be chained; it also does not help to union() multiple streams into a single input, they still count as multiple inputs.

  *   The harder way to go would be to patch the relevant parts of Flink to allow chaining with multiple inputs
     *   This is very complicated to get right, especially for the then multiple inputs and outputs that need to get façaded
     *   We once did it (successfully) and abandoned the idea because of its complexity and maintenance cost
  *   The other way might be to implement everything in one org.apache.flink.streaming.api.operators.MultipleInputStreamOperator, which allows any (reasonable) number of inputs: keyed, non-keyed, broadcast, mixed. Let me explain:
     *   From what you say I assume that after the Kafka source you need to .keyBy() the instrument-id anyway, which means a shuffle and (de-/)serialization … unavoidable.
     *   However, after that shuffle, the MultipleInputStreamOperator could force-chain all your logic as long as it stays on the same key/partition domain
     *   Integration of broadcast inputs is a no-brainer there
     *   We do these things all the time and it really helps cutting down serialization cost, among other things
     *   This way does not necessarily help with keeping latency down, as more inputs means more time to round-robin over the available inputs

I hope this helps

What do you think?

Regards

Thias

From: Viacheslav Chernyshev 
mailto:v.chernys...@outlook.com>>
Sent: Tuesday, February 28, 2023 1:06 PM
To: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Is it possible to preserve chaining for multi-input operators?

Hi everyone,

My team is developing a streaming pipeline for analytics on top of market data. The ultimate goal is to be able to handle tens of millions of events per second distributed across the cluster according to the unique ID of a particular financial instrument. Unfortunately, we struggle with achieving acceptable performance. As far as I can see, Flink forcibly breaks operator chaining when it encounters a job graph node with multiple inputs. Subsequently, it severely affects the performance because a network boundary is enforced, and every event is forcibly serialised and deserialised.

From the pipeline graph perspective, the requirements are:

  *   Read data from multiple Kafka topics that are connected to different nodes in the graph.
  *   Broadcast a number of dynamic rules to the pipeline.

The clean

RE: Is it possible to preserve chaining for multi-input operators?

2023-02-28 Thread Schwalbe Matthias

Hi Viacheslav,

These are two very interesting questions...

You have found out about the restriction that only single-input operators can be chained; it also does not help to union() multiple streams into a single input, they still count as multiple inputs.

  *   The harder way to go would be to patch the relevant parts of Flink to allow chaining with multiple inputs
     *   This is very complicated to get right, especially for the then multiple inputs and outputs that need to get façaded
     *   We once did it (successfully) and abandoned the idea because of its complexity and maintenance cost
  *   The other way might be to implement everything in one org.apache.flink.streaming.api.operators.MultipleInputStreamOperator, which allows any (reasonable) number of inputs: keyed, non-keyed, broadcast, mixed. Let me explain:
     *   From what you say I assume that after the Kafka source you need to .keyBy() the instrument-id anyway, which means a shuffle and (de-/)serialization ... unavoidable.
     *   However, after that shuffle, the MultipleInputStreamOperator could force-chain all your logic as long as it stays on the same key/partition domain
     *   Integration of broadcast inputs is a no-brainer there
     *   We do these things all the time and it really helps cutting down serialization cost, among other things
     *   This way does not necessarily help with keeping latency down, as more inputs means more time to round-robin over the available inputs

I hope this helps

What do you think?

Regards

Thias





From: Viacheslav Chernyshev 
Sent: Tuesday, February 28, 2023 1:06 PM
To: user@flink.apache.org
Subject: Is it possible to preserve chaining for multi-input operators?

Hi everyone,

My team is developing a streaming pipeline for analytics on top of market data. 
The ultimate goal is to be able to handle tens of millions of events per second 
distributed across the cluster according to the unique ID of a particular 
financial instrument. Unfortunately, we struggle with achieving acceptable 
performance. As far as I can see, Flink forcibly breaks operator chaining when 
it encounters a job graph node with multiple inputs. Subsequently, it severely 
affects the performance because a network boundary is enforced, and every event 
is forcibly serialised and deserialised.

From the pipeline graph perspective, the requirements are:

  *   Read data from multiple Kafka topics that are connected to different 
nodes in the graph.
  *   Broadcast a number of dynamic rules to the pipeline.
The cleanest way to achieve the first goal is to have a bunch of 
KeyedCoProcessFunction operations. This design didn't work for us because the 
SerDe overhead added by broken chains was too high; we had to completely 
flatten the pipeline instead. Unfortunately, I can't find any way to solve the 
second problem. As soon as the broadcast stream is introduced into the 
pipeline, the performance tanks.

Is there any technique that I could possibly utilise to preserve the chaining?

Kind regards,
Viacheslav


RE: Fast and slow stream sources for Interval Join

2023-02-28 Thread Schwalbe Matthias
Hi All,

Another option to consider (and this is more a question) is to

  *   Implement org.apache.flink.streaming.api.operators.InputSelectable in the 
join operator
  *   And manually control backpressure on the inputs running ahead of 
watermark time

I’m not sure where actually to implement this and if it would work … just an 
idea.
As also said for the watermark aligning, you would still need state to buffer 
fast events, but not as much as in the unaligned case.
If this works you could control backpressure and watermarking for a single 
operator without forcing the whole job to adopt aligned watermarks.
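
A very rough sketch of where InputSelectable would plug in (Scala, untested, and with the caveat above that I am not sure it works end-to-end; it assumes AbstractStreamOperator provides the usual two-input watermark plumbing):

import org.apache.flink.streaming.api.operators.{AbstractStreamOperator, InputSelectable, InputSelection, TwoInputStreamOperator}
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord

// IN1/IN2/OUT are all String here just to keep the sketch short
class WatermarkSteeredJoin
  extends AbstractStreamOperator[String]
  with TwoInputStreamOperator[String, String, String]
  with InputSelectable {

  private var wm1 = Long.MinValue
  private var wm2 = Long.MinValue

  // always read from the input whose watermark lags behind; the other input backpressures
  override def nextSelection(): InputSelection =
    if (wm1 <= wm2) InputSelection.FIRST else InputSelection.SECOND

  override def processWatermark1(mark: Watermark): Unit = {
    wm1 = mark.getTimestamp
    super.processWatermark1(mark)
  }

  override def processWatermark2(mark: Watermark): Unit = {
    wm2 = mark.getTimestamp
    super.processWatermark2(mark)
  }

  // the actual interval-join logic (buffering in keyed state, etc.) would live here
  override def processElement1(element: StreamRecord[String]): Unit =
    output.collect(element)

  override def processElement2(element: StreamRecord[String]): Unit =
    output.collect(element)
}

// usage sketch: s1.connect(s2).transform("interval join", Types.STRING, new WatermarkSteeredJoin)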

What do you think?

Regards

Thias


From: Alexis Sarda-Espinosa 
Sent: Tuesday, February 28, 2023 7:57 AM
To: Mason Chen 
Cc: Remigiusz Janeczek ; user 
Subject: Re: Fast and slow stream sources for Interval Join


Hi Mason,

Very interesting, is it possible to apply both types of alignment? I.e., 
considering watermark skew across splits from within one source & also from 
another source?

Regards,
Alexis.

On Tue, 28 Feb 2023, 05:26 Mason Chen, 
mailto:mas.chen6...@gmail.com>> wrote:
Hi all,

It's true that the problem can be handled by caching records in state. However, 
there is an alternative using `watermark alignment` with Flink 1.15+ [1] which 
does the desired synchronization that you described while reducing the size of 
state from the former approach.

To use this with two topics of different speeds, you would need to define two 
Kafka sources, each corresponding to a topic. This limitation is documented in 
[1]. This limitation is resolved in Flink 1.17 by split level (partition level 
in the case of Kafka) watermark alignment, so one Kafka source reading various 
topics can align on the partitions of the different topics.

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
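
For illustration (Flink 1.15+; the broker address, topic names and the 5 s/30 s values are only placeholders), attaching both sources to the same alignment group looks roughly like this:

import java.time.Duration
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment

def topicSource(topic: String): KafkaSource[String] =
  KafkaSource.builder[String]()
    .setBootstrapServers("broker:9092")
    .setTopics(topic)
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build()

// both sources join the same alignment group and may drift at most 30 s apart
val aligned = WatermarkStrategy
  .forBoundedOutOfOrderness[String](Duration.ofSeconds(5))
  .withWatermarkAlignment("join-group", Duration.ofSeconds(30))

val fast = env.fromSource(topicSource("fast-topic"), aligned, "fast-topic")
val slow = env.fromSource(topicSource("slow-topic"), aligned, "slow-topic")
// the interval join on the two keyed streams would follow here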

Best,
Mason

On Mon, Feb 27, 2023 at 8:11 AM Alexis Sarda-Espinosa 
mailto:sarda.espin...@gmail.com>> wrote:
Hello,

I had this question myself and I've seen it a few times, the answer is always 
the same, there's currently no official way to handle it without state.

Regards,
Alexis.

On Mon, 27 Feb 2023, 14:09 Remigiusz Janeczek, 
mailto:capi...@gmail.com>> wrote:
Hi,
How to handle a case where one of the Kafka topics used for interval join is 
slower than the other? (Or a case where one topic lags behind)
Is there a way to stop consuming from the fast topic and wait for the slow one 
to catch up? I want to avoid running out of memory (or keeping a very large 
state) and I don't want to discard any data from the fast topic until a 
watermark from the slow topic allows that.

Best Regards


RE: DI in flink

2023-02-14 Thread Schwalbe Matthias
Hi Yashoda,

I use Spring Boot to set up my job networks and to DI-compose streaming components 
like operators/functions etc.
The important part is that all components need to be serializable in order for this 
to work.
Specific task implementations are a little more difficult (little experience on my side) 
to set up in a DI way. If I’m not mistaken, Flink uses factories for this.

Sincere greetings

Thias


From: Yashoda Krishna T 
Sent: Wednesday, February 15, 2023 6:19 AM
To: Austin Cawley-Edwards 
Cc: user 
Subject: Re: DI in flink

Thanks Austin.
I can make use of Rich functions to solve my problem.

Thanks
Yashoda

On Wed, Feb 15, 2023 at 12:42 AM Austin Cawley-Edwards 
mailto:austin.caw...@gmail.com>> wrote:
(note: please keep user@flink.apache.org included 
in replies)

Ah, I see. Then no, this is not provided by Flink. When I've used dependency 
inject with Flink in the past, I instantiated everything in the `open()` method 
of the Flink Rich* classes. Could you solve this by having a common base Sink 
class or builder that does the configuring? I'm just wondering why it's 
necessary to solve it in Flink itself.

Best,
Austin

On Tue, Feb 14, 2023 at 11:05 AM Yashoda Krishna T 
mailto:yashoda.kris...@unbxd.com>> wrote:
This is my use case.
I have a sink function to push streaming data to S3. And I have a class lets 
call S3ConnProvider that provides me a connection object to S3, and a class 
lets say S3Util that has functions over S3 which injects S3ConnProvider.
If dependency injection works I can inject S3Util alone in my SinkFunction 
class. If not I have to initialize S3ConnProvider first and then S3Util.
This can become complex if there are too many initializations required 
depending on the use case.
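
For what it is worth, the open()-wiring that Austin describes typically looks like the sketch below (S3ConnProvider and S3Util are stand-ins for your own classes, and the S3 call is faked):

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction}

// stand-ins for the poster's classes; only their wiring matters here
class S3ConnProvider(endpoint: String) { /* would hold the real S3 client */ }
class S3Util(conn: S3ConnProvider) { def put(bucket: String, payload: String): Unit = () }

class S3Sink(endpoint: String, bucket: String) extends RichSinkFunction[String] {

  // built per parallel instance in open(), so it never has to be serializable
  @transient private var s3Util: S3Util = _

  override def open(parameters: Configuration): Unit = {
    val connProvider = new S3ConnProvider(endpoint) // the "injection" happens here
    s3Util = new S3Util(connProvider)
  }

  override def invoke(value: String, context: SinkFunction.Context): Unit =
    s3Util.put(bucket, value)
}

Only the constructor arguments (endpoint, bucket) travel with the serialized job graph; the object graph itself is rebuilt on each task manager.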


RE: Reducing Checkpoint Count for Chain Operator

2023-02-02 Thread Schwalbe Matthias
Hi Talat Uyarer,


  *   There is no way to have only one file unless you lower the parallelism to 1 (= only one subtask)
  *   So which files do you see: 1 “_metadata” + multiple data files (or just one)?
  *   The idea of having multiple files is to allow multiple threads to store checkpoints at the same time, and when restarting from a checkpoint to consume from more files, potentially distributed over multiple physical hard drives (more I/O capacity)
  *   So in general it is good to have multiple files

Still (out of curiosity) why would you want to have everything in a single file?

Sincere greetings

Thias


From: Talat Uyarer 
Sent: Thursday, February 2, 2023 5:57 PM
To: Schwalbe Matthias 
Cc: Kishore Pola ; weijie guo 
; user@flink.apache.org
Subject: Re: Reducing Checkpoint Count for Chain Operator


Hi Schwalbe, weijie,

Thanks for your reply.


  *   Each state primitive/per subtask stores state into a separate file

In this picture You can see Operator Chain 
https://nightlies.apache.org/flink/flink-docs-master/fig/tasks_chains.svg

Source and Map are in the same chain. Today Flink creates two files for that 
operator chain. When we have OperatorChain, All subtasks are running in the 
same machine, same thread for memory optimization.  However Flink creates 
separate files per subtasks. Our question is whether there is a way to have one 
file not multiple files.

Thanks



On Wed, Feb 1, 2023 at 11:50 PM Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi Kishore,


Having followed this thread for a while, there is still quite a bit of confusion of 
concepts, and in order to help resolve your original question we would need to know:

  *   what makes your observation a problem to be solved?
  *   You write, you have no shuffling, does that mean you don’t use any 
keyBy(), or rebalance()?
  *   How do you determine that there are 7 checkpoint, one for each operator?
  *   In general please relate a bit more details about how you configure state 
primitives: kinds/also operator state?/on all operators/etc.

In general (as Weijie told) checkpointing works like that (simplified):

  *   Jobmanager creates checkpoint mark/barrier in a configured interval
  *   For synchronous checkpointing this flows along with the events through 
the chain of tasks
  *   For asynchronous checkpointing, the checkpointing marker is directly sent 
to the subtasks
  *   A single checkpoint looks like that:

 *   Each state primitive/per subtask stores state into a separate file
 *   At the end the jobmanager writes a “_metadata” file for the checkpoint 
metadata and for state that is too small to end up in a separate file
 *   i.e. each checkpoint generates only one checkpoint (multiple files) 
not 7

Hope we shed a little light on this

Best regards

Thias



From: Kishore Pola mailto:kishore.p...@hotmail.com>>
Sent: Thursday, February 2, 2023 4:12 AM
To: weijie guo mailto:guoweijieres...@gmail.com>>; 
Talat Uyarer mailto:tuya...@paloaltonetworks.com>>
Cc: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: Reducing Checkpoint Count for Chain Operator

Hi Weijie,

In our case we do have 7 operators. All 7 operators are getting executed as 
one chain within a single StreamTask. As the checkpoint barrier passes through 
all the operators, there are 7 checkpoints being stored, so our checkpoint size 
is up by 7 times. We are investigating whether we can checkpoint only the start 
operator (kafka source) or the end operator (BQ sink); if so, we are good and the 
checkpoint size comes down. Hence the question: when the operators are executed in the 
same StreamTask as one chain, is it possible to checkpoint at the operator chain or 
single operator level?

Thanks,
Kishore


From: weijie guo mailto:guoweijieres...@gmail.com>>
Sent: Wednesday, February 1, 2023 6:59 PM
To: Talat Uyarer 
mailto:tuya...@paloaltonetworks.com>>
Cc: user@flink.apache.org<mailto:user@flink.apache.org> 
mailto:user@flink.apache.org>>
Subject: Re: Reducing Checkpoint Count for Chain Operator

Hi Talat,

In Flink, a checkpoint barrier will be injected from source, and then pass 
through all operators in turn. Each stateful operator will do checkpoint in 
this process, the state is managed at operator granularity, not operator chain. 
So what is the significance of checkpoint based on the granularity of operator 
chain?


Best regards,

Weijie


Talat Uyarer 
mailto:tuya...@paloaltonetworks.com>> 
于2023年2月2日周四 02:20写道:
Hi Weijie,

Thanks for replying back.

Our job is a streaming job. The OperatorChain contains all operators that are 
executed as one chain within a single StreamTask, but each operator creates 
its own checkpoint at checkpointing time. Rather than creating a checkpoint 
per operator at checkpointing time, can I have one checkpoint per 
OperatorChain? This is my 
RE: Reducing Checkpoint Count for Chain Operator

2023-02-01 Thread Schwalbe Matthias
Hi Kishore,


Having followed this thread for a while, there is still quite a bit of confusion of 
concepts, and in order to help resolve your original question we would need to know:

  *   what makes your observation a problem to be solved?
  *   You write, you have no shuffling, does that mean you don’t use any 
keyBy(), or rebalance()?
  *   How do you determine that there are 7 checkpoint, one for each operator?
  *   In general please relate a bit more details about how you configure state 
primitives: kinds/also operator state?/on all operators/etc.

In general (as Weijie told) checkpointing works like that (simplified):

  *   Jobmanager creates checkpoint mark/barrier in a configured interval
  *   For synchronous checkpointing this flows along with the events through 
the chain of tasks
  *   For asynchronous checkpointing, the checkpointing marker is directly sent 
to the subtasks
  *   A single checkpoint looks like that:
 *   Each state primitive/per subtask stores state into a separate file
 *   At the end the jobmanager writes a “_metadata” file for the checkpoint 
metadata and for state that is too small to end up in a separate file
 *   i.e. each checkpoint generates only one checkpoint (multiple files) 
not 7

Hope we shed a little light on this

Best regards

Thias



From: Kishore Pola 
Sent: Thursday, February 2, 2023 4:12 AM
To: weijie guo ; Talat Uyarer 

Cc: user@flink.apache.org
Subject: Re: Reducing Checkpoint Count for Chain Operator

Hi Weijie,

In our case we do have 7 operators. All 7 operators are getting executed as 
one chain within a single StreamTask. As the checkpoint barrier passes through 
all the operators, there are 7 checkpoints being stored, so our checkpoint size 
is up by 7 times. We are investigating whether we can checkpoint only the start 
operator (kafka source) or the end operator (BQ sink); if so, we are good and the 
checkpoint size comes down. Hence the question: when the operators are executed in the 
same StreamTask as one chain, is it possible to checkpoint at the operator chain or 
single operator level?

Thanks,
Kishore


From: weijie guo mailto:guoweijieres...@gmail.com>>
Sent: Wednesday, February 1, 2023 6:59 PM
To: Talat Uyarer 
mailto:tuya...@paloaltonetworks.com>>
Cc: user@flink.apache.org 
mailto:user@flink.apache.org>>
Subject: Re: Reducing Checkpoint Count for Chain Operator

Hi Talat,

In Flink, a checkpoint barrier will be injected from source, and then pass 
through all operators in turn. Each stateful operator will do checkpoint in 
this process, the state is managed at operator granularity, not operator chain. 
So what is the significance of checkpoint based on the granularity of operator 
chain?


Best regards,

Weijie


Talat Uyarer 
mailto:tuya...@paloaltonetworks.com>> 
于2023年2月2日周四 02:20写道:
Hi Weijie,

Thanks for replying back.

Our job is a streaming job. The OperatorChain contains all operators that are 
executed as one chain within a single StreamTask, but each operator creates 
its own checkpoint at checkpointing time. Rather than creating a checkpoint 
per operator at checkpointing time, can I have one checkpoint per 
OperatorChain? This is my question.

Thanks

On Wed, Feb 1, 2023 at 1:02 AM weijie guo 
mailto:guoweijieres...@gmail.com>> wrote:
Hi Talat,

Can you elaborate on what it means to create one checkpoint object per chain 
operator more than all operators? If you mean to do checkpoint independently 
for each task, this is not supported.



Best regards,

Weijie


Talat Uyarer via user mailto:user@flink.apache.org>> 
于2023年2月1日周三 15:34写道:
Hi,

We have a job that is reading from kafka and writing some endpoints. The job 
does not have any shuffling steps.  I implement it with multiple steps.  Flink 
chained those operators in one operator in submission time. However I see all 
operators are doing checkpointing.

Is there any way to create one checkpoint object per chain operator rather than 
all operators ?

Thanks

RE: Processing watermarks in a broadcast connected stream

2023-01-31 Thread Schwalbe Matthias
Good Morning Sajjad,

I’ve once had a similar problem. As you’ve found out, directly using 
KeyedBroadcastProcessFunction is a little tricky.
What I ended up with instead is to use the rather new @PublicEvolving 
MultipleInputStreamOperator.
It allows you to connect and process any (reasonable) number of DataStreams 
(keyed/broadcast/plain) and also to tap into the meta-stream of watermark events. 
Each Input is set up separately and can implement separate handlers for the 
events/watermarks/etc.
However, it is an operator implementation; you e.g. need to manually set up the 
timer manager and a number of other auxiliary components.
This is not too difficult, as you can always model it after other operator 
implementations within Flink.

If you don’t mind that it will be in Scala, I could take the time to collect 
the basic setup …?


Hope this helps

Thias







From: Sajjad Rizvi 
Sent: Monday, January 30, 2023 7:42 PM
To: user@flink.apache.org
Subject: Processing watermarks in a broadcast connected stream


Hi,

I am trying to process watermarks in a BroadcastConnectedStream. However, I am 
not able to find any direct way to handle watermark events, similar to what we 
have in processWatermark1 in a  KeyedCoProcessOperator. Following are further 
details.

In the context of the example given in “A Practical Guide to Broadcast State in 
Apache Flink”, I have 
a user actions stream and a pattern stream. The pattern stream is broadcast and 
connected with the user actions stream. The result is a 
BroadcastConnectedStream. I want to handle user action events and pattern events 
in this stream. In addition, I want to use a processWatermark function to 
perform an action in response to watermark events.

The problem is that a BroadcastConnectedStream has only a process() function, which 
takes a (Keyed)BroadcastProcessFunction, and no transform(). A 
BroadcastProcessFunction only allows processing elements; it doesn’t provide an 
interface to process watermarks. In contrast, a ConnectedStream (without 
broadcast) provides a transform function, which takes in an operator that 
provides a way to process watermarks.

Is there a way to process watermarks in a BroadcastConnectedStream?

Thanks,
Sajjad




RE: Understanding pipelined regions

2022-12-20 Thread Schwalbe Matthias
Hi Sunny,

Welcome to Flink .

The next thing for you to consider is to set up checkpointing [1], which allows a 
failing job to pick up from where it stopped.
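
A minimal example (the interval and settings are just placeholders):

import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
// take a checkpoint every 60 s; on failure the job restarts from the last completed one
env.enableCheckpointing(60000L, CheckpointingMode.EXACTLY_ONCE)
env.getCheckpointConfig.setMinPauseBetweenCheckpoints(30000L)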

Sincere greetings from the supposedly close-by Zurich

Thias


[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/


From: Raihan Sunny 
Sent: Tuesday, December 20, 2022 6:30 AM
To: user@flink.apache.org
Subject: Understanding pipelined regions


Hi,

I'm quite new to the world of stream and batch processing. I've been reading 
about pipelined regions in Flink and am quite confused by what it means. My 
specific problem involves a streaming job that looks like the following:

1. There is a Kafka source that takes in an input data that sets off a series 
of operations
2. As part of the first operation, I have an operator that produces multiple 
values, each of which has to be fed into several different operators in parallel
3. The operators each produce a result which I keyBy and merge together using 
the union operator
4. The merged result is then written to a Kafka sink

The problem is that when one of the parallel operators throws an exception, all 
the tasks in the entire pipeline gets restarted including the source which then 
replays the input data and the process starts off once again. My question is if 
it's possible to make the tasks of only the branch that failed restart rather 
than the whole job. I do realize that it is possible to split up the job such 
that the first operator produces its output to a sink and having that as the 
source to the subsequent operations can mitigate the problem. I was just 
wondering if it's possible in the scenario that I have described above. In 
general, how can I "create" a pipelined region?


Thanks,
Sunny



Secure Link Services Group
Zürich: The Circle 37, 8058 Zürich-Airport, Switzerland
Munich: Tal 44, 80331 München, Germany
Dubai: Building 3, 3rd Floor, Dubai Design District, Dubai, United Arab Emirates
Dhaka: Midas Center, Road 16, Dhanmondi, Dhaka 1209, Bangladesh
Thimphu: Bhutan Innovation Tech Center, Babesa, P.O. Box 633, Thimphu, Bhutan

Visit us: www.selise.ch




RE: Concatenating a bounded and unbounded stream

2022-10-27 Thread Schwalbe Matthias
Sorry, I’ve got things really mixed up, I meant to reply to this other thread … ☹

Thias

From: Schwalbe Matthias
Sent: Thursday, October 27, 2022 9:14 AM
To: 'Tzu-Li (Gordon) Tai' ; Filip Karnicki 

Cc: user 
Subject: RE: State Processor API - VoidNamespaceSerializer must be compatible 
with the old namespace serializer LongSerializer

Hi Filip, Hi Tzu-Li,

@Tzu-Li: long time no see (it is time for an on-site FlinkForward in Berlin again next year)

Considering Tzu-Li’s proposal, there is a restriction: at the time being you 
can only create a HybridSource from sources that have exactly the same type.
This is not always feasible, given that FLIP-27 sources have no means of 
projecting to a specific type within the source.
E.g. a flat-file HDFS source of String won’t match with an on-line Kafka source 
of some AVRO-formatted type … getting this to work is really tricky.
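
Just for illustration, the same-type case works roughly like this (Flink 1.15+ class names; the path, broker address and topic are placeholders):

import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.base.source.hybrid.HybridSource
import org.apache.flink.connector.file.src.FileSource
import org.apache.flink.connector.file.src.reader.TextLineInputFormat
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment

// bounded history first ...
val history: FileSource[String] = FileSource
  .forRecordStreamFormat(new TextLineInputFormat(), new Path("hdfs:///history"))
  .build()

// ... then the unbounded live topic, both producing the same type (String)
val live: KafkaSource[String] = KafkaSource.builder[String]()
  .setBootstrapServers("broker:9092")
  .setTopics("live-topic")
  .setValueOnlyDeserializer(new SimpleStringSchema())
  .build()

val bothPhases = HybridSource.builder(history).addSource(live).build()
env.fromSource(bothPhases, WatermarkStrategy.noWatermarks[String](), "history-then-live")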

There is another idea that comes to mind (i.e. not verified / tested):

  *   Implement a MultipleInputOperator with InputSelectable (example see [1])
  *   Always select the bounded input until that one is finished,
  *   The other input(s) will backpressure until getting selected

A third idea, which worked before the existence of MultipleInputOperator, is to (a skeleton follows below):

  *   Union the streams and buffer incoming records in keyed state (e.g. a map 
state of timestamp -> List[event]), and
  *   By means of timers yield them only once the watermark passes the 
stored timestamp
  *   The state backend needs to be RocksDB for this because you can iterate 
MapState in key order (= timestamp); this does not work well for the other 
state backends
     *   @Tzu-Li: I remember well when you demonstrated this trick at the 
conference a couple of years ago
  *   The problem with this is that you collect lots of state because of 
watermark skew among the two input streams
  *   This can be remedied by restricting watermark skew in the job configuration 
[2]
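
A skeleton of that third idea (Scala, untested; events are plain Strings here, and event timestamps are assumed to have been assigned upstream):

import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

/** Buffers the unioned (bounded + unbounded) events per key and releases them in event-time order. */
class TimeSortedBuffer extends KeyedProcessFunction[String, String, String] {

  // event timestamp -> events carrying that timestamp
  @transient private var buffer: MapState[java.lang.Long, java.util.List[String]] = _

  override def open(parameters: Configuration): Unit =
    buffer = getRuntimeContext.getMapState(
      new MapStateDescriptor[java.lang.Long, java.util.List[String]](
        "buffered-events", Types.LONG, Types.LIST(Types.STRING)))

  override def processElement(
      value: String,
      ctx: KeyedProcessFunction[String, String, String]#Context,
      out: Collector[String]): Unit = {
    val ts = ctx.timestamp() // assumes timestamps/watermarks were assigned upstream
    // park the event and ask for a callback once the watermark reaches its timestamp
    val batch = Option(buffer.get(ts)).getOrElse(new java.util.ArrayList[String]())
    batch.add(value)
    buffer.put(ts, batch)
    ctx.timerService().registerEventTimeTimer(ts)
  }

  override def onTimer(
      timestamp: Long,
      ctx: KeyedProcessFunction[String, String, String]#OnTimerContext,
      out: Collector[String]): Unit = {
    // the watermark (i.e. the slower input) has passed this timestamp: release and clean up
    Option(buffer.get(timestamp)).foreach(_.asScala.foreach(out.collect))
    buffer.remove(timestamp)
  }
}

// usage: boundedStream.union(unboundedStream).keyBy(<key selector>).process(new TimeSortedBuffer)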

@Filip: feel free to get back to us for help with getting this set-up …

Sincere greetings


Thias



[1] 
org.apache.flink.streaming.api.graph.StreamGraphGeneratorBatchExecutionTest.InputSelectableMultipleInputOperator
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_

From: Tzu-Li (Gordon) Tai mailto:tzuli...@apache.org>>
Sent: Wednesday, October 26, 2022 6:59 PM
To: Filip Karnicki mailto:filip.karni...@gmail.com>>
Cc: Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>>; user 
mailto:user@flink.apache.org>>
Subject: Re: State Processor API - VoidNamespaceSerializer must be compatible 
with the old namespace serializer LongSerializer


Hi Filip,

I think what you are seeing is expected. The State Processor API was intended 
to allow access only to commonly used user-facing state structures, while 
Stateful Functions uses quite a bit of Flink internal features, including for 
its state maintenance.
The list state in question in StateFun's FunctionGroupOperator is an internal 
kind of state normally used in the context of Flink window states that are 
namespaced. Normal user-facing list states are not namespaced.

Just curious, which specific state in FunctionGroupOperator are you trying to 
transform? I assume all other internal state in FunctionGroupOperator you want 
to remain untouched, and only wish to carry them over to be included in the 
transformed savepoint?

Thanks,
Gordon


On Wed, Oct 26, 2022 at 3:50 AM Filip Karnicki 
mailto:filip.karni...@gmail.com>> wrote:
Hi Thias

Thank you for your reply. I can re-create a simplified use case at home and 
stick it on github if you think it will help.

What I'm trying to access is pretty internal to Flink Stateful Functions. It 
seems that a custom operator 
(https://github.com/apache/flink-statefun/blob/09a5cba521e9f994896c746ec9f8cc6479403612/statefun-flink/statefun-flink-core/src/main/java/org/apache/flink/statefun/flink/core/functions/FunctionGroupOperator.java#L188)
 is accessing a KeyedStateBackend and creating an InternalListState, which I'm 
not sure I'll be able to get my hands on using the State Processor API.

The only reason why I need to get my hands on all the states from this Stateful 
Functions operator is because later I (think I) have to use 
.removeOperator(uid) on a savepoint and replace it .withOperator(uid, 
myTransformation) in order to transform my own, non-stateful-functions keyed 
state which also belongs to this operator.

Kind regards
Fil

On Tue, 25 Oct 2022 at 16:24, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi Filip,

It looks like, your state primitive is used in the context of Windows:
Keyed state works like this:
  *   It uses a cascade of key types to store and retrieve values:
     *   The key (set by .keyBy)
     *   A namespace (usually a VoidNamespace), unless it is used in context of a specific window
     *   An optional key of the state primitive (if it is a MapState)

RE: State Processor API - VoidNamespaceSerializer must be compatible with the old namespace serializer LongSerializer

2022-10-27 Thread Schwalbe Matthias
Hi Filip, Hi Tzu-Li,

@Tzu-Li: long time no see (it is time for an on-site FlinkForward in Berlin again next year)

Considering Tzu-Li’s proposal, there is a restriction: at the time being you 
can only create a HybridSource from sources that have exactly the same type.
This is not always feasible, given that FLIP-27 sources have no means of 
projecting to a specific type within the source.
E.g. a flat-file HDFS source of String won’t match with an on-line Kafka source 
of some AVRO-formatted type … getting this to work is really tricky.

There is another idea that comes to mind (i.e. not verified / tested):

  *   Implement a MultipleInputOperator with InputSelectable (example see [1])
  *   Always select the bounded input until that one is finished,
  *   The other input(s) will backpressure until getting selected

A third idea, which worked before the existence of MultipleInputOperator, is to

  *   Union the streams and buffer incoming records in keyed state (e.g. a map 
state of timestamp -> List[event]), and
  *   By means of timers yield them only once the watermark passes the 
stored timestamp
  *   The state backend needs to be RocksDB for this because you can iterate 
MapState in key order (= timestamp); this does not work well for the other 
state backends
     *   @Tzu-Li: I remember well when you demonstrated this trick at the 
conference a couple of years ago
  *   The problem with this is that you collect lots of state because of 
watermark skew among the two input streams
  *   This can be remedied by restricting watermark skew in the job configuration 
[2]

@Filip: feel free to get back to us for help with getting this set-up …

Sincere greetings


Thias



[1] 
org.apache.flink.streaming.api.graph.StreamGraphGeneratorBatchExecutionTest.InputSelectableMultipleInputOperator
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_

From: Tzu-Li (Gordon) Tai 
Sent: Wednesday, October 26, 2022 6:59 PM
To: Filip Karnicki 
Cc: Schwalbe Matthias ; user 

Subject: Re: State Processor API - VoidNamespaceSerializer must be compatible 
with the old namespace serializer LongSerializer


Hi Filip,

I think what you are seeing is expected. The State Processor API was intended 
to allow access only to commonly used user-facing state structures, while 
Stateful Functions uses quite a bit of Flink internal features, including for 
its state maintenance.
The list state in question in StateFun's FunctionGroupOperator is an internal 
kind of state normally used in the context of Flink window states that are 
namespaced. Normal user-facing list states are not namespaced.

Just curious, which specific state in FunctionGroupOperator are you trying to 
transform? I assume all other internal state in FunctionGroupOperator you want 
to remain untouched, and only wish to carry them over to be included in the 
transformed savepoint?

Thanks,
Gordon


On Wed, Oct 26, 2022 at 3:50 AM Filip Karnicki 
mailto:filip.karni...@gmail.com>> wrote:
Hi Thias

Thank you for your reply. I can re-create a simplified use case at home and 
stick it on github if you think it will help.

What I'm trying to access is pretty internal to Flink Stateful Functions. It 
seems that a custom operator 
(https://github.com/apache/flink-statefun/blob/09a5cba521e9f994896c746ec9f8cc6479403612/statefun-flink/statefun-flink-core/src/main/java/org/apache/flink/statefun/flink/core/functions/FunctionGroupOperator.java#L188)
 is accessing a KeyedStateBackend and creating an InternalListState, which I'm 
not sure I'll be able to get my hands on using the State Processor API.

The only reason why I need to get my hands on all the states from this Stateful 
Functions operator is because later I (think I) have to use 
.removeOperator(uid) on a savepoint and replace it .withOperator(uid, 
myTransformation) in order to transform my own, non-stateful-functions keyed 
state which also belongs to this operator.

Kind regards
Fil

On Tue, 25 Oct 2022 at 16:24, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi Filip,

It looks like, your state primitive is used in the context of Windows:
Keyed state works like this:
  *   It uses a cascade of key types to store and retrieve values:
     *   The key (set by .keyBy)
     *   A namespace (usually a VoidNamespace), unless it is used in context of a specific window
     *   An optional key of the state primitive (if it is a MapState)

In your case the state primitive is (probably) declared in the context of a 
window and hence when loading the state by means of StateProcessorAPI you also 
need to specify the correct Namespace TypeInformation.
If I am in doubt, how a state primitive is set up, I let the debugger stop in a 
process function and walk up the call stack to find the proper components 
implementing it.

If you share a little more of your code it

RE: State Processor API - VoidNamespaceSerializer must be compatible with the old namespace serializer LongSerializer

2022-10-25 Thread Schwalbe Matthias
Hi Filip,

It looks like, your state primitive is used in the context of Windows:
Keyed state works like this:

  *   It uses a cascade of key types to store and retrieve values:
 *   The key (set by .keyBy)
 *   A namespace (usually a VoidNamespace), unless it is used in context of 
a specific window
 *   An optional key of the state primitive (if it is a MapState)

In your case the state primitive is (probably) declared in the context of a 
window and hence when loading the state by means of StateProcessorAPI you also 
need to specify the correct Namespace TypeInformation.
If I am in doubt, how a state primitive is set up, I let the debugger stop in a 
process function and walk up the call stack to find the proper components 
implementing it.

If you share a little more of your code it is much easier to provide specific 
guidance 
(e.g. ‘savepoint’ is never used again in your code snippet …)

Sincere greeting

Thias



From: Filip Karnicki 
Sent: Tuesday, October 25, 2022 10:08 AM
To: user 
Subject: State Processor API - VoidNamespaceSerializer must be compatible with 
the old namespace serializer LongSerializer

Hi, I'm trying to load a list state using the State Processor API (Flink 1.14.3)

Cluster settings:


state.backend: rocksdb

state.backend.incremental: true

(...)

Code:

val env = ExecutionEnvironment.getExecutionEnvironment
val savepoint = Savepoint.load(env, pathToSavepoint, new EmbeddedRocksDBStateBackend(true))

val tpe = new MessageTypeInformation(MessageFactoryKey.forType(MessageFactoryType.WITH_PROTOBUF_PAYLOADS, null)) // using Flink Stateful Functions
val envelopeSerializer: TypeSerializer[Message] = tpe.createSerializer(env.getConfig)
val listDescriptor = new ListStateDescriptor[Message]("delayed-message-buffer", envelopeSerializer.duplicate)

(...)
override def open(parameters: Configuration): Unit = {
  getRuntimeContext.getListState(listDescriptor) // fails with error [1]
}


Error [1]:

Caused by: java.io.IOException: Failed to restore timer state
at 
org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:177)
at 
org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:64)
at 
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:183)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Error while getting state
at 
org.apache.flink.runtime.state.DefaultKeyedStateStore.getListState(DefaultKeyedStateStore.java:74)
at 
org.apache.flink.state.api.runtime.SavepointRuntimeContext.getListState(SavepointRuntimeContext.java:213)
at 
x.x.x.x.x.myModule.StateReader$$anon$1.open(StateReader.scala:527)
at 
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
at 
org.apache.flink.state.api.input.operator.StateReaderOperator.open(StateReaderOperator.java:106)
at 
org.apache.flink.state.api.input.operator.KeyedStateReaderOperator.open(KeyedStateReaderOperator.java:66)
at 
org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:174)
... 7 more
Caused by: org.apache.flink.util.StateMigrationException: The new namespace 
serializer 
(org.apache.flink.runtime.state.VoidNamespaceSerializer@2806d6da)
 must be compatible with the old namespace serializer 
(org.apache.flink.api.common.typeutils.base.LongSerializer@52b06bef).
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.updateRestoredStateMetaInfo(RocksDBKeyedStateBackend.java:685)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.tryRegisterKvStateInformation(RocksDBKeyedStateBackend.java:624)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.createInternalState(RocksDBKeyedStateBackend.java:837)
at 
org.apache.flink.runtime.state.KeyedStateFactory.createInternalState(KeyedStateFactory.java:47)
at 
org.apache.flink.runtime.state.ttl.TtlStateFactory.createStateAndWrapWithTtlIfEnabled(TtlStateFactory.java:73)
at 
org.apache.flink.runtime.state.AbstractKeyedStateBackend.getOrCreateKeyedState(AbstractKeyedStateBackend.java:302)
at 
org.apache.flink.runtime.state.AbstractKeyedStateBackend.getPartitionedState(AbstractKeyedStateBackend.java:353)
at 

RE: Re:Question about Flink Broadcast State event ordering

2022-10-10 Thread Schwalbe Matthias
Hi Qing again,

Another point to consider: broadcast streams are subject to watermarking, i.e.

  *   You can wait to process the broadcast records only after the watermark has passed, then
  *   order those records by time,
  *   keep all broadcast records whose watermark has not yet passed in some extra data structure, without processing them
  *   that also means the broadcast stream should not be configured to use watermark idleness, or you have to manually implement watermark processing logic for the idling broadcast stream

This sounds a little complicated, but it can definitely be done (I do that all the time)

Best regards

Thias



From: 仙路尽头谁为峰 
Sent: Wednesday, October 5, 2022 10:13 AM
To: Qing Lim 
Cc: User 
Subject: [SPAM] RE: Re:Question about Flink Broadcast State event ordering


Hi Qing:
  The key point is that the broadcast side may have different partitions that interleave. If you can make sure the messages you want to be ordered go into the same partition, then I think the order can be preserved.

Best regards!
Sent from Mail for Windows

From: Qing Lim
Sent: 5 October 2022 15:16
To: xljtswf2022
Cc: User
Subject: RE: Re:Question about Flink Broadcast State event ordering

Hi, thanks for answering my question.

Is there anyway to make the order reflecting the upstream? I wish to broadcast 
messages that has deletion semantic, so ordering matters here.
I guess worst case I can use some logical timestamp to reason about order at 
downstream.

From: xljtswf2022 mailto:xljtswf2...@163.com>>
Sent: 05 October 2022 03:02
To: Qing Lim mailto:q@mwam.com>>
Cc: User mailto:user@flink.apache.org>>
Subject: Re:Question about Flink Broadcast State event ordering

Hi Qing:
> I think this is referring to the order between broadcasted elements and non-broadcasted elements, right?
  No. As the broadcast and non-broadcast streams are different streams, they will usually be transferred over different TCP connections; we cannot control the order of elements across different connections.
> The broadcasted element should arrive in the same order across all tasks, right?
No. Imagine the broadcast stream has 2 partitions, say p1 and p2, and each partition has elements with index 1, 2, 3.
Then one downstream task may see the broadcast stream as p1-1, p1-2, ..., p2-1, p2-2, ...
and another will see p1-1, p2-1, p1-2, p2-2, ...
PS: since elements usually come in bulk, the index is just for explanation.

Best regards!



At 2022-10-04 21:54:23, "Qing Lim" mailto:q@mwam.com>> 
wrote:
Hi Flink user group,

I have a question around broadcast.

Reading the docs 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/broadcast_state/#important-considerations,
 it says the following:

> Order of events in Broadcast State may differ across tasks: Although 
> broadcasting the elements of a stream guarantees that all elements will 
> (eventually) go to all downstream tasks, elements may arrive in a different 
> order to each task. So the state updates for each incoming element MUST NOT 
> depend on the ordering of the incoming events.

I think this is referring to the order between broadcasted elements and 
non-broadcasted elements, right?
The broadcasted element should arrive in the same order across all tasks, right?

For example, given a broadcasted stream A, and a non-broadcasted stream B

When joining A and B, elements from A should always reach all tasks in the same 
order, right? It’s just the interleaving of A and B that might differ across 
tasks; did I understand it correctly? I wasn’t sure because it’s not clear to me 
from just reading the doc; happy to update the doc once it’s clarified here.

Kind regards.




RE: Loading broadcast state on BroadcastProcessFunction instantiation or open method

2022-09-27 Thread Schwalbe Matthias
Hi Alfredo,

Did you consider implementing 
org.apache.flink.streaming.api.checkpoint.CheckpointedFunction interface in 
your broadcast function … the initializeState(…) function should give you 
access to the state backend.
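
A minimal sketch of that idea (the rule and event types are just Strings here to keep it short; the descriptor must be the same one used when broadcasting the stream):

import org.apache.flink.api.common.state.{BroadcastState, MapStateDescriptor}
import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.util.Collector

class RuleApplier
  extends BroadcastProcessFunction[String, String, String]
  with CheckpointedFunction {

  private val ruleDescriptor =
    new MapStateDescriptor[String, String]("rules", Types.STRING, Types.STRING)

  @transient private var rules: BroadcastState[String, String] = _

  // called before any process*Element call, and also on restore from a checkpoint/savepoint
  override def initializeState(context: FunctionInitializationContext): Unit = {
    rules = context.getOperatorStateStore.getBroadcastState(ruleDescriptor)
    // e.g. seed defaults here if the state is still empty
  }

  // broadcast state is snapshotted by the runtime; nothing extra to do here
  override def snapshotState(context: FunctionSnapshotContext): Unit = ()

  override def processBroadcastElement(
      rule: String,
      ctx: BroadcastProcessFunction[String, String, String]#Context,
      out: Collector[String]): Unit =
    ctx.getBroadcastState(ruleDescriptor).put(rule, rule)

  override def processElement(
      value: String,
      ctx: BroadcastProcessFunction[String, String, String]#ReadOnlyContext,
      out: Collector[String]): Unit =
    out.collect(value)
}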

Kind regards

Thias


From: David Anderson 
Sent: Tuesday, September 27, 2022 12:26 PM
To: alfredo.vasq...@spglobal.com
Cc: user@flink.apache.org
Subject: Re: Loading broadcast state on BroadcastProcessFunction instantiation 
or open method


Logically it would make sense to be able to initialize BroadcastState in the 
open method of a BroadcastProcessFunction, but in practice I don't believe it 
can be done -- because the necessary Context isn't made available.

Perhaps you could use the State Processor API to bootstrap some state into the 
broadcast state.

David

On Mon, Sep 26, 2022 at 6:07 PM 
alfredo.vasquez.spglobal.com via user 
mailto:user@flink.apache.org>> wrote:
Hello community.

Currently we have a BroadcastProcessFunction implementation that is storing the 
broadcast state using a MapStateDescriptor.
I have a use case that needs to load the BroadcastState to perform some 
operation before receiving any processElement or processBroadcastElement 
message.

Is there a way to load the BroadcastState on BroadcastProcessFunction  
instantiation, overriding open(Configuration parameters) method or by 
overriding some other callback function?

Kind regards,





KeyedMultipleInputTransformation: StreamingRuntimeContext#keyedStateStore is not properly initialized

2022-09-21 Thread Schwalbe Matthias
Hi all,

When trying to adopt the new (@Experimental) KeyedMultipleInputTransformation I 
came across following problem:


  *   In the open(…) function of my operator, derived from 
MultipleInputStreamOperator with AbstractStreamOperatorV2, I can not initialize 
keyed state primitives, because
  *   StreamingRuntimeContext#keyedStateStore is not properly initialized
  *   I’ve tracked down the root cause to a difference in the implementation of 
initializeState(…) in
 *   AbstractStreamOperator [1] vs.
 *   AbstractStreamOperatorV2 [2]
 *   The latter is missing a call to: 
runtimeContext.setKeyedStateStore(stateHandler.getKeyedStateStore().orElse(null));

Is this an oversight (bug) or is this intentional?

… However I’ve found a work-around until this is fixed:

  *   Implement initializeState(context: StateInitializationContext) in the 
operator and initialize the runtime-context properly (scala):

override def initializeState(context: StateInitializationContext): Unit = {
  super.initializeState(context)
  // TODO: remove once the Flink implementation is fixed
  val rtc = getRuntimeContext
  val kss = context.getKeyedStateStore
  rtc.setKeyedStateStore(kss)
}


I can create a bug ticket, if this is confirmed as a bug.

Many thanks

Thias



[1] 
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractStreamOperator.java#L287
[2] 
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractStreamOperatorV2.java#L231


RE: Is it possible to connect multiple streams

2022-09-21 Thread Schwalbe Matthias
Hi Deepak,

Coming back to your original question, you’ve got a number of options (some of 
them already mentioned); a tiny example of the first two follows below:

  *   You can connect/join 2 streams of different types at a time by means of s1.connect(s2).
     *   (your example does not work directly as written (3 streams))
  *   You can connect many streams of the same type by means of s1.union(s2, s3, …).
  *   The third option is new and not yet documented (marked as @Experimental):
     *   Connect/join many streams of different types by means of a [Keyed]MultipleInputTransformation
        *   If keyed: input streams need to either
           *   be keyed on the same key type, or
           *   be non-keyed (most likely broadcast) streams
     *   The API is still a little elaborate; a good starting point could be 
this test case: [1]

If of any interest, feel free to ask for clarifications …
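
To illustrate the first two options, here is a minimal sketch (Scala; stream contents and names are made up for illustration, not taken from your job):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object ConnectVsUnionSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // three streams of the same type (data is made up)
    val s1 = env.fromElements("a1", "a2")
    val s2 = env.fromElements("b1")
    val s3 = env.fromElements("c1", "c2")

    // option 2: union arbitrarily many same-typed streams, then a single flatMap
    s1.union(s2, s3)
      .flatMap((value: String, out: Collector[String]) => out.collect(value.toUpperCase))
      .print()

    // option 1: connect exactly two streams of (possibly) different types
    val numbers = env.fromElements(1, 2, 3)
    s1.connect(numbers)
      .map(str => "s:" + str, num => "n:" + num)
      .print()

    env.execute("connect vs union sketch")
  }
}

The third, multiple-input option has no high-level documented API yet, so the test case in [1] remains the better reference for it.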


Thias
PS: see also my next email in a couple of minutes …


[1] 
https://github.com/apache/flink/blob/master/flink-streaming-java/src/test/java/org/apache/flink/streaming/api/graph/StreamGraphGeneratorBatchExecutionTest.java#L441

From: Shammon FY 
Sent: Wednesday, September 21, 2022 6:29 AM
To: user@flink.apache.org
Cc: Deepak kumar Gunjetti ; Yaroslav Tkachenko 

Subject: Re: Is it possible to connect multiple streams


Hi

Thanks @yaroslav .
And @deepakgd79 here is the document for datastream: 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#datastream-transformations
 You can find examples for union, connect, join and other transformations


On Wed, Sep 21, 2022 at 11:55 AM Yaroslav Tkachenko 
mailto:yaros...@goldsky.com>> wrote:
Hi Deepak,

You can use a union operator. I actually gave a talk on creating an advanced 
join using the union operator and multiple streams:
- 
https://www.slideshare.net/sap1ens/storing-state-forever-why-it-can-be-good-for-your-analytics
- https://www.youtube.com/watch?v=tiGxEGPyqCg

I hope this helps.

On Tue, Sep 20, 2022 at 5:22 PM Deepak kumar Gunjetti 
mailto:deepakg...@gmail.com>> wrote:
Hi,
My name is Deepak, I am a new user to apache flink. It is one of the best open 
source i have used. I want to thank the community for developing such a 
wonderful product.

I have one query.
Is it possible to connect multiple streams, like
stream1.connect(stream2).connect(stream3).flatmap(new 
RickCoFlatMapFunctionHandler())

Can someone please let me know how I can achieve this.
Thanks,
Deepak


RE: A question about restoring state with an additional variable with kryo

2022-09-16 Thread Schwalbe Matthias
Hi Vishal,

Good news and bad news :


  *   Bad: Kryo serializer cannot be used for schema evolution, see [1]
  *   Good: not all is lost here,
 *   If you happen to have state that you cannot afford to lose, you can 
transcode it by means of the savepoint API [2],
 *   However, this takes quite some effort
  *   In general, if you ever plan to migrate/extend your schemas, choose a 
data type that supports schema migration [1],
  *   In your case, PoJo types would be the closest to your original 
implementation
  *   You can disable Kryo in the configuration to avoid this situation in the 
future (see the sketch below), by the way,
  *   Kryo serializer is quite slow compared to the other options and I believe 
it is only there as a (emergency) fallback solution: [3]

Feel free to ask for clarification 
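
As a side note on disabling the Kryo fallback mentioned above, a minimal sketch, assuming the standard ExecutionConfig API (the equivalent configuration key should be pipeline.generic-types: false):

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// inside the job's main():
val env = StreamExecutionEnvironment.getExecutionEnvironment

// With generic types disabled, any type that would silently fall back to the Kryo
// serializer makes the job fail at submission time instead, so non-evolvable state
// is noticed before it ends up in a savepoint.
env.getConfig.disableGenericTypes()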

Thias



[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/schema_evolution/#kryo-cannot-be-used-for-schema-evolution
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/
[3] 
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html



From: Vishal Santoshi 
Sent: Friday, September 16, 2022 1:17 AM
To: user 
Subject: Re: A question about restoring state with an additional variable with 
kryo


The exception thrown is as follows. I realize that it is trying to read the 
long value. How do I signal to kryo that it is OK and that the object can have a 
default value?

Caused by: java.io.EOFException: No more bytes left.
at 
org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:80)
at 
com.esotericsoftware.kryo.io.Input.readVarLong(Input.java:690)
at 
com.esotericsoftware.kryo.io.Input.readLong(Input.java:685)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:133)
at 
com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:123)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:730)
at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:354)
at 
org.apache.flink.api.common.typeutils.CompositeSerializer.deserialize(CompositeSerializer.java:156)
at 
org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:89)

On Thu, Sep 15, 2022 at 7:10 PM Vishal Santoshi 
mailto:vishal.santo...@gmail.com>> wrote:
<< How do I make sure that when reconstituting the state, kryo does not 
complain? It tries to map the previous state to the new definition of Class A 
and complains that it cannot read the value for `String b`.

>> How do I make sure that when reconstituting the state, kryo does not 
>> complain? It tries to map the previous state to the new definition of Class 
>> A and complains that it cannot read the value for `long b`.

Sorry a typo


On Thu, Sep 15, 2022 at 7:04 PM Vishal Santoshi 
mailto:vishal.santo...@gmail.com>> wrote:
I have state in rocksDB that represents say

class A {
  String a
}

I now change my class and add another variable


Class A {
  String a;
  long b = 0;
}

How do I make sure that when reconstituting the state, kryo does not complain? 
It tries to map the previous state to the new definition of Class A and 
complains that it cannot read the value for `String b`.

Unfortunately the state is not using POJO serializer.

Thanks and Regards.

Vishal






RE: how to connect to the flink-state store and use it as cache to serve APIs.

2022-07-06 Thread Schwalbe Matthias
Hi Laxmi,

Did you consider Apache Flink Table Store [1], which was introduced a short time 
ago?
Yours sounds like a case for early integration …

Sincere greetings

Thias

[1] https://nightlies.apache.org/flink/flink-table-store-docs-release-0.1/

From: laxmi narayan 
Sent: Wednesday, July 6, 2022 6:29 AM
To: Yuan Mei 
Cc: Hangxiang Yu ; user 
Subject: Re: how to connect to the flink-state store and use it as cache to 
serve APIs.



Hi Folks,

I just wanted to double check, if there is any way to expose rest APIs using 
Flink sql tables ?




Thank you.


On Thu, Jun 30, 2022 at 12:15 PM Yuan Mei 
mailto:yuanmei.w...@gmail.com>> wrote:
That's definitely something we want to achieve in the future term, and your 
input is very valuable.

One problem with the current queryable state setup is that the service is 
bounded to the life cycle of Flink Job, which limits the usage of the state 
store/service.

Thanks for your insights.

Best
Yuan

On Wed, Jun 29, 2022 at 3:41 PM laxmi narayan 
mailto:nit.dgp...@gmail.com>> wrote:

Hi Hangxiang,

I was thinking: since we already store the entire state in the checkpoint dir, 
why can't we expose it as a service through the Flink queryable state? That way 
I can easily avoid introducing a cache and serve realtime APIs via this state 
itself, and go to the database for the historical data.



Thank you.


On Wed, Jun 29, 2022 at 11:17 AM Hangxiang Yu 
mailto:master...@gmail.com>> wrote:
Hi, laxmi.
There are two ways that users can access the state store currently:
1. Queryable state [1] which you could access states in runtime.
2. State Processor API [2] which you could access states (snapshot) offline.

But we have marked the Queryable state as "Reaching End-of-Life".
We are also trying to find a graceful and effective way for users to debug and 
troubleshoot.
So could you share your use case, i.e. what you want to use it for?
Your feedback is important for us to design it in the long term. Thanks!

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/queryable_state/
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/
[3] https://flink.apache.org/roadmap.html

Best,
Hangxiang.

On Tue, Jun 28, 2022 at 8:26 PM laxmi narayan 
mailto:nit.dgp...@gmail.com>> wrote:
Hi Team,
I am not sure if this is the right use case for the state-store but I wanted to 
serve the APIs using queryable-state, what are the different ways to achieve 
this ?

I have come across a version where we can use Job_Id to connect to the state, 
but is there any other way to expose a specific rest-endpoint etc ?
Any sample example/github link would be nice.



Thank you.


RE: Synchronizing streams in coprocessfunction

2022-06-27 Thread Schwalbe Matthias
Hi Gopi,

Your use case is a little under-specified to give a specific answer, especially 
regarding the nature of the two input streams and the way events of both streams 
are correlated (joined):

  *   Is your fast-stream keyed?
 *   If yes: keyed state and timers can be used, otherwise only operator 
state can be used to buffer events, no timers
  *   Is your metadata-stream keyed? I.e.
 *   Metadata-stream events are combined only to fast-stream events having 
the same respective key
*   Implement a KeyedCoProcessFunction …
 *   Metadata-stream events apply to all fast-stream events irrespective of 
the key
*   Implement a KeyedBroadcastProcessFunction (after converting the 
metadata-stream to a broadcast stream)
*   Then in the processBroadcastElement function you can iterate over 
all keys of all state primitives
  *   None of your streams are keyed?
 *   That leaves you only the option of using operator state
*   Current implementation of operator state is not incremental and 
thus it is completely generated/stored with each state checkpoint
*   This allows only a moderate number of datapoints in operator state
  *   Which version of Flink are you using? Recommendations above refer to 
Flink 1.15.0

Looking forward to your answers (also please go a little more into detail of 
your use case) and follow-up questions …
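
For the keyed variant (KeyedCoProcessFunction), a rough sketch of buffering fast-stream events per key until the first metadata event arrives; all type and field names below are made up for illustration:

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor, ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

case class Data(key: String, payload: String)   // fast stream (illustrative)
case class Meta(key: String, config: String)    // metadata stream (illustrative)

class BufferUntilMetadata extends KeyedCoProcessFunction[String, Data, Meta, String] {

  @transient private var metadata: ValueState[Meta] = _
  @transient private var buffer: ListState[Data] = _

  override def open(parameters: Configuration): Unit = {
    metadata = getRuntimeContext.getState(
      new ValueStateDescriptor[Meta]("metadata", classOf[Meta]))
    buffer = getRuntimeContext.getListState(
      new ListStateDescriptor[Data]("buffered-data", classOf[Data]))
  }

  // fast stream: buffer in keyed state until metadata for this key is known
  override def processElement1(d: Data,
      ctx: KeyedCoProcessFunction[String, Data, Meta, String]#Context,
      out: Collector[String]): Unit = {
    val m = metadata.value()
    if (m == null) buffer.add(d)
    else out.collect(s"${d.payload} enriched with ${m.config}")
  }

  // metadata stream: remember the metadata and flush everything buffered so far
  override def processElement2(m: Meta,
      ctx: KeyedCoProcessFunction[String, Data, Meta, String]#Context,
      out: Collector[String]): Unit = {
    metadata.update(m)
    buffer.get().asScala.foreach(d => out.collect(s"${d.payload} enriched with ${m.config}"))
    buffer.clear()
  }
}

// usage: dataStream.keyBy(_.key).connect(metaStream.keyBy(_.key)).process(new BufferUntilMetadata)

The broadcast variant buffers in the keyed state of a KeyedBroadcastProcessFunction in much the same way.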

Greetings

Thias


From: Gopi Krishna M 
Sent: Monday, June 27, 2022 3:01 PM
To: Qingsheng Ren 
Cc: user@flink.apache.org
Subject: Re: Synchronizing streams in coprocessfunction

Thanks Quingsheng, that would definitely work. But I'm unable to figure out how 
I can apply this with CoProcessFunction. One stream is windowed and trigger 
implementation uses the 2nd stream.

On Mon, Jun 27, 2022 at 3:29 PM Qingsheng Ren 
mailto:re...@apache.org>> wrote:
Hi Gopi,

What about using a window with a custom trigger? The window is doing nothing 
but aggregating your input to a collection. The trigger accepts metadata from 
the low input stream so it can fire and purge the window (emit all elements in 
the window to downstream) on arrival of metadata.

Best,
Qingsheng

> On Jun 27, 2022, at 12:46, Gopi Krishna M 
> mailto:gopikrish...@gmail.com>> wrote:
>
> Hi,
> I've a scenario where I use connected streams where one is a low throughput 
> metadata stream and another one is a high throughput data stream. I use 
> CoProcessFunction that operates on a data stream with behavior controlled by 
> a metadata stream.
>
> Is there a way to slow down/pause the high throughput data stream until I've 
> received one entry from the metadata stream? It's possible that by the time I 
> get the first element from the metadata stream, I might get 1000s of items 
> from the data stream. One option is to create a state to buffer the data 
> stream within the operator. Is there any other option which doesn't need this 
> state management?
>
> Thanks,
> Gopi


RE: Recover watermark from savepoint

2022-06-10 Thread Schwalbe Matthias
Hi Sweta,

It is actually a sound idea to implement a dedicated process function for this 
purpose, as David suggests.
Especially if you are in a situation where waiting for a valid natural 
watermark after a restore from savepoint is not sufficient.

We had a situation with input streams of different update frequencies (one only 
updated once a day, and hence only generated watermarks once a day).

This is how you can approach the specific task of

  *   watermark storing:
 *   Create a process function
 *   Create a map that stores the latest watermark per sub-partition (i.e. 
there are 128 sub-partitions in a job with max-parallelism of 128)
 *   Store this map into operator state with each checkpoint
 *   Create a repeating processing time timer (with high frequency 
according to your needs), in order to yield a watermark after savepoint restore
  *   watermark restoring:
 *   when restoring from operator state (because there might have been a 
change in parallelism):
 *   determine the lowest watermark among all sub-partition that belong to 
the respective subtask (on operator state restore)
 *   yield this watermark in processing time handler of above timer (once)

Feel free to ask for details, I hope this helps … I need to ask my folks whether 
I can share our implementation (20-odd lines of code).
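
In the meantime, a rough sketch of the storing/restoring half only; re-emitting the restored watermark (via the processing-time timer mentioned above) needs a keyed context or a custom operator and is left out here, and all names are made up:

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

class WatermarkTracking[T] extends ProcessFunction[T, T] with CheckpointedFunction {

  @transient private var wmState: ListState[java.lang.Long] = _
  @transient private var latestWatermark: Long = Long.MinValue

  // pass events through, remembering the highest watermark seen so far
  override def processElement(value: T, ctx: ProcessFunction[T, T]#Context,
      out: Collector[T]): Unit = {
    latestWatermark = math.max(latestWatermark, ctx.timerService().currentWatermark())
    out.collect(value)
  }

  override def snapshotState(ctx: FunctionSnapshotContext): Unit = {
    wmState.clear()
    wmState.add(latestWatermark)
  }

  override def initializeState(ctx: FunctionInitializationContext): Unit = {
    wmState = ctx.getOperatorStateStore.getListState(
      new ListStateDescriptor[java.lang.Long]("latest-watermark", classOf[java.lang.Long]))
    latestWatermark = Long.MinValue
    if (ctx.isRestored) {
      // after a rescale, entries are redistributed: take the lowest restored watermark
      val it = wmState.get().iterator()
      var min = Long.MaxValue
      while (it.hasNext) min = math.min(min, it.next())
      if (min != Long.MaxValue) latestWatermark = min
    }
  }
}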

What do you think?

Thias


From: David Anderson 
Sent: Thursday, June 9, 2022 11:35 AM
To: User-Flink 
Subject: Re: Recover watermark from savepoint

Sweta,

Flink does not include watermarks in savepoints, nor are they included in 
aligned checkpoints. For what it's worth, I believe that with unaligned 
checkpoints in-flight watermarks are included in checkpoints, but I don't 
believe that would solve the problem, since the watermark strategy's state is 
still lost during a restart.

I can't think of any way to guarantee that all possibly late events will be 
deterministically identified as late. The commonly used 
bounded-out-of-orderness watermark strategy doesn't guarantee this either, even 
without a restart (because watermarks are delayed by the auto watermark 
interval, rather than being produced at every conceivable opportunity).

If this is a strong requirement, you could decide not to rely on watermarks for 
dropping late events, and implement the logic yourself in a process function.

Best,
David

On Wed, Jun 8, 2022 at 6:10 PM Sweta Kalakuntla 
mailto:skalakun...@bandwidth.com>> wrote:
Hi,

I want to understand if flink saves a watermark during savepoint and if not, 
how do we achieve this?

We are seeing an issue where on recovery, the job processes some late events 
which should have been discarded if the job were to be running without any 
downtime.

Thank you,
Sweta


RE: Checkpoint directories not cleared as TaskManagers run

2022-05-19 Thread Schwalbe Matthias
Hi James,

Let me give some short answers (there is documentation that describes this 
better):

>> - why do taskmanagers create the chk-x directory but only the jobmanager can 
>> delete it? Shouldn’t the jobmanager be the only component creating and 
>> deleting these directories? That would seem more consistent to me but maybe 
>> there is a reason.

  *   Assuming a proper setup, i.e. the checkpoint directory is on a shared folder (see the config sketch after this list)
  *   Tasks and state thereof are split as subtasks to separate slots 
(according to parallelism)
  *   When checkpoints are written each state primitive on each resp. subtask 
writes its portion of state to the checkpoint folder and forwards the filename 
to the job manager
  *   For incremental checkpoints some files also remain in older checkpoint 
folders until obsolete
  *   This process is managed by jobmanager
  *   In the end of each checkpoint, jobmanager writes _metadata file to the 
resp. checkpoint folder containing (simplified) the filenames of respective 
states and small state
  *   When a new checkpoint is finished, jobmanager decides according to 
configuration which old checkpoint files become obsolete and hence deleted
  *   In general checkpoints and savepoints are for high availability purposes, 
if the checkpoint data were on a local folder of a machine that crashed, it would 
not be available for a restart of the job
  *   The parts that should be on a local (and fast) drive are the ones used by 
RocksDB; these are ephemeral and can (and will) be recreated on job recovery
>>  - I see many files under each chk-x folder. Can anyone confirm if each file 
>> is wholly owned by a single task manager? ie is each file only written by 1 
>> TM? Otherwise there could be file locking and contention.

  *   Mostly explained above … however
  *   If two taskmanagers happen to be started on the same machine (uncommon 
for k8s, common for Yarn resource manager) they would use the same folder
  *   Filenames contain a uuid which is unlikely to collide
>> - we are now looking to add in NFS mounts for our containers so all the job 
>> managers and taskmanagers share the same path. Can anyone confirm if NFS is 
>> a ‘reliable’ storage mechanism as we have heard many stories how problematic 
>> it can be. We are not yet able to use HDFS or S3.

  *   NFS is not reliable, probably not fit for PROD purposes, don’t know about 
some NAS setup that uses NFS and has integrated reliability …
>> - if Flink can not write to NFS my understanding is although the checkpoint 
>> will fail the Flink process will carry on and try again at the next 
>> checkpoint. It will not cause my program to fail correct?

  *   Imho there would be no reason to set up checkpointing in the first place 
if you cannot restart a job from such a checkpoint
  *   This is only important, of course, if you need reliability, or exactly 
once semantics …
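
Regarding the shared-folder requirement, a minimal sketch of the relevant lines in the job's main() (the path is a placeholder; anything reachable by the jobmanager and every taskmanager works, e.g. HDFS, S3, or a truly shared mount):

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// inside the job's main():
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(5 * 1000)

// must be reachable by the jobmanager and every taskmanager (placeholder path)
env.getCheckpointConfig.setCheckpointStorage("hdfs:///flink/checkpoints")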

Thias

From: James Sandys-Lumsdaine 
Sent: Wednesday, May 18, 2022 2:53 PM
To: Schwalbe Matthias 
Cc: Hangxiang Yu ; user@flink.apache.org
Subject: Re: Checkpoint directories not cleared as TaskManagers run

Hello Matthias,

Thanks for your reply. Yes indeed you are correct. My /tmp path is private so 
you have confirmed what I thought was happening.

I have some follow up questions:
- why do taskmanagers create the chk-x directory but only the jobmanager can 
delete it? Shouldn’t the jobmanager be the only component creating and deleting 
these directories? That would seem more consistent to me but maybe there is a 
reason.
- I see many files under each chk-x folder. Can anyone confirm if each file is 
wholly owned by a single task manager? ie is each file only written by 1 TM? 
Otherwise there could be file locking and contention.
- we are now looking to add in NFS mounts for our containers so all the job 
managers and taskmanagers share the same path. Can anyone confirm if NFS is a 
‘reliable’ storage mechanism as we have heard many stories how problematic it 
can be. We are not yet able to use HDFS or S3.
- if Flink can not write to NFS my understanding is although the checkpoint 
will fail the Flink process will carry on and try again at the next checkpoint. 
It will not cause my program to fail correct?

Many thanks again,

James.

Sent from my iPhone


On 17 May 2022, at 15:17, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:

Hi James,

From reading the thread … I assume, your file:/tmp/Flink/State folder is not 
shared across all machines, right?

In this case it cannot work:
- checkpoints and savepoints need to go to a path that can be commonly accessed 
by jobmanager and all taskmanagers in order to work
- as your jobmanager can not access the checkpoint files of it can also not 
clean-up those files

Hope that helps

Regards

Thias

From: James Sandys-Lumsdaine mailto:jas...@hotmail.com>>
Sent: Tuesday, May 17, 2022 3:55 PM
To: Hangxiang Y

RE: Checkpoint directories not cleared as TaskManagers run

2022-05-17 Thread Schwalbe Matthias
Hi James,

From reading the thread … I assume, your file:/tmp/Flink/State folder is not 
shared across all machines, right?

In this case it cannot work:
- checkpoints and savepoints need to go to a path that can be commonly accessed 
by jobmanager and all taskmanagers in order to work
- as your jobmanager can not access the checkpoint files of it can also not 
clean-up those files

Hope that helps

Regards

Thias

From: James Sandys-Lumsdaine 
Sent: Tuesday, May 17, 2022 3:55 PM
To: Hangxiang Yu ; user@flink.apache.org
Subject: Re: Checkpoint directories not cleared as TaskManagers run

Thanks for your replay.

To be clear on my setup with the problem:

  *   4 taskmanagers running across different containers and machines. Each 
container has its own filesystem including / and /tmp.
  *   1 jobmanager also running in its own container and machine. Also has its 
own filesystem.
  *   I have configured the FS checkpoint address to be "file:/tmp/Flink/State" 
- therefore each process (JM and TMs) are reading and writing to their own 
/tmp. i.e. there is no shared access like if it was NFS or HDFS.
So when the checkpointing happens the directories are created and populated but 
only the JM's old checkpoint directories and cleaned up. Each of the TM 
/tmp/Flink/State old "chk-x" directories remain and are not cleared up.

From your email I don't know if you think I am writing to a "shared" path or 
not?

I started looking at the in-memory checkpoint storage but this has a max size 
with an int so it can't hold 5GB of state. I need the checkpointing to trigger 
my sinks to persist (GenericWriteAheadSink), so it seems I have to create a 
proper shared file path all my containers can access.

James.

From: Hangxiang Yu mailto:master...@gmail.com>>
Sent: 17 May 2022 14:38
To: James Sandys-Lumsdaine mailto:jas...@hotmail.com>>; 
user@flink.apache.org 
mailto:user@flink.apache.org>>
Subject: Re: Checkpoint directories not cleared as TaskManagers run

Hi, James.
I may not get what the problem is.
All checkpoints will store in the address as you set.
IIUC, TMs will write some checkpoint info in their local dir and then upload 
them to the address and then delete local one.
JM will write some metas of checkpoint to the address and also do the entire 
deletion for checkpoints.
Best,
Hangxiang.

On Tue, May 17, 2022 at 9:09 PM James Sandys-Lumsdaine 
mailto:jas...@hotmail.com>> wrote:
Some further Googling says on a StackOverflow posting it is the jobmanager that 
does the deletion and not the taskmanagers.

Currently my taskmanagers are writing their checkpoints to their own private 
disks (/tmp) rather than a share - so my suspicion is the jobmanager can't 
access the folder on the other machines. I thought the taskmanagers could clear 
up their own state when instructed to by the jobmanager.

I can not yet use an nfs mount in my deployment so I may have to switch to heap 
checkpoint state instead of using the file storage checkpoint system. Now I 
understand what's going on a bit better it seems pointless for me to have file 
checkpoints that can't be read by the jobmanager for failover.

If anyone can clarify/correct me I would appreciate.

James.

From: James Sandys-Lumsdaine
Sent: 16 May 2022 18:52
To: user@flink.apache.org 
mailto:user@flink.apache.org>>
Subject: Checkpoint directories not cleared as TaskManagers run


Hello,



I'm seeing my Flink deployment's checkpoint storage directories build up and 
never clear down.



When I run from my own IDE, I see the only the latest "chk-x" directory under 
the job id folder. So the first checkpoint is "chk-1", which is then replaced 
with "chk-2" etc.



However, when I run as a proper application mode deployment, each of the 4 
taskmanagers running in their own containers retain every one of the "chk-x" 
directories meaning they eat a lot of disk space after as time progresses. 
Interestingly, the jobmanager itself is fine.



Does anyone have any suggestion on how to debug this? Anything obvious that 
would cause such behaviour? I'm currently using Flink 1.14.0.



My set up is essentially below (trimmed for simplicity):

    Configuration conf = new Configuration();
    conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);
    conf.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, true);

    final StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

    env.enableCheckpointing(5 * 1000);
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(10 * 1000);

    env.setStateBackend(new HashMapStateBackend());
    env.getCheckpointConfig().setCheckpointStorage("file:/tmp/Flink/State");


Thanks in advance,

James.


RE: Practical guidance with Scala and Flink >= 1.15

2022-05-10 Thread Schwalbe Matthias
… just for my understanding

From the announcements I only got that scala remains only a dependency in the 
JARs that relate to the Scala API.
I never read about plans to drop the Scala API altogether … is that the case??
That would be very unfortunate …

What is the state of the affair?

Best regards

Thias



From: Martijn Visser 
Sent: Monday, May 9, 2022 2:38 PM
To: Robert Metzger 
Cc: Salva Alcántara ; user 
Subject: Re: Practical guidance with Scala and Flink >= 1.15


Hi Salva,

Like Robert said, I don't expect that we will be able to drop support for Scala 
2.12 anytime soon. I do think that we should have a discussion in the Flink 
community about providing Scala APIs. My opinion is that we are probably better 
off to deprecate the current Scala APIs (keeping it internal as we still have a 
big piece of Scala internally) and only offer Java APIs. The Flink community 
lacks real Scala maintainers. I think Seth's blog is pretty spot-on on this too 
[1].

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

[1] https://flink.apache.org/2022/02/22/scala-free.html

On Mon, 9 May 2022 at 12:24, Robert Metzger 
mailto:metrob...@gmail.com>> wrote:
Hi Salva,
my somewhat wild guess (because I'm not very involved with the Scala 
development on Flink): I would stick with option 1 for now. It should be easier 
now for the Flink community to support Scala versions past 2.12 (because we 
don't need to worry about scala 2.12+ support for Flink's internal dependencies 
such as akka).
An argument against supporting newer Scala versions is that I'm not aware of 
anybody currently working on Flink with Scala in general.

On Fri, May 6, 2022 at 6:37 PM Salva Alcántara 
mailto:salcantara...@gmail.com>> wrote:
I've always used Scala in the context of Flink. Now that Flink 1.15 has become 
Scala-free, I wonder what is the best (most practical) route for me moving 
forward. These are my options:

1. Keep using Scala 2.12 for the years to come (and upgrade to newer versions 
when the community has come up with something). How long is Flink expected to 
support Scala 2.12?

2. Upgrade to Scala 2.13 or Scala 3 and use the Java API directly (without any 
Scala-specific wrapper/API). How problematic will that be, especially regarding 
type information & scala-specific serializers? I hate those "returns" (type 
hints) in the Java API...

3. Switch to Java, at least for the time being...

To be clear, I have a strong preference for Scala over Java, but I'm trying to 
look at the "grand scheme of things" here, and be pragmatic. I guess I'm not 
alone here, and that many people are indeed evaluating the same pros & cons. 
Any feedback will be much appreciated.

Thanks in advance!


RE: Notify on 0 events in a Tumbling Event Time Window

2022-05-10 Thread Schwalbe Matthias
Hi Shilpa,

There is no need to have artificial messages in the input kafka topic (and I 
don’t see where Andrew suggests this  )

However, your use case is not 100% clear as to which keys you want to emit 
0-count window results for, either:

  *   A) For all keys your job has ever seen (that’s easy), or
  *   B) For all keys your job has seen, but you stop sending 0-count windows 
after the first one is emitted and only resume for a key when there is a new 
input event on that key, or
  *   C) For all keys from a pre-selection of keys

A KeyedProcessFunction is the way to go.
I’ll sketch a solution for scenario A); the others are similar (Scala-ish):

class Manual0Windowing extends KeyedProcessFunction[…] {

  // ValueState holding the aggregated window state for the current key
  @transient private var state: ValueState[…] = _

  override def open(…): Unit = {
    // register the state primitive for the aggregated window state,
    // with a default 0-window state
    state = getRuntimeContext.getState(…)
  }

  override def processElement(event, ctx, out): Unit = {
    val windowEnd = getWindowEndTime(event)
    ctx.timerService().registerEventTimeTimer(windowEnd)
    var currentState = state.value() // or the 0-window default
    currentState = aggregate(currentState, event)
    state.update(currentState)
  }

  override def onTimer(timestamp, ctx, out): Unit = {
    val currentState = state.value()

    if (is0Window(currentState)) {
      // for scenario B) drop the next line
      ctx.timerService().registerEventTimeTimer(timestamp + tumblingWindowTime)
    } else {
      ctx.timerService().registerEventTimeTimer(timestamp + tumblingWindowTime)
    }
    out.collect(currentState)
    state.clear()
  }
}

… Just to give an idea
… this code does not take care of late events (you would need to use a MapState 
keyed by windowEndTime instead)

What do you think…?

Thias



From: Shilpa Shankar 
Sent: Monday, May 9, 2022 4:00 PM
To: Dario Heinisch ; Andrew Otto 
Cc: user@flink.apache.org
Subject: Re: Notify on 0 events in a Tumbling Event Time Window


Thanks Andrew.
We did consider this solution too. Unfortunately we do not have permissions to 
generate artificial kafka events in our ecosystem.

Dario,
Thanks for your inputs. We will give your design a try. Due the number of 
events being processed per window, we are using incremental aggregate function 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#processwindowfunction-with-incremental-aggregation.
 Do you think we can use KeyedCoProcessFunction in this design?

Thanks,
Shilpa







On Mon, May 9, 2022 at 9:31 AM Dario Heinisch 
mailto:dario.heini...@gmail.com>> wrote:

It depends on the use case; in Shilpa's use case it is about users, so the user 
ids are probably known beforehand.

https://dpaste.org/cRe3G <= This is an example without a window, but essentially, 
Shilpa, you would be re-registering the timers every time they fire.
You would also have to ingest the user ids beforehand into your pipeline, so that 
if a user never has any event he still gets a notification. So probably on 
startup ingest the user ids with a single source
from the DB.

My example is pretty minimal but the idea in your case stays the same:

- key by user
- have a co-process function to init the state with the user ids
- reregister the timers every time they fire
- use `env.getConfig().setAutoWatermarkInterval(1000)` to move the event time 
forward even if there is no data coming in (this is what you are probably 
looking for!!)
- then collect an Optionable/CustomStruct/Null or so depending on if data is 
present or not
- and then u can check whether the event was triggered because there was data 
or because there wasn't data

Best regards,

Dario
On 09.05.22 15:19, Andrew Otto wrote:
This sounds similar to a non streaming problem we had at WMF.  We ingest all 
event data from Kafka into HDFS/Hive and partition the Hive tables in hourly 
directories.  If there are no events in a Kafka topic for a given hour, we have 
no way of knowing if the hour has been ingested successfully.  For all we know, 
the upstream producer pipeline might be broken.

We solved this by emitting artificial 'canary' events into each topic multiple 
times an hour.  The canary events producer uses the same code pathways and 
services that (most) of our normal event producers do.  Then, when ingesting 
into Hive, we filter out the canary events.  The ingestion code then always has 
work to do and can mark an hour as complete, even if it ends up writing no 
(real) events to it.

Perhaps you could do the same?  Always emit artificial events, and filter them 
out in your windowing code? The window should still fire since it will always 
have events, even if you don't use them?




On Mon, May 9, 2022 at 8:55 AM Shilpa Shankar 
mailto:sshan...@bandwidth.com>> wrote:
Hello,
We are building a flink use case where we are consuming from a kafka topic and 
performing aggregations and generating alerts based on average, max, min 
thresholds. We also need to notify the users when there are 0 events in a 
Tumbling Event Time 

RE: Migrating Flink apps across cloud with state

2022-05-04 Thread Schwalbe Matthias
Hello Hemanga,

MirrorMaker can cause havoc in many respects; for one, it does not have strict 
exactly-once semantics …

The way I would tackle this problem (and have done in similar situations):


  *   For the source topics that need to have exactly-once semantics and 
that are not intrinsically idempotent:
  *   Add one extra operator after the source that deduplicates events by 
unique id for a rolling time range (on the source cloud provider)
  *   Take a savepoint after the rolling time-range has passed (at least once 
completely)
  *   Move your job to the target cloud provider
  *   Reconfigure the resp. source with a new kafka consumer group.id,
  *   Change the uid() of the resp. kafka source,
  *   Configure start-by-timestamp for the resp. source with a timestamp that 
lies within the rolling time range (of above)
  *   Configure the job to ignore  recovery for state that does not have a 
corresponding operator in the job (the previous kafka source uid()s)
  *   Start the job on new cloud provider, wait for it to pick up/back-fill
  *   Take a savepoint
  *   Remove deduplication operator if that causes too much 
load/latency/whatever

This scheme sounds more complicated than it really is … and has saved my sanity 
quite a number of times. A sketch of the re-configured Kafka source on the 
target side follows below.
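
A rough sketch of the re-configured source on the target side, assuming the Flink 1.14+ KafkaSource API (broker, topic, group id and timestamp below are placeholders):

import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.scala._

object TargetCloudSourceSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // a timestamp that lies within the rolling deduplication time range (placeholder)
    val restartTimestamp = 1651536000000L

    val source = KafkaSource.builder[String]()
      .setBootstrapServers("target-cloud-broker:9092")                     // placeholder
      .setTopics("mirrored-topic")                                         // placeholder
      .setGroupId("my-job-target-group")                                   // new consumer group.id
      .setStartingOffsets(OffsetsInitializer.timestamp(restartTimestamp))  // start by timestamp
      .setValueOnlyDeserializer(new SimpleStringSchema())
      .build()

    env
      .fromSource(source, WatermarkStrategy.noWatermarks[String](), "mirrored-source")
      .uid("kafka-source-v2") // changed uid() so no offsets from the old savepoint are mapped
      .print()

    // submit with: flink run -s <savepointPath> --allowNonRestoredState ...
    // so that the state of the previous source uid() is ignored
    env.execute("target cloud source sketch")
  }
}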

Good luck and ready to answer more details

Thias

From: Hemanga Borah 
Sent: Tuesday, May 3, 2022 3:12 AM
To: user@flink.apache.org
Subject: Migrating Flink apps across cloud with state

Hello,
 We are attempting to port our Flink applications from one cloud provider to 
another.

 These Flink applications consume data from Kafka topics and output to various 
destinations (Kafka or databases). The applications have states stored in them. 
Some of these stored states are aggregations, for example, at times we store 
hours (or days) worth of data to aggregate over time. Some other applications 
have cached information for data enrichment, for example, we store data in 
Flink state for days, so that we can join them with newly arrived data. The 
amount of data on the input topics is a lot, and it will be expensive to 
reprocess the data from the beginning of the topic.

 As such, we want to retain the state of the application when we move to a 
different cloud provider so that we can retain the aggregations and cache, and 
do not have to start from the beginning of the input topics.

 We are replicating the Kafka topics using MirrorMaker 2. This is our procedure:

  *   Replicate the input topics of each Flink application from source cloud to 
destination cloud.
  *   Take a savepoint of the Flink application on the source cloud provider.
  *   Start the Flink application on the destination cloud provider using the 
savepoint from the source cloud provider.

However, this does not work as we want because there is a difference in offset 
in the new topics in the new cloud provider (because of MirrorMaker 
implementation). The offsets of the new topic do not match the ones stored on 
the Flink savepoint, hence, Flink cannot map to the offsets of the new topic 
during startup.

Has anyone tried to move clouds while retaining the Flink state?

Thanks,
Hemanga


RE: Restore Job from CheckPoint in IntelliJ IDE - MiniCluster

2022-04-22 Thread Schwalbe Matthias
Happy to hear that

(back-posted to user list)

Thias


-Original Message-
From: Κωνσταντίνος Αγαπίδης  
Sent: Friday, April 22, 2022 3:50 PM
To: Schwalbe Matthias 
Subject: Re: Restore Job from CheckPoint in IntelliJ IDE - MiniCluster

Hi Thias,

I set the "execution.savepoint.path" config option in the beggining of the job 
and it worked for me too. Thanks for your help!

Best Regards,
Kostas

21 Απριλίου 2022 10:20 ΠΜ, "Schwalbe Matthias"  
έγραψε:

> Hi Kostas,
> 
> Did you give setting execution.savepoint.path a try?
> 
> You can set the property on local environment by means of env.configure(...).
> This work for me ... (didn't try yet on Flink 1.15)
> 
> Thias
> 
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deploy
> ment/config/#execution-savepoi
> t-path
> 
> -Original Message-
> From: Κωνσταντίνος Αγαπίδης 
> Sent: Wednesday, April 20, 2022 10:02 PM
> To: user@flink.apache.org
> Subject: Restore Job from CheckPoint in IntelliJ IDE - MiniCluster
> 
> Hi everyone,
> 
> I have a technical question about a problem I am dealing with in my 
> Flink Jobs. I am using IntelliJ, Java 1.8 and Flink Version 1.15.0 Libraries 
> and Maven Dependencies in my jobs.
> 
> I am trying to restore a job in Flink Minicluster Local Environment 
> but I cannot find documentation about this in the web. I found a 
> deprecated solution in a mail archive, but it didn't work in my case. 
> First of all I want to ask if it is possible to restore a Job in my 
> Intellij IDE when executing a Job Locally (MiniCluster). I ask because I want 
> to experiment with some concepts about state management within IntelliJ, 
> without building a jar and deploying my job.
> 
> Deprecated Solution: 
> https://lists.apache.org/thread/q1nj5pvwy4t4t9q76z6gkqw64ydth1k0
> 
> Thank you in advance!
> 
> --
> Best Regards,
> Kostas
> Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und 
> beinhaltet unter Umständen vertrauliche Mitteilungen. Da die 
> Vertraulichkeit von e-Mail-Nachrichten nicht gewährleistet werden 
> kann, übernehmen wir keine Haftung für die Gewährung der 
> Vertraulichkeit und Unversehrtheit dieser Mitteilung. Bei irrtümlicher 
> Zustellung bitten wir Sie um Benachrichtigung per e-Mail und um Löschung 
> dieser Nachricht sowie eventueller Anhänge. Jegliche unberechtigte Verwendung 
> oder Verbreitung dieser Informationen ist streng verboten.
> 
> This message is intended only for the named recipient and may contain 
> confidential or privileged information. As the confidentiality of 
> email communication cannot be guaranteed, we do not accept any 
> responsibility for the confidentiality and the intactness of this message. If 
> you have received it in error, please advise the sender by return e-mail and 
> delete this message and any attachments.
> Any unauthorised use or dissemination of this information is strictly 
> prohibited.
Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und beinhaltet 
unter Umständen vertrauliche Mitteilungen. Da die Vertraulichkeit von 
e-Mail-Nachrichten nicht gewährleistet werden kann, übernehmen wir keine 
Haftung für die Gewährung der Vertraulichkeit und Unversehrtheit dieser 
Mitteilung. Bei irrtümlicher Zustellung bitten wir Sie um Benachrichtigung per 
e-Mail und um Löschung dieser Nachricht sowie eventueller Anhänge. Jegliche 
unberechtigte Verwendung oder Verbreitung dieser Informationen ist streng 
verboten.

This message is intended only for the named recipient and may contain 
confidential or privileged information. As the confidentiality of email 
communication cannot be guaranteed, we do not accept any responsibility for the 
confidentiality and the intactness of this message. If you have received it in 
error, please advise the sender by return e-mail and delete this message and 
any attachments. Any unauthorised use or dissemination of this information is 
strictly prohibited.


RE: Kubernetes killing TaskManager - Flink ignoring taskmanager.memory.process.size

2022-04-21 Thread Schwalbe Matthias
Hi Dan,

Assuming from previous mails that you are using RocksDB … this could have to do 
with the glibc bug [1][2] …
I’m never sure in which setting this has already been taken care of …
However, your situation is very typical for glibc as the allocator underneath 
RocksDB, and giving it more memory won’t help much.

Greetings

Thias



[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#switching-the-memory-allocator
[2] https://issues.apache.org/jira/browse/FLINK-19125

From: Yang Wang 
Sent: Thursday, April 21, 2022 9:19 AM
To: Dan Hill 
Cc: user 
Subject: Re: Kubernetes killing TaskManager - Flink ignoring 
taskmanager.memory.process.size


Could you please configure a bigger memory to avoid OOM and use NMTracker[1] to 
figure out the memory usage categories?

[1]. 
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html

Best,
Yang

Dan Hill mailto:quietgol...@gmail.com>> 于2022年4月21日周四 
07:42写道:
Hi.

I upgraded to Flink v1.14.4 and now my Flink TaskManagers are being killed by 
Kubernetes for exceeding the requested memory.  My Flink TM is using an extra 
~5gb of memory over the tm.memory.process.size.

Here are the flink-config values that I'm using
taskmanager.memory.process.size: 25600mb
# The default, 256mb, is too small.
taskmanager.memory.jvm-metaspace.size: 320mb
taskmanager.memory.network.fraction: 0.2
taskmanager.memory.network.max: 2560m

I'm requesting 26112Mi in my Kubernetes config (so there's some buffer).

I re-read the Flink 
docs
 on setting memory.  This seems like it should be fine.  The diagrams and docs 
show that process.size is used.

If it helps, the TMs are failing in a round robin once every ~30 minutes or so. 
 This isn't an issue with Flink v1.12.3 but is an issue with Flink v1.14.4.

My text logs have a bunch of kafka connections in them.  I don't know if that's 
related to overallocating memory.


❯ kubectl -n flink-v1-14-4 get events

LAST SEEN   TYPE  REASONOBJECT  
MESSAGE

37m Warning   Evicted   pod/flink-taskmanager-3 The 
node was low on resource: memory. Container taskmanager was using 31457992Ki, 
which exceeds its request of 26112Mi.

37m NormalKilling   pod/flink-taskmanager-3 
Stopping container taskmanager

37m NormalScheduled pod/flink-taskmanager-3 
Successfully assigned hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager-3 to 
ip-10-12-104-15.ec2.internal

37m NormalPulledpod/flink-taskmanager-3 
Container image "flink:1.14.4" already present on machine

37m NormalCreated   pod/flink-taskmanager-3 
Created container taskmanager

37m NormalStarted   pod/flink-taskmanager-3 
Started container taskmanager

37m NormalSuccessfulCreate  statefulset/flink-taskmanager   
create Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful

37m Warning   RecreatingFailedPod   statefulset/flink-taskmanager   
StatefulSet hipcamp-prod-metrics-flink-v1-14-4/flink-taskmanager is recreating 
failed Pod flink-taskmanager-3

37m NormalSuccessfulDelete  statefulset/flink-taskmanager   
delete Pod flink-taskmanager-3 in StatefulSet flink-taskmanager successful


RE: Restore Job from CheckPoint in IntelliJ IDE - MiniCluster

2022-04-21 Thread Schwalbe Matthias
Hi Kostas,

Did you give setting execution.savepoint.path a try?

You can set the property on the local environment by means of env.configure(...); 
a minimal sketch follows below.
This works for me ... (didn't try it yet on Flink 1.15)
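
A minimal sketch of that, assuming the Configuration-based env.configure(...) overload; the savepoint path is a placeholder:

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// inside the job's main(), before the job graph is built:
val conf = new Configuration()
conf.setString("execution.savepoint.path", "file:///tmp/Flink/State/<job-id>/chk-42") // placeholder

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.configure(conf, Thread.currentThread().getContextClassLoader)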


Thias



[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-savepoint-path

-Original Message-
From: Κωνσταντίνος Αγαπίδης  
Sent: Wednesday, April 20, 2022 10:02 PM
To: user@flink.apache.org
Subject: Restore Job from CheckPoint in IntelliJ IDE - MiniCluster

Hi everyone,

I have a technical question about a problem I am dealing with in my Flink Jobs. 
I am using IntelliJ, Java 1.8 and Flink Version 1.15.0 Libraries and Maven 
Dependencies in my jobs.

I am trying to restore a job in Flink Minicluster Local Environment but I 
cannot find documentation about this in the web. I found a deprecated solution 
in a mail archive, but it didn't work in my case. First of all I want to ask if 
it is possible to restore a Job in my Intellij IDE when executing a Job Locally 
(MiniCluster). I ask because I want to experiment with some concepts about 
state management within IntelliJ, without building a jar and deploying my job.

Deprecated Solution: 
https://lists.apache.org/thread/q1nj5pvwy4t4t9q76z6gkqw64ydth1k0

Thank you in advance!

--
Best Regards,
Kostas


RE: Watermarks event time vs processing time

2022-03-29 Thread Schwalbe Matthias
Hello Hans-Peter,

I’m a little confused which version of your code you are testing against:

  *   ProcessingTimeSessionWindows or EventTimeSessionWindows?
  *   did you keep the withIdleness() ??

As said before:

  *   for ProcessingTimeSessionWindows, watermarks play no role
  *   if you keep withIdleness(), then the respective sparse DataStream is 
event-time-less most of the time, i.e. no triggers fire to close a session 
window
  *   withIdleness() makes only sense if you merge/union/connect multiple 
DataStream where at least one stream has their watermarks updated regularly 
(i.e. it is not withIdleness())
 *   this is not your case, your DAG is linear, no union nor connects
  *   in event-time mode processing time plays no role, watermarks exclusively 
take the role of the progress of model (event) time and hence the triggering of 
windows
  *   in order to trigger a (session-)window at time A the window operator 
needs to receive a watermark of at least time A
  *   next catch regards partitioning
 *   your first watermark strategy kafkaWmstrategy generates 
per-Kafka-partition watermarks
 *   a keyBy() reshuffles these partitions onto the number of subtasks 
according to the hash of the key
 *   this results in a per subtask calculation of the lowest watermark of 
all Kafka partitions that happen to be processed by that subtask
 *   i.e. if a single Kafka partition makes no watermark progress the 
subtask watermark makes no progress
 *   this surfaces in sparse data as in your case
  *   your second watermark strategy wmStrategy makes things worse because
 *   it discards the correct watermarks of the first watermark strategy
 *   and replaces it with something that is arbitrary (at this point it is 
hard to guess the correct max lateness that is a mixture of the events from 
multiple Kafka partitions)

Conclusion:
The only way to make the event time session windows work for you in a timely 
manner is to make sure watermarks on all involved partitions make progress, 
i.e. new events arrive on all partitions in a regular manner.

Hope this helps
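
For reference, a minimal self-contained sketch of the event-time variant without withIdleness() (types, gap, and data are made up for illustration):

import java.time.Duration
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

object SessionWindowSketch {
  case class Event(transactionId: String, timestamp: Long, payload: String)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // stand-in for the Kafka-backed stream; in the real job the strategy would be
    // passed to fromSource() so that watermarks are generated per partition
    val events: DataStream[Event] = env.fromElements(
      Event("tx1", 1000L, "a"), Event("tx1", 1500L, "b"), Event("tx2", 9000L, "c"))

    val wmStrategy = WatermarkStrategy
      .forBoundedOutOfOrderness[Event](Duration.ofSeconds(5)) // no withIdleness()
      .withTimestampAssigner(new SerializableTimestampAssigner[Event] {
        override def extractTimestamp(e: Event, recordTs: Long): Long = e.timestamp
      })

    events
      .assignTimestampsAndWatermarks(wmStrategy)
      .keyBy(_.transactionId)
      .window(EventTimeSessionWindows.withGap(Time.minutes(5)))
      .reduce((a, b) => if (a.timestamp >= b.timestamp) a else b) // e.g. keep latest per session
      .print()

    env.execute("event-time session window sketch")
  }
}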

Thias


From: HG 
Sent: Tuesday, March 29, 2022 1:07 PM
To: Schwalbe Matthias 
Cc: user 
Subject: Re: Watermarks event time vs processing time


Hello Matthias,

When I remove all the watermark strategies it does not process anything.
For example, when I use WatermarkStrategy.noWatermarks() instead of the one I 
built, nothing seems to happen at all.

 Also when I skip the part where I add wmStrategy  to create tuple4dswm:
 DataStream> tuple4dswm = 
tuple4ds.assignTimestampsAndWatermarks(wmStrategy);

Nothing is processed.

Regards Hans-Peter

Op wo 16 mrt. 2022 om 15:52 schreef Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>>:
Hi Hanspeter,

Let me relate some hints that might help you get the concepts clearer.

From your description I make the following assumptions where you are not specific 
enough (please confirm or correct in your answer):

a.   You store incoming events in state per transaction_id to be 
sorted/aggregated(min/max time) by event time later on

b.   So far you used a session window to determine the point in time when 
to emit the stored/enriched/sorted events

c.Watermarks are generated with bounded out of orderness

d.   You use session windows with a specific gap

e.   In your experiment you ever only send 1000 events and then stop 
producing incoming events

Now to your questions:

  *   For processing time session windows, watermarks play no role whatsoever, 
you simply assume that you’ve seen all events belonging so a single transaction 
id if the last such event for a specific transaction id was processed 
sessionWindowGap milliseconds ago
  *   Therefore you see all enriched incoming events the latest 
sessionWindowGap ms after the last incoming event (+ some latency)
  *   In event time mode and resp event time session windows the situation is 
exactly the same, only that processing time play no role
  *   A watermark means (ideally) that no event older than the watermark time 
ever follows the watermark (which itself is a meta-event that flows with the 
proper events on the same channels)
  *   In order for a session gap window to forward the enriched events the 
window operator needs to receive a watermark that is sessionWindowGap 
milliseconds beyond the latest incoming event (in terms of the respective event 
time)
  *   The watermark generator in order to generate a new watermark that 
triggers this last session window above needs to encounter an (any) event that 
has a timestamp of ( + outOfOrderness + 
sessionWindowGap + 1ms)
  *   Remember, the watermark generator never generated watermarks based on 
processing time, but only based on the timestamps it has seen in events 
actually encountered
  *   Coming back to your idleness configuration: it only means that the 
incoming stream becomes idle ==

RE: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Schwalbe Matthias
Oops mistyped your name, Dan

From: Schwalbe Matthias
Sent: Freitag, 18. März 2022 09:02
To: 'Dan Hill' ; Dongwon Kim 
Cc: user 
Subject: RE: Weird Flink Kafka source watermark behavior

RE: Weird Flink Kafka source watermark behavior

2022-03-18 Thread Schwalbe Matthias
Hi San, Dongwon,

I share the opinion that when per-partition watermarking is enabled, you should 
observe correct behavior … would be interesting to see why it does not work for 
you.

I’d like to clear one tiny misconception here when you write:

>> - The same issue happens even if I use an idle watermark.

You would expect to see glitches with watermarking when you enable idleness.
Idleness sort of trades watermark correctness for reduced latency when 
processing timers (much simplified).
With idleness enabled you have no guarantees whatsoever as to the quality of 
watermarks (which might be ok in some cases).
BTW we dominantly use a mix of fast and slow sources (that only update once a 
day) with hand-tuned watermarking and late-event processing, and enabling 
idleness would break everything.

Oversights put aside, things should work the way you implemented it.

One thing I could imagine to be a cause is

  *   that over time the kafka partitions get reassigned to different consumer 
subtasks, which would probably stress correct recalculation of watermarks. Hence 
#partitions == #subtasks might reduce the problem (see the sketch after this list)
  *   can you enable logging of partition-consumer assignment, to see if that 
is the cause of the problem
  *   also involuntary restarts of the job can cause havoc as this resets 
watermarking
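
To illustrate the '#partitions == #subtasks' point, a rough sketch with hypothetical names (`kafkaSource` is an already-configured KafkaSource<String>; the 10-second bound is an assumption); pinning the source parallelism to the partition count keeps each consumer subtask on a single partition:

int numberOfKafkaPartitions = 8;   // assumption: known partition count of the topic

DataStream<String> events = env
        .fromSource(
                kafkaSource,
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(10)),  // assumed bound
                "kafka-events")
        .setParallelism(numberOfKafkaPartitions);   // one partition per source subtask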

I’ll be off next week, unable to take part in the active discussion …

Sincere greetings

Thias




From: Dan Hill 
Sent: Freitag, 18. März 2022 08:23
To: Dongwon Kim 
Cc: user 
Subject: Re: Weird Flink Kafka source watermark behavior

⚠EXTERNAL MESSAGE – CAUTION: Think Before You Click ⚠


I'll try forcing # source tasks = # partitions tomorrow.

Thank you, Dongwon, for all of your help!

On Fri, Mar 18, 2022 at 12:20 AM Dongwon Kim 
mailto:eastcirc...@gmail.com>> wrote:
I believe your job with per-partition watermarking should be working okay even 
in a backfill scenario.

BTW, is the problem still observed even with # source tasks = # partitions?

For committers:
Is there a way to confirm that per-partition watermarking is used in TM log?

On Fri, Mar 18, 2022 at 4:14 PM Dan Hill 
mailto:quietgol...@gmail.com>> wrote:
I hit this using event processing and no idleness detection.  The same issue 
happens if I enable idleness.

My code matches the code example for per-partition 
watermarking.

On Fri, Mar 18, 2022 at 12:07 AM Dongwon Kim 
mailto:eastcirc...@gmail.com>> wrote:
Hi Dan,

I'm quite confused as you already use per-partition watermarking.

What I meant in the reply is
- If you don't use per-partition watermarking, # tasks < # partitions can cause 
the problem for backfill jobs.
- If you don't use per-partition watermarking, # tasks = # partitions is going 
to be okay even for backfill jobs.
- If you use per-partition watermarking, # tasks < # partitions shouldn't cause 
any problems unless you turn on the idleness detection.

Regarding the idleness detection which is based on processing time, what is 
your setting? If you set the value to 10 seconds for example, you'll face the 
same problem unless the watermark of your backfill job catches up real-time 
within 10 seconds. If you increase the value to 1 minute, your backfill job 
should catch up real-time within 1 minute.

Best,

Dongwon


On Fri, Mar 18, 2022 at 3:51 PM Dan Hill 
mailto:quietgol...@gmail.com>> wrote:
Thanks Dongwon!

Wow.  Yes, I'm using per-partition watermarking [1].  Yes, my # source tasks < 
# kafka partitions.  This should be called out in the docs or the bug should be 
fixed.

On Thu, Mar 17, 2022 at 10:54 PM Dongwon Kim 
mailto:eastcirc...@gmail.com>> wrote:
Hi Dan,

Do you use the per-partition watermarking explained in [1]?
I've also experienced a similar problem when running backfill jobs specifically 
when # source tasks < # kafka partitions.
- When # source tasks = # kafka partitions, the backfill job works as expected.
- When # source tasks < # kafka partitions, a Kafka consumer consumes multiple 
partitions. This case can destroy the per-partition patterns as explained in 
[2].

Hope this helps.

p.s. If you plan to use the per-partition watermarking, be aware that idleness 
detection [3] can cause another problem when you run a backfill job. Kafka 
source tasks in a backfill job seem to read a batch of records from Kafka and 
then wait for downstream tasks to catch up the progress, which can be counted 
as idleness.

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#using-watermark-strategie
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-strategies-and-the-kafka-connector
[3] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources

Best,


RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
Hi Hanspeter,

Event time mode should work just the same … for your example below you need 
only a single arbitrary event per kafka partition that has a timestamp > 
1646992800560 + sessionWindowGap + outOfOrderness in order for the session 
window to be triggered.
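
Spelled out with the sample numbers (the out-of-orderness and gap values below are assumptions, substitute your own):

long latestHandlingTime = 1646992800560L;                    // last event of the sample
long outOfOrderness     = Duration.ofSeconds(5).toMillis();  // assumed watermark bound
long sessionGap         = Duration.ofSeconds(30).toMillis(); // assumed session gap

// an event with at least this timestamp must show up on every Kafka partition
// before the watermark can advance far enough to fire the last session window
long requiredTimestamp  = latestHandlingTime + outOfOrderness + sessionGap + 1;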

I’m not sure why the processing time window does not work without watermarking 
configured (I never use processing time mode).
You need to consider what consistency guarantees you need in processing time 
mode: in case the job fails and is restarted (or if network I/O exhibits short 
hiccups beyond your session gap), you might get results that split a 
single transaction_id into multiple session windows …
The choice is yours 

As to the aggregation method: current event time – last event time … not 
min/max … otherwise not different 
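
For illustration only (not your actual code), a minimal sketch of that 'current event time – last event time' idea with keyed state, assuming events per transaction_id arrive (or were sorted) in handling_time order and use the ObjectNode layout from the quoted job:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class ElapsedTimeFunction extends KeyedProcessFunction<String, ObjectNode, ObjectNode> {

    private transient ValueState<Long> lastHandlingTime;

    @Override
    public void open(Configuration parameters) {
        lastHandlingTime = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastHandlingTime", Long.class));
    }

    @Override
    public void processElement(ObjectNode event, Context ctx, Collector<ObjectNode> out) throws Exception {
        long current = event.get("value").get("handling_time").asLong();
        Long previous = lastHandlingTime.value();
        long elapse = previous == null ? 0L : current - previous;     // diff to the previous event
        ((ObjectNode) event.get("value")).put("elapse", elapse);
        lastHandlingTime.update(current);
        out.collect(event);
    }
}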

If you want to find out why event time mode blocks, you might find it useful to 
monitor the watermarks on single operators / per subtask:
Look for subtasks that don’t have watermarks, or whose watermarks are too low for a 
specific session window to trigger.


Thias


From: HG 
Sent: Mittwoch, 16. März 2022 16:41
To: Schwalbe Matthias 
Cc: user 
Subject: Re: Watermarks event time vs processing time

⚠EXTERNAL MESSAGE – CAUTION: Think Before You Click ⚠


Hi Matthias and others

Thanks for the answer.
I will remove the Idleness.
However I am not doing max/min etc. Unfortunately most examples are about 
aggregations.

The inputs are like this
{"handling_time":1646992800260,"transaction_id":"017f6af1548e-119dfb",}
{"handling_time":1646992800290,"transaction_id":"017f6af1548e-119dfb",}
{"handling_time":1646992800360,"transaction_id":"017f6af1548e-119dfb",}
{"handling_time":1646992800560,"transaction_id":"017f6af1548e-119dfb",}
The output like this
{"handling_time":1646992800260,"transaction_id":"017f6af1548e-119dfb","elapse":0,}
{"handling_time":1646992800290,"transaction_id":"017f6af1548e-119dfb","elapse":30,}
{"handling_time":1646992800360,"transaction_id":"017f6af1548e-119dfb","elapse":70,}
{"handling_time":1646992800560,"transaction_id":"017f6af1548e-119dfb",,"elapse":200}

I started with handling_time as the timestamp. But that did not work out well. I 
don't know why.
Then I switched to processing-time sessions, which is also OK because the 
outcome of the elapsed time does not rely on the event time.

Then I thought 'let me remove the kafka watermark assigner'.
But as soon as I did that no events would appear at the sink.
So I left both watermark timestamp assigners in place.
They do no harm it seems, while leaving them out appears to. It is not ideal 
but it works.
But I'd rather know the correct way to set it up.
But I'd rather know the correct way how to set it up.

Regards Hans-Peter









RE: Watermarks event time vs processing time

2022-03-16 Thread Schwalbe Matthias
Hi Hanspeter,

Let me relate some hints that might help you getting concepts clearer.

From your description I make the following assumptions where you are not specific 
enough (please confirm or correct in your answer):

  1.  You store incoming events in state per transaction_id to be 
sorted/aggregated(min/max time) by event time later on
  2.  So far you used a session window to determine the point in time when to 
emit the stored/enriched/sorted events
  3.  Watermarks are generated with bounded out of orderness
  4.  You use session windows with a specific gap
  5.  In your experiment you only ever send 1000 events and then stop producing 
incoming events

Now to your questions:

  *   For processing time session windows, watermarks play no role whatsoever; 
you simply assume that you’ve seen all events belonging to a single transaction 
id if the last such event for a specific transaction id was processed 
sessionWindowGap milliseconds ago
  *   Therefore you see all enriched incoming events at the latest 
sessionWindowGap ms after the last incoming event (+ some latency)
  *   In event time mode and resp. event time session windows the situation is 
exactly the same, only that processing time plays no role
  *   A watermark means (ideally) that no event older than the watermark time 
ever follows the watermark (which itself is a meta-event that flows with the 
proper events on the same channels)
  *   In order for a session gap window to forward the enriched events the 
window operator needs to receive a watermark that is sessionWindowGap 
milliseconds beyond the latest incoming event (in terms of the respective event 
time)
  *   The watermark generator, in order to generate a new watermark that 
triggers this last session window above, needs to encounter an (any) event that 
has a timestamp of at least (timestamp of the latest incoming event + 
outOfOrderness + sessionWindowGap + 1 ms); see the sketch after this list
  *   Remember, the watermark generator never generates watermarks based on 
processing time, but only based on the timestamps it has seen in events 
actually encountered
  *   Coming back to your idleness configuration: it only means that the 
incoming stream becomes idle == timeless after a while … i.e. watermarks won’t 
make progress from this stream, and all downstream operators are told so
  *   An idleness specification is only useful if the respective operator has 
another source of valid watermarks (i.e. after a union of two streams, one 
active/one idle …). This is not your case
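
Wired together, a rough sketch (illustrative names; the 30-second gap is an assumption and SortAndComputeElapsedTimes is a hypothetical window function): one watermark strategy at the source, a keyed event-time session window, and nothing downstream re-assigning watermarks.

DataStream<ObjectNode> enriched = env
        .fromSource(source, strategy, "kafka-source")                 // strategy = bounded out-of-orderness on handling_time
        .keyBy(event -> event.get("value").get("transaction_id").asText())
        .window(EventTimeSessionWindows.withGap(Time.seconds(30)))    // assumed gap
        .process(new SortAndComputeElapsedTimes());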

I hope this clarifies your situation.

Cheers


Thias


From: HG 
Sent: Mittwoch, 16. März 2022 10:06
To: user 
Subject: Watermarks event time vs processing time

⚠EXTERNAL MESSAGE – CAUTION: Think Before You Click ⚠


Hi,

I read from a Kafka topic events that are in JSON format
These events contain a handling time (aka event time) in epoch milliseconds, a 
transaction_id and a large nested JSON structure.
I need to group the events by transaction_id, order them by handling time and 
calculate the differences in handling time.
The events are updated with this calculated elapsed time and pushed further.
So all events that go in should come out with the elapsed time added.

For testing I use events that are old (so handling time is not nearly the wall 
clock time)
Initially I used EventTimeSessionWindows but somehow the processing did not run 
as expected.
When I pushed 1000 events eventually 800 or so would appear at the output.
This was resolved by switching to ProcessingTimeSessionWindows .
My thought was then that I could remove the watermark strategies with 
timestamp assigners (handling time) for the Kafka input stream and the 
data stream.
However this was not the case.

Can anyone enlighten me as to why the watermark strategies are still needed?

Below the code

KafkaSource<ObjectNode> source = KafkaSource.<ObjectNode>builder()
        .setProperties(kafkaProps)
        .setProperty("ssl.truststore.type", trustStoreType)
        .setProperty("ssl.truststore.password", trustStorePassword)
        .setProperty("ssl.truststore.location", trustStoreLocation)
        .setProperty("security.protocol", securityProtocol)
        .setProperty("partition.discovery.interval.ms", partitionDiscoveryIntervalMs)
        .setProperty("commit.offsets.on.checkpoint", commitOffsetsOnCheckpoint)
        .setGroupId(inputGroupId)
        .setClientIdPrefix(clientId)
        .setTopics(kafkaInputTopic)
        .setDeserializer(KafkaRecordDeserializationSchema.of(new JSONKeyValueDeserializationSchema(fetchMetadata)))
        .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
        .build();

/* A watermark is needed to prevent duplicates! */
WatermarkStrategy<ObjectNode> kafkaWmstrategy = WatermarkStrategy
        .<ObjectNode>forBoundedOutOfOrderness(Duration.ofSeconds(Integer.parseInt(outOfOrderness)))

RE: Savepoint API challenged with large savepoints

2022-03-10 Thread Schwalbe Matthias
Thanks, Chesnay,

I just created the 3 tickets (in my clumsy way):

  *   FLINK-26584<https://issues.apache.org/jira/browse/FLINK-26584> : State 
Processor API fails to write savepoints exceeding 5MB
  *   FLINK-26585<https://issues.apache.org/jira/browse/FLINK-26585> : State 
Processor API: Loading a state set buffers the whole state set in memory before 
starting to process
  *   FLINK-26586<https://issues.apache.org/jira/browse/FLINK-26586> : 
FileSystem uses unbuffered read I/O

I’ll be off the week starting Jan  21, but otherwise ready to discuss matters

Thias




From: Chesnay Schepler 
Sent: Donnerstag, 10. März 2022 10:47
To: Schwalbe Matthias ; user@flink.apache.org
Subject: Re: Savepoint API challenged with large savepoints

That all sounds very interesting; I'd go ahead with creating tickets.


RE: Could not stop job with a savepoint

2022-03-09 Thread Schwalbe Matthias
Hi Vinicius,

Your case, the taskmanager being actively killed by YARN, was the other way this 
happened.

You are using RocksDBStateBackend, right?
Not being sure, I’ve got the strong suspicion that this has got to do with the 
glibc bug that is seemingly in the works.
There is some documentation here [1] and a solution that has been implemented 
for k8s containers [2] which replaces the glibc allocator with libjemalloc.so .

However we are not completely through with our encounter of the same problem.
Our intermediate solution is to reserve some unused extra memory, so the 
problem is delayed but not completely prevented (we restart our jobs daily by 
means of savepoint taking):

flink-conf.yaml:
…
taskmanager.memory.managed.fraction: 0.2
#reserve 2GB extra unused space (out of 8GB per TM) in order to mitigate the 
glibc memory leakage problem
taskmanager.memory.task.off-heap.size: 2048mb
…

I’m not sure if and starting with which Flink version libjemalloc.so is 
integrated by default into the flink runtime
… Flink team to the rescue !

Hope this helps

Thias

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/memory/mem_trouble/#container-memory-exceeded
[2] https://issues.apache.org/jira/browse/FLINK-19125

From: Vinicius Peracini 
Sent: Mittwoch, 9. März 2022 17:56
To: Schwalbe Matthias 
Cc: Dawid Wysakowicz ; user@flink.apache.org
Subject: Re: Could not stop job with a savepoint

So apparently the YARN container for Task Manager is running out of memory 
during the savepoint execution. Never had any problems with checkpoints though. 
Task Manager configuration:

"taskmanager.memory.process.size": "10240m",
"taskmanager.memory.managed.fraction": "0.6",
"taskmanager.memory.jvm-overhead.fraction": "0.07",
"taskmanager.memory.jvm-metaspace.size": "192mb",
"taskmanager.network.memory.buffer-debloat.enabled": "true",

On Wed, Mar 9, 2022 at 1:33 PM Vinicius Peracini 
mailto:vinicius.perac...@zenvia.com>> wrote:
Bom dia Schwalbe!

Thanks for the reply.

I'm using Flink 1.14.0. EMR is a managed cluster platform to run big data 
applications on AWS. This way Flink services are running on YARN. I tried to 
create another savepoint today and was able to retrieve the Job Manager log:

2022-03-09 15:42:10,294 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
   [] - Triggering savepoint for job 6f9d71e57efba96dad7f5328ab9ac717.
2022-03-09 15:42:10,298 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering 
checkpoint 1378 (type=SAVEPOINT) @ 1646840530294 for job 
6f9d71e57efba96dad7f5328ab9ac717.
2022-03-09 15:45:19,636 WARN  akka.remote.transport.netty.NettyTransport
   [] - Remote connection to 
[/172.30.0.169:57520<http://172.30.0.169:57520>] failed with 
java.io.IOException: Connection reset by peer
2022-03-09 15:45:19,648 WARN  akka.remote.ReliableDeliverySupervisor
   [] - Association with remote system 
[akka.tcp://flink@ip-172-30-0-169.ec2.internal:46639] has failed, address is 
now gated for [50] ms. Reason: [Disassociated]
2022-03-09 15:45:19,652 WARN  akka.remote.ReliableDeliverySupervisor
   [] - Association with remote system 
[akka.tcp://flink-metrics@ip-172-30-0-169.ec2.internal:41533] has failed, 
address is now gated for [50] ms. Reason: [Disassociated]
2022-03-09 15:45:19,707 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - 
LEFT_JOIN_MESSAGE_BULK -> Map (1/3) (866e32468227f9f0adac82e9b83b970a) switched 
from RUNNING to FAILED on container_1646341714746_0005_01_04 @ 
ip-172-30-0-165.ec2.internal (dataPort=40231).
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
Connection unexpectedly closed by remote task manager 
'ip-172-30-0-169.ec2.internal/172.30.0.169:34413<http://172.30.0.169:34413>'. 
This might indicate that the remote task manager was lost.
at 
org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:186)
 ~[flink-dist_2.12-1.14.0.jar:1.14.0]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
 ~[flink-dist_2.12-1.14.0.jar:1.14.0]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
 ~[flink-dist_2.12-1.14.0.jar:1.14.0]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
 ~[flink-dist_2.12-1.14.0.jar:1.14.0]
at 
org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
 ~[flink-dist_2.12-1.14.0.jar:1.14.0]
at 
org.apache.flink.runtime.io.network.netty.N

RE: Incremental checkpointing & RocksDB Serialization

2022-03-08 Thread Schwalbe Matthias
Hi Vidya,

As to the choice of serializer:

  *   Flink provides two implementations that support state migration, AVRO 
serializer, and Pojo serializer
  *   Pojo serializer happens to be one of the fastest available serializers 
(faster than AVRO)
  *   If your record sticks to the Pojo coding rules it is probably a good choice, 
no extra serializer coding needed (see the minimal example after this list)
  *   See here [1]
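
A minimal illustration of those Pojo rules (made-up type, not from this thread): public class, public no-argument constructor, and public fields (or standard getters/setters) of serializable types.

public class SensorReading {
    public String sensorId;     // public fields are handled by the PojoSerializer
    public long timestamp;
    public double value;

    public SensorReading() {}   // required public no-argument constructor

    public SensorReading(String sensorId, long timestamp, double value) {
        this.sensorId = sensorId;
        this.timestamp = timestamp;
        this.value = value;
    }
}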

As to the extra big incremental checkpoints at the end of a time window:

  *   This is quite plausible,
  *   windowing uses the ‘namespace’ subkey of keyed state
  *   ideally incremental checkpoints only store changes made since the last 
checkpoint, and
  *   on a window change many window instances (i.e. one per key and time 
interval) disappear and are eventually recreated for the next time interval, 
hence the bigger checkpoint
  *   serialization efforts depend on the choice of state backend:
 *   RocksDBStateBackend dominantly uses serializers when reading and 
writing state but to a lesser extent for checkpoints
 *   FsStateBackend does not use serializers when reading and writing state 
but dominantly during checkpoints


In order to improve your situation you need to take a closer look into

  *   The numbers (how many keys, how many active window instances 
(globally/per key), how many events are collected per window instance)
  *   The specific implementation of the rollup/aggregation function
 *   There are setups that store all events and iterate whenever a window 
result is needed (triggered)
 *   Other setups pre-aggregate incoming events and summarize only when a 
window result is needed (triggered)
 *   This choice makes a big difference when it comes to state size (see the 
sketch after this list)
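
A rough sketch of the pre-aggregating variant (illustrative only: keyedEvents is an assumed KeyedStream<Event, String> and Event a hypothetical POJO with a public long timestamp); only a small (min, max) accumulator is kept per window instead of every event:

DataStream<Tuple2<Long, Long>> minMaxPerWindow = keyedEvents
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))   // assumed window type and size
        .aggregate(new AggregateFunction<Event, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
            public Tuple2<Long, Long> createAccumulator() {
                return Tuple2.of(Long.MAX_VALUE, Long.MIN_VALUE);
            }
            public Tuple2<Long, Long> add(Event e, Tuple2<Long, Long> acc) {
                return Tuple2.of(Math.min(acc.f0, e.timestamp), Math.max(acc.f1, e.timestamp));
            }
            public Tuple2<Long, Long> getResult(Tuple2<Long, Long> acc) {
                return acc;
            }
            public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
                return Tuple2.of(Math.min(a.f0, b.f0), Math.max(a.f1, b.f1));
            }
        });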

Hope this helps … feel free to get back with further questions 


Thias



[1] 
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html#pojoserializer

From: Vidya Sagar Mula 
Sent: Dienstag, 8. März 2022 02:44
To: Yun Tang 
Cc: user 
Subject: Re: Incremental checkpointing & RocksDB Serialization

Hi Yun,

Thank you for the response.


1.  You could tune your job to avoid backpressure. Maybe you can upgrade 
your flink engine to at least flink-1.13 to know how to monitor the back 
pressure status [1].
[VIDYA] - In the view of my organization, it's a very big activity to upgrade 
the Flink version from our current one (1.11). I need to continue my dev 
activity with 1.11 only.
2.  You can refer to [2] to know how to customize your serializer.
[VIDYA] - Thanks for providing me with the link references for the custom 
serializer. I am wondering how the serialization part in incremental 
checkpointing is different from full checkpointing. My pipeline logic is the same 
for both full and incremental checkpoints, except for the checkpoint.type 
variable and some other env variables. But the code pipeline logic 
should be the same for both types of checkpoints.

- Full checkpoint of pipeline is not taking considerably long time when 
compared to incremental checkpointing at the end of the window. I see the 
backpressure is High and CPU utilization is high with incremental 
checkpointing. Thread dump shows the stack related to serialization. How is the 
serialization part different between full checkpointing vs Incremental 
checkpointing? I know, RocksDB library has some serializers for Incremental.

- While I am not writing custom serializer for my pipeline in case of Full 
checkpointing, is it the general pattern to implement custom serializer in case 
of Incremental?

- With respect with serializers for Full vs Incremental checkpointing, What's 
the general usage pattern across the Flink community? If I write custom 
serializer for Incremental, how does it go with Full checkpointing.

Please clarify.

Thanks,
Vidya.




[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/ops/monitoring/back_pressure/
[2] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/fault-tolerance/custom_serialization/

On Sun, Mar 6, 2022 at 12:11 AM Yun Tang 
mailto:myas...@live.com>> wrote:
Hi Vidya,


  1.  You could tune your job to avoid backpressure. Maybe you can upgrade your 
flink engine to at least flink-1.13 to know how to monitor the back pressure 
status [1]
  2.  You can refer to [2] to know how to custom your serializer.


[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/ops/monitoring/back_pressure/
[2] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/fault-tolerance/custom_serialization/

Best,
Yun Tang

From: Vidya Sagar Mula mailto:mulasa...@gmail.com>>
Sent: Sunday, March 6, 2022 4:16
To: Yun Tang mailto:myas...@live.com>>
Cc: user mailto:user@flink.apache.org>>
Subject: Re: Incremental checkpointing & RocksDB Serialization

Hi Yun Tang,
Thank you for the reply. I have follow up questions and need some more details. 
Can you please clarify my inline questions?

> Why is 

Savepoint API challenged with large savepoints

2022-03-08 Thread Schwalbe Matthias
Dear Flink Team,

In the last weeks I was faced with a large savepoint (around 40GiB) that 
contained lots of obsolete data points and overwhelmed our infrastructure (i.e. 
failed to load/restart).
We could not afford to lose the state, hence I spent the time to transcode the 
savepoint into something smaller (ended up with 2.5 GiB).
During my efforts I encountered a couple of points that make savepoint API 
uneasy with larger savepoints, found simple solutions ...

I would like to contribute my findings and 'fixes', however on my corporate 
infrastructure I cannot fork/build Flink locally nor PR the changes later on.

Before creating Jira tickets I wanted to quickly discuss the matter.

Findings:


  *   (We are currently on Flink 1.13 (RocksDB state backend) but all findings 
apply as well to the latest version)
  *   WritableSavepoint.write(...) falls back to JobManagerCheckpointStorage 
which restricts savepoint size to 5MiB
 *   See relevant exception stack here [1]
 *   This is because SavepointTaskManagerRuntimeInfo.getConfiguration() 
always returns empty Configuration, hence
 *   Neither "state.checkpoint-storage" nor "state.checkpoints.dir" are/can 
be configured
 *   'fix': provide SavepointTaskManagerRuntimeInfo.getConfiguration() with 
a meaningful implementation and set configuration in 
SavepointEnvironment.getTaskManagerInfo()
  *   When loading a state, MultiStateKeyIterator loads and buffers the whole 
state in memory before it even processes a single data point
 *   This is absolutely no problem for small state (hence the unit tests 
work fine)
 *   MultiStateKeyIterator ctor sets up a java Stream that iterates all 
state descriptors and flattens all datapoints contained within
 *   The java.util.stream.Stream#flatMap function causes the buffering of 
the whole data set when enumerated later on
 *   See call stack [2]
*   In our case this is 150e6 data points (> 1GiB just for the pointers 
to the data, let alone the data itself ~30GiB)
 *   I'm not aware of any instrumentation of Stream that would avoid the 
problem, hence
 *   I coded an alternative implementation of MultiStateKeyIterator that 
avoids using java Stream,
 *   I can contribute our implementation (MultiStateKeyIteratorNoStreams)
  *   I found out that, at least when using LocalFileSystem on a Windows 
system, read I/O to load a savepoint is unbuffered,
 *   See example stack [3]
 *   i.e. in order to load only a long in a serializer, it needs to go into 
kernel mode 8 times and load the 8 bytes one by one
 *   I coded a BufferedFSDataInputStreamWrapper that allows opting in to 
buffered reads on any FileSystem implementation (a rough sketch of the idea 
follows after this list)
 *   In our setting savepoint load is now 30 times faster
 *   I've once seen a Jira ticket about improving savepoint load time in 
general (lost the link unfortunately), maybe this approach can help with it
 *   not sure if HDFS has got the same problem
 *   I can contribute my implementation
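
Not the actual implementation offered above, only a rough sketch of the opt-in buffering idea (a seek simply drops the read-ahead buffer):

import java.io.BufferedInputStream;
import java.io.IOException;
import org.apache.flink.core.fs.FSDataInputStream;

public class BufferedFSDataInputStreamWrapper extends FSDataInputStream {

    private final FSDataInputStream delegate;
    private final int bufferSize;
    private BufferedInputStream buffered;
    private long position;   // logical read position as seen by callers

    public BufferedFSDataInputStreamWrapper(FSDataInputStream delegate, int bufferSize) throws IOException {
        this.delegate = delegate;
        this.bufferSize = bufferSize;
        this.buffered = new BufferedInputStream(delegate, bufferSize);
        this.position = delegate.getPos();
    }

    @Override
    public void seek(long desired) throws IOException {
        delegate.seek(desired);                                      // reposition underlying stream
        buffered = new BufferedInputStream(delegate, bufferSize);    // drop stale read-ahead
        position = desired;
    }

    @Override
    public long getPos() {
        return position;
    }

    @Override
    public int read() throws IOException {
        int b = buffered.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = buffered.read(b, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        buffered.close();
    }
}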

Looking forward to your comments


Matthias (Thias) Schwalbe


[1] exception stack:
8215140 [MapPartition (bb312595cb5ccc27fd3b2c729bbb9136) (2/4)#0] ERROR 
BatchTask - Error in task code:  MapPartition 
(bb312595cb5ccc27fd3b2c729bbb9136) (2/4)
java.util.concurrent.ExecutionException: java.io.IOException: Size of the state 
is larger than the maximum permitted memory-backed state. Size=180075318 , 
maxSize=5242880 . Consider using a different state backend, like the File 
System State backend.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:636)
at 
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54)
at 
org.apache.flink.state.api.output.SnapshotUtils.snapshot(SnapshotUtils.java:67)
at 
org.apache.flink.state.api.output.operators.KeyedStateBootstrapOperator.endInput(KeyedStateBootstrapOperator.java:90)
at 
org.apache.flink.state.api.output.BoundedStreamTask.processInput(BoundedStreamTask.java:107)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:661)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
at 
org.apache.flink.state.api.output.BoundedOneInputStreamTaskRunner.mapPartition(BoundedOneInputStreamTaskRunner.java:80)
at 
org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:107)
at 
org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:514)
at 
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:357)
at 

RE: Could not stop job with a savepoint

2022-03-07 Thread Schwalbe Matthias
Bom Dia Vinicius,

Can you still find (and post) the exception stack from your jobmanager log? The 
flink client log does not reveal enough information.
Your situation reminds me of something similar I had.
In the log you might find something like this or similar:

2022-03-07 02:15:41,347 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
   [] - Triggering stop-with-savepoint for job 
e12f22653f79194863ab426312dd666a.
2022-03-07 02:15:41,380 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering 
checkpoint 4983974 (type=SAVEPOINT_SUSPEND) @ 1646615741347 for job 
e12f22653f79194863ab426312dd666a.
2022-03-07 02:15:43,042 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Decline 
checkpoint 4983974 by task 0e659ac720e3e0b3e4072dbc1cc85cd3 of job 
e12f22653f79194863ab426312dd666a at 
container_e1093_1646358077201_0002_01_01 @ ulxxphaddtn02.adgr.net 
(dataPort=44767).
org.apache.flink.util.SerializedThrowable: Asynchronous task checkpoint failed.
at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:279)
 ~[flink-dist_2.11-1.13.0.jar:1.13.0]

BTW what Flink version are you running?
What is EMR (what technology is underneath)?



From: Vinicius Peracini 
Sent: Montag, 7. März 2022 20:46
To: Dawid Wysakowicz 
Cc: user@flink.apache.org
Subject: Re: Could not stop job with a savepoint

Hi Dawid, thanks for the reply.

The job was still in progress and producing events. Unfortunately I was not 
able to stop the job with a savepoint or to just create a savepoint. I had to 
stop the job without the savepoint and restore the state using the last 
checkpoint. Still reviewing my configuration and trying to figure out why this 
is happening. Any help would be appreciated.

Thanks!


On Mon, Mar 7, 2022 at 11:56 AM Dawid Wysakowicz 
mailto:dwysakow...@apache.org>> wrote:

Hi,

From the exception it seems the job has been already done when you're 
triggering the savepoint.

Best,

Dawid
On 07/03/2022 14:56, Vinicius Peracini wrote:
Hello everyone,

I have a Flink job (version 1.14.0 running on EMR) and I'm having this issue 
while trying to stop a job with a savepoint on S3:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job 
"df3a3c590fabac737a17f1160c21094c".
at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:581)
at 
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:569)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1069)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at 
org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint Coordinator 
is suspending.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:579)
... 9 more

I'm using incremental and unaligned checkpoints (aligned checkpoint timeout is 
30 seconds). I also tried to create the savepoint without stopping the job 
(using flink savepoint command) and got the same error. Any idea what is 
happening here?

Thanks in advance,


RE: MapState.entries()

2022-03-07 Thread Schwalbe Matthias
Hi Alexey,

To my best knowledge it's lazy with RocksDBStateBackend, using the Java 
iterator you could even modify the map (e.g. remove()).
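
A small sketch of what that enables (hypothetical names; lastSeen is a RocksDB-backed MapState<String, Long> in a keyed function, cutoffTimestamp an arbitrary threshold): entries are deserialized one at a time while iterating, and stale ones can be dropped through the iterator.

// inside e.g. processElement(...) of a keyed function:
Iterator<Map.Entry<String, Long>> it = lastSeen.entries().iterator();
while (it.hasNext()) {
    Map.Entry<String, Long> entry = it.next();   // one entry deserialized per step
    if (entry.getValue() < cutoffTimestamp) {
        it.remove();                             // removes the entry from the MapState
    }
}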

Cheers

Thias


From: Alexey Trenikhun 
Sent: Dienstag, 8. März 2022 06:11
To: Flink User Mail List 
Subject: MapState.entries()

Hello,
We are using RocksDBStateBackend, is MapState.entries() call in this case 
"lazy" - deserializes single entry while next(), or MapState.entries() returns 
collection, which is fully loaded into memory?


Thanks,
Alexey


RE: processwindowfunction output Iterator

2022-03-01 Thread Schwalbe Matthias
Goedemorgen Hans,

You can call the out.collect(…) multiple times, i.e. for each forwarded event … 
how about this 
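
A rough sketch of that pattern (Event and EnrichedEvent are hypothetical types, Event with a public long timestamp): sort the buffered events, then call out.collect() once per enriched event.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class ElapsedTimesWindowFunction
        extends ProcessWindowFunction<Event, EnrichedEvent, String, TimeWindow> {

    @Override
    public void process(String key, Context ctx, Iterable<Event> elements, Collector<EnrichedEvent> out) {
        List<Event> sorted = new ArrayList<>();
        elements.forEach(sorted::add);
        sorted.sort(Comparator.comparingLong(e -> e.timestamp));

        long previous = -1L;
        for (Event event : sorted) {
            long elapsed = previous < 0 ? 0L : event.timestamp - previous;
            previous = event.timestamp;
            out.collect(new EnrichedEvent(event, elapsed));   // one collect() per input event
        }
    }
}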

Thias


From: HG 
Sent: Montag, 28. Februar 2022 16:25
To: user 
Subject: processwindowfunction output Iterator

Hi,


Can processwindowfunction output an Iterator?
I need to sort and subtract timestamps from keyed events and then output them 
all with added elapsed times.

Regards Hans


RE: Trouble sinking to Kafka

2022-02-23 Thread Schwalbe Matthias
Good morning Marco,


Your fix is pretty plausible:

  *   Kafka transactions get started at the beginning of a checkpoint period 
and contain all events collected through this period,
  *   At the end of the checkpoint period the associated transaction is 
committed and concurrently the transaction of the next checkpoint period is 
started
  *   In your case (checkpoint period + minimum distance) always lasts at least 
1 minute and hence the transaction times out
Kafka transactions work a little differently from traditional RDBMS transactions:

  *   They are basically a pointer offset in each kafka partition that marks a 
range of pending events to be committed
  *   A kafka reader can either read-uncommitted, and sees these uncommitted 
events immediately, or
  *   If in read-committed mode: needs to wait for the committed record offset 
(per partition) to advance
  *   If transactions don’t commit, such a reader effectively gets halted
  *   Kafka transaction timeout is a means to prevent consumers from being blocked 
for too long if one of the producers fails to commit (e.g. crashed)

Your fix to increase kafka transaction timeout is sound in this respect.
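
For reference, the property combination Marco reported as the fix, as a sketch (pass the Properties to the transactional Kafka producer; the broker-side transaction.max.timeout.ms must be at least as large as the producer-side transaction.timeout.ms):

Properties producerProps = new Properties();
// producer side: how long a checkpoint-scoped transaction may stay open
producerProps.setProperty("transaction.timeout.ms",
        String.valueOf(Duration.ofMinutes(5).toMillis()));   // 5 minutes, as in the thread
// broker side (server.properties), shown only as a comment here:
// transaction.max.timeout.ms=3600000   # 1 hour, as in the thread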

Documentation on the kafka page is very detailed …

Open questions? … get back to the community 

Cheers

Thias


From: Marco Villalobos 
Sent: Mittwoch, 23. Februar 2022 19:11
To: Nicolaus Weidner 
Cc: user 
Subject: Re: Trouble sinking to Kafka

I fixed this, but I'm not 100% sure why.

Here is my theory:

My checkpoint interval is one minute, and the minimum pause interval is also 
one minute. My transaction timeout time is also one minute. I think the 
checkpoint causes Flink to hold the transaction open for one minute, and thus 
it times out.

After I changed the 
transaction.max.timeout.ms to one hour, and 
the transaction.timeout.ms to five minutes, it 
all worked like a charm.

Is my theory correct?

The documentation kind of suggests this is the cause:  
https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/connectors/kafka.html

However, I think the documentation could benefit from a few examples and 
scenarios that illustrate ill-considered configurations.

Thank you.

On Wed, Feb 23, 2022 at 9:29 AM Nicolaus Weidner 
mailto:nicolaus.weid...@ververica.com>> wrote:
Hi Marco,

I'm no expert on the Kafka producer, but I will try to help. [1] seems to have 
a decent explanation of possible error causes for the error you encountered.
Which leads me to two questions:


if (druidProducerTransactionMaxTimeoutMs > 0) {
    properties.setProperty("transaction.max.timeout.ms",
            Integer.toString(druidProducerTransactionMaxTimeoutMs));
}
if (druidProducerTransactionTimeoutMs > 0) {
    properties.setProperty("transaction.timeout.ms",
            Integer.toString(druidProducerTransactionTimeoutMs));
}

Have you tried increasing the timeout settings, to see if transactions timed 
out?


   properties.setProperty("transactional.id", 
"local.druid");

Do you use multiple producers (parallelism > 1)? It seems you always set the 
same transactional.id, which I expect causes problems 
when you have multiple producer instances (see "zombie fencing" in [2]). In 
that case, just make sure they are unique.

And one additional question: Does the error occur consistently, or only 
occasionally?

Best,
Nico

[1] 
https://stackoverflow.com/questions/53058715/what-is-reason-for-getting-producerfencedexception-during-producer-send
[2] https://stackoverflow.com/a/52304789


RE: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Schwalbe Matthias
Hi James,

Coming back to your original question on how to restart jobs from 
savepoints/checkpoints on LocalStreamEnvironment (the one used in a debugger):

Out of the box LocalStreamEnvironment does not allow setting a snapshot path to 
resume the job from.
The trick for me to do it anyway was to remodel the execute method and add a 
call to

jobGraph.setSavepointRestoreSettings(SavepointRestoreSettings.forPath(fromSavepoint,
 true))

(fromSavepoint being the savepointPath)

This is somewhat ugly but works (only ever used in debugger session, not in 
prod code).

The remodeled execute method looks like this (for Flink 1.13.0, and should be 
similar for other releases): [1]


Feel free to get back with additional questions 

Thias

[1] remodeled execute(…) (scala):

  def execute(jobName: String): JobExecutionResult = {

if (fromSavepoint != null && 
env.streamEnv.getJavaEnv.isInstanceOf[LocalStreamEnvironment]) {
  // transform the streaming program into a JobGraph
  val locEnv = env.streamEnv.getJavaEnv.asInstanceOf[LocalStreamEnvironment]
  val streamGraph = locEnv.getStreamGraph
  streamGraph.setJobName(jobName)

  val jobGraph = streamGraph.getJobGraph()
  jobGraph.setAllowQueuedScheduling(true)

  
jobGraph.setSavepointRestoreSettings(SavepointRestoreSettings.forPath(fromSavepoint,
 true))

  val configuration = new org.apache.flink.configuration.Configuration
  configuration.addAll(jobGraph.getJobConfiguration)
  configuration.setString(TaskManagerOptions.MANAGED_MEMORY_SIZE, "0")

  // add (and override) the settings with what the user defined
  val cls = classOf[LocalStreamEnvironment]
  val cfgField = cls.getDeclaredField("configuration")
  cfgField.setAccessible(true)
  val cofg = 
cfgField.get(locEnv).asInstanceOf[org.apache.flink.configuration.Configuration]
  configuration.addAll(cofg)


  if (!configuration.contains(RestOptions.BIND_PORT)) 
configuration.setString(RestOptions.BIND_PORT, "0")

  val numSlotsPerTaskManager = 
configuration.getInteger(TaskManagerOptions.NUM_TASK_SLOTS, 
jobGraph.getMaximumParallelism)

  val cfg = new 
MiniClusterConfiguration.Builder().setConfiguration(configuration).setNumSlotsPerTaskManager(numSlotsPerTaskManager).build

  val miniCluster = new MiniCluster(cfg)

  try {
miniCluster.start()
configuration.setInteger(RestOptions.PORT, 
miniCluster.getRestAddress.get.getPort)
return miniCluster.executeJobBlocking(jobGraph)
  } finally {
//transformations.clear
miniCluster.close()
  }

} else {
 throw new 
InvalidParameterException("flink.stream-environment.from-savepoint may only be 
used for local debug execution")
}
  }






From: Piotr Nowojski 
Sent: Donnerstag, 17. Februar 2022 09:23
To: Cristian Constantinescu 
Cc: Sandys-Lumsdaine, James ; James 
Sandys-Lumsdaine ; user@flink.apache.org
Subject: Re: Basic questions about resuming stateful Flink jobs

Hi James,

> Do I copy the checkpoint into a savepoint directory and treat it like a 
> savepoint?

You don't need to copy the checkpoint. Actually you can not do that, as 
checkpoints are not relocatable. But you can point to the checkpoint directory 
and resume from it like you would from a savepoint.

Regarding the testing, I would suggest taking a look at the docs [1] and 
MiniClusterWithClientResource in particular. If you are using it, you can 
access the cluster client (MiniClusterWithClientResource#getClusterClient) and 
this client should be an equivalent of the CLI/Rest API. You can also use it to 
recover from savepoints - check for `setSavepointRestoreSettings` usage in [2].

But the real question would be: why do you want to do it? You might not 
necessarily need to test for recovery at this level. From a user code 
perspective, it doesn't matter whether you use a checkpoint or a savepoint, or 
where it is stored. IMO what you want to have is:

1. Proper unit tests using TestHarness(es)

Again, take a look at [1]. You can set up unit tests, process some records, 
carefully control timers, then call 
`AbstractStreamOperatorTestHarness#snapshot` to take snapshot and 
`AbstractStreamOperatorTestHarness#initializeState` to test the recovery code 
path. For examples you can take a look at usages of those methods in the Flink 
code base. For example [3].
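A condensed sketch of that pattern (scala), assuming the flink-streaming-java test-jar on the classpath; MyOperator, Event and Result are hypothetical placeholders for your own operator and record types:

```scala
import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness

// 1) run some records through the operator and take a snapshot
val harness = new KeyedOneInputStreamOperatorTestHarness[String, Event, Result](
  new MyOperator(), (e: Event) => e.key, Types.STRING)
harness.open()
harness.processElement(new StreamRecord(Event("k", 1), 10L))
val snapshot = harness.snapshot(1L, 100L)
harness.close()

// 2) build a fresh harness, restore the snapshot, and exercise the recovery code path
val restored = new KeyedOneInputStreamOperatorTestHarness[String, Event, Result](
  new MyOperator(), (e: Event) => e.key, Types.STRING)
restored.initializeState(snapshot)
restored.open()
restored.processElement(new StreamRecord(Event("k", 2), 20L))
```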

2. Later, I would recommend complementing such unit tests with some end-to-end 
tests, that would make sure everything is integrated properly, that your 
cluster is configured correctly etc. Then you don't need to use MiniCluster, as 
you can simply use Rest API/CLI. But crucially you don't need to be so thorough 
with covering all of the cases on this level, especially the failure handling, 
as you can rely more on the unit tests. Having said that, you might want to 
have a test that kills/restarts one TM on an end-to-end level.

Best,
Piotrek

[1] 

RE: Memory issues with Rocksdb ColumnFamilyOptions

2022-02-04 Thread Schwalbe Matthias
Hi Natalie,

I happen to currently work on a similar problem: I’ve got a savepoint of about 
40Gb just for one of the operator states, 70 Million keys.
With ExistingSavepoint there are currently a number of problems:

  *   When reading from a local copy of the savepoint, non-buffered I/O is used 
and that takes forever
 *   I created a façade buffered LocalFileSystem replacing the original (30 
times faster) (PoC)
  *   Once the keyed state and the timers are loaded into a fresh RocksDB, it gets 
enumerated key by key; however
 *   MultiStateKeyIterator uses Streams, and unfortunately the 
implementation has to load the complete key set into a buffer before it even 
starts to return a single key
 *   I am also working on a (PoC) replacement for MultiStateKeyIterator 
that uses a traditional implementation without excessive buffering
 *   There are successful unit tests that test the original implementation 
(RocksDBRocksStateKeysIteratorTest), but only with little data, hence the 
problem does not surface in the tests
  *   I plan to file a number of tickets on Jira once I’m confident I can come 
up with a good recommendation


Could the state size also be your problem? … What are your numbers?

I hope this helps or starts a discussion …

Sincere greetings

Thias



From: Natalie Dunn 
Sent: Dienstag, 1. Februar 2022 17:16
To: user@flink.apache.org
Subject: Memory issues with Rocksdb ColumnFamilyOptions

Hi All,

I am working on trying to process a Savepoint in order to produce basic 
statistics on it for monitoring. I’m running into an issue where processing a 
large Savepoint is running out of memory before I can process the Savepoint 
completely.

One thing I noticed in profiling the code is that there seems to be a lot of 
memory given to the RocksDB ColumnFamilyOptions class, because it is producing 
a lot of java.lang.ref.Finalizer objects that don't seem to be garbage 
collected.

I see in the RocksDB code that these should be closed but it doesn’t seem like 
they are being closed. 
https://github.com/facebook/rocksdb/blob/f57745814f2d9e937383b4bfea55373306182c14/java/src/main/java/org/rocksdb/AbstractNativeReference.java#L71

Is there a way to close these via the Flink API? Also, more generally, why am I 
seeing hundreds of thousands of these generated?

In case it’s helpful, here’s a genericized/simplified version of the code:


import org.apache.flink.api.common.functions.RichReduceFunction;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.operators.MapOperator;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;

Configuration config = new Configuration();
config.setInteger("state.backend.rocksdb.files.open", 2);
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(config);
env.getConfig().enableObjectReuse();

EmbeddedRocksDBStateBackend stateBackend = new EmbeddedRocksDBStateBackend();

final EmbeddedRocksDBStateBackend configuredRocksDBStateBackend =
stateBackend.configure(
config, Thread.currentThread().getContextClassLoader());
// The below function just downloads the savepoint from our cloud storage and 
runs Savepoint.load()
ExistingSavepoint savepoint = loadSavepoint(env, configuredRocksDBStateBackend, 
savepointPath);


// FunctionStateReader() is a KeyedStateReaderFunction and does basic processing in readKey()

DataSource source = savepoint.readKeyedState("process-1", new FunctionStateReader());

final MapOperator sizes =
source
.map(s -> new Metrics(s.key, 
s.stateFields.values().stream().mapToInt(Integer::parseInt).sum(),
0, 0, 0, 0, 0, 0, 0))
.returns(TypeInformation.of(new TypeHint<>() {
}));

// MetricsRed() below is a RichReduceFunction
DataSet stats = sizes.reduce(new MetricsRed());


If you spot anything wrong with this approach that would cause memory issues, 
please let me know. I am not 100% sure that the specific issue/question above 
is the full cause of the memory issues that I have been having.

Thank you!
Natalie

RE: Regarding Queryable state in Flink

2022-01-25 Thread Schwalbe Matthias
Hi Jessy,

Have you considered using the state processor api [1] for offline analysis of 
checkpoints and savepoints?

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/libs/state_processor_api/

Sincere greetings

Thias


From: Jessy Ping 
Sent: Montag, 24. Januar 2022 16:47
To: user 
Subject: Regarding Queryable state in Flink

Hi Team,

We are currently running our streaming application based Flink(Datastream API ) 
on a non-prod cluster.And planning to move it to production cluster soon.. We 
are keeping cerating keyed state backed by rocksdb in the flink application. We 
need a mechanism to query these keyed state values for debugging and 
troubleshooting. Is it a good idea to use Queryable state for a single link-job 
running in application-mode on kubernetes for an average load of 10k 
events/second ?.
Or is it a better idea to keep these state values in an external k,v store ?.

So in short --> Is the queryable state stable enough to use in production 
systems ?


Thanks
Jessy


RE: unaligned checkpoint for job with large start delay

2022-01-11 Thread Schwalbe Matthias
Hi Mason,

Since you are using RocksDB, you could enable the metric [1] 
state.backend.rocksdb.metrics.estimate-num-keys, which gives (afaik) a good 
indication of the number of active windows.
I’ve never seen (despite the warning) a negative effect on the runtime.
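For reference, switching that metric on is a single configuration entry (flink-conf.yaml, or the equivalent per-job configuration):

```
state.backend.rocksdb.metrics.estimate-num-keys: true
```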

Hope this helps …

Thias




[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#state-backend-rocksdb-metrics-estimate-num-keys

From: Mason Chen 
Sent: Dienstag, 11. Januar 2022 19:20
To: Piotr Nowojski 
Cc: Mason Chen ; user 
Subject: Re: unaligned checkpoint for job with large start delay

Hi Piotrek,

No worries—I hope you had a good break.

Counting how many windows have been registered/fired and plotting that over 
time.
It’s straightforward to count windows that are fired (the trigger exposes the 
run time context and we can collect the information in that code path). 
However, it’s not so clear how to count the windows that have been registered 
since the window assigner does not expose the run time context—is this even the 
right place to count? It’s not necessarily the case that an assignment results 
in a new window registered. Am I missing anything else relevant from the user 
facing interface perspective?

 Unfortunately at the moment I don't know how to implement such a metric 
without affecting performance on the critical path, so I don't see this 
happening soon :(
Perhaps it can be an opt-in feature? I do see it being really useful, since 
most users aren’t really familiar with windows and these metrics can help 
easily identify the common problem of too many windows firing.

The additional metrics certainly help in diagnosing some of the symptoms of the 
root problem.

Best,
Mason


On Jan 10, 2022, at 1:00 AM, Piotr Nowojski 
mailto:pnowoj...@apache.org>> wrote:

Hi Mason,

Sorry for a late reply, but I was OoO.

I think you could confirm it with more custom metrics. Counting how many 
windows have been registered/fired and plotting that over time.

I think it would be more helpful in this case to check how long a task has been 
blocked being "busy" processing for example timers. FLINK-25414 shows only 
blocked on being hard/soft backpressure. Unfortunately at the moment I don't 
know how to implement such a metric without affecting performance on the 
critical path, so I don't see this happening soon :(

Best,
Piotrek

wt., 4 sty 2022 o 18:02 Mason Chen 
mailto:mason.c...@apple.com>> napisał(a):
Hi Piotrek,


In other words, something (presumably a watermark) has fired more than 151 200 
windows at once, which is taking ~1h 10minutes to process and during this time 
the checkpoint can not make any progress. Is this number of triggered windows 
plausible in your scenario?

It seems plausible—there are potentially many keys (and many windows). Is there 
a way to confirm this with metrics? We could add a window-fire counter to the 
window operator that only gets incremented at the end of window evaluation, in 
order to see the huge jumps in window fires. I can see this benefiting other 
users who troubleshoot the problem of a large number of windows firing.
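For what it's worth, a rough sketch of counting fires at the trigger level (scala): it wraps an existing trigger rather than touching the window operator; the class name and metric name are purely illustrative, and it assumes the trigger context's metric group accessor is available:

```scala
import org.apache.flink.metrics.Counter
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.Window

// Delegating trigger that counts FIRE decisions via the trigger's metric group.
class CountingTrigger[W <: Window](inner: Trigger[Object, W]) extends Trigger[Object, W] {

  @transient private var fires: Counter = _

  private def counter(ctx: Trigger.TriggerContext): Counter = {
    if (fires == null) fires = ctx.getMetricGroup.counter("windowFires")
    fires
  }

  override def onElement(e: Object, ts: Long, w: W, ctx: Trigger.TriggerContext): TriggerResult = {
    val r = inner.onElement(e, ts, w, ctx)
    if (r.isFire) counter(ctx).inc()
    r
  }

  override def onEventTime(time: Long, w: W, ctx: Trigger.TriggerContext): TriggerResult = {
    val r = inner.onEventTime(time, w, ctx)
    if (r.isFire) counter(ctx).inc()
    r
  }

  override def onProcessingTime(time: Long, w: W, ctx: Trigger.TriggerContext): TriggerResult = {
    val r = inner.onProcessingTime(time, w, ctx)
    if (r.isFire) counter(ctx).inc()
    r
  }

  override def canMerge(): Boolean = inner.canMerge()
  override def onMerge(w: W, ctx: Trigger.OnMergeContext): Unit = inner.onMerge(w, ctx)
  override def clear(w: W, ctx: Trigger.TriggerContext): Unit = inner.clear(w, ctx)
}
```

It would then be plugged in via something like `.window(...).trigger(new CountingTrigger(EventTimeTrigger.create()))`.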

Best,
Mason


On Dec 29, 2021, at 2:56 AM, Piotr Nowojski 
mailto:pnowoj...@apache.org>> wrote:

Hi Mason,

> and it has to finish processing this output before checkpoint can begin—is 
> this right?

Yes. Checkpoint will be only executed once all triggered windows will be fully 
processed.

But from what you have posted it looks like all of that delay is coming from 
hundreds of thousands of windows firing all at the same time. Between 20:30 and 
~21:40 there must have been a bit more than 36 triggers/s * 60s/min * 70min = 
151 200 triggers fired at once (or in a very short interval). In other words, 
something (presumably a watermark) has fired more than 151 200 windows at once, 
which is taking ~1h 10minutes to process and during this time the checkpoint 
can not make any progress. Is this number of triggered windows plausible in 
your scenario?

Best,
Piotrek


czw., 23 gru 2021 o 12:12 Mason Chen 
mailto:mason.c...@apple.com>> napisał(a):
Hi Piotr,

Thanks for the thorough response and the PR—will review later.

Clarifications:
1. The flat map you refer to produces at most 1 record.
2. The session window operator’s window process function emits at least 1 
record.
3. The 25 ms sleep is at the beginning of the window process function.

Your explanation about how records being bigger than the buffer size can cause 
blockage makes sense to me. However, my average record size is around 770 bytes 
coming out of the source and 960 bytes coming out of the window. Also, we don’t 
override the default `taskmanager.memory.segment-size`. My Flink job memory 
config is as follows:

```
taskmanager.memory.jvm-metaspace.size: 512 mb
taskmanager.memory.jvm-overhead.max: 2Gb
taskmanager.memory.jvm-overhead.min: 512Mb
taskmanager.memory.managed.fraction: '0.4'
taskmanager.memory.network.fraction: '0.2'
taskmanager.memory.network.max: 2Gb

RE: unexpected result of interval join when using sql

2021-12-15 Thread Schwalbe Matthias
Probably an oversight ... did you actually mean to publish your password? 
Better change it as soon as possible ...

Thias


From: cy 
Sent: Donnerstag, 16. Dezember 2021 06:55
To: user@flink.apache.org
Subject: unexpected result of interval join when using sql

Hi
Flink 1.14.0 Scala 2.12

I'm using the Flink SQL interval join ability; here are my table schema and SQL.

create table `queue_3_ads_ccops_perf_o_ebs_volume_capacity` (
  `dtEventTime` timestamp(3),
  `dtEventTimeStamp` bigint,
  `sourceid` string,
  `cluster_name` string,
  `poolname` string,
  `storage_poolname` string,
  `usage` decimal(10, 4),
  `provisioned_size` decimal(10, 4),
  `startat` timestamp(3),
  `endat` timestamp(3),
  `vrespool_id` int,
  `uuid` string,
  `version` string,
  `localTime` timestamp(3),
  `cluster_id` int,
  `extend1` string,
  `extend2` string,
  `extend3` string,
  `mon_ip` string,
  `bussiness_ip` string,
  `datasource` string,
  `thedate` int,
  `name` string,
  `used_size` int,
  watermark for `startat` as `startat` - interval '60' minutes
) with (
  'connector' = 'kafka',
  'topic' = 'queue_3_ads_ccops_perf_o_ebs_volume_capacity',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset',
  'properties.bootstrap.servers' = '10.172.234.67:9092,10.172.234.68:9092,10.172.234.69:9092',
  'properties.group.id' = 'layer-vdisk',
  'properties.security.protocol' = 'SASL_PLAINTEXT',
  'properties.sasl.mechanism' = 'SCRAM-SHA-512',
  'properties.sasl.jaas.config' = 'org.apache.flink.kafka.shaded.org.apache.kafka.common.security.scram.ScramLoginModule required username="bkdata_admin" password="D41J48Cz3iwW7k6fFogX1A";'
);

SELECT
source.sourceid AS sourceid,
cast(source.startat AS timestamp) AS source_startat,
cast(target.startat AS timestamp) AS target_startat,
source.used_size AS source_used_size,
target.used_size AS target_used_size,
source.usage AS source_usage,
target.usage AS target_usage
FROM queue_3_ads_ccops_perf_o_ebs_volume_capacity source, 
queue_3_ads_ccops_perf_o_ebs_volume_capacity target
WHERE source.sourceid = target.sourceid
AND source.sourceid in (
'volume-9dfed0d9-28b2-418a-9215-ce762ef80920',
'volume-9ece34f1-f4bb-475a-8e64-a2e37711b4fc',
'volume-9f0ec4cc-5cc4-49a8-b715-a91a25df3793',
'volume-9f38e0b3-2324-4505-a8ad-9b1ccb72181f',
'volume-9f3ec256-10fb-4d8b-a8cb-8498324cf309'
)
AND source.startat >= FLOOR(target.startat TO HOUR) + INTERVAL '1' HOUR AND 
source.startat < FLOOR(target.startat TO HOUR) + INTERVAL '2' HOUR;

and result
(screenshot of the query result omitted)

I'm confused about the first row, where source_startat and target_startat do not 
match the time condition.
I also tried to execute the SQL below

SELECT TO_TIMESTAMP('2021-12-13 14:05:06') >= FLOOR(TO_TIMESTAMP('2021-12-13 12:05:08') TO HOUR) + INTERVAL '1' HOUR
   AND TO_TIMESTAMP('2021-12-13 14:05:06') < FLOOR(TO_TIMESTAMP('2021-12-13 12:05:08') TO HOUR) + INTERVAL '2' HOUR;

and the result, false, is correct.

So is anything wrong with flink sql interval join?

Need your help, thank you.







RE: CoGroupedStreams and disableAutoGeneratedUIDs

2021-12-12 Thread Schwalbe Matthias
Hi Dan,

When I run into such a problem I consider using the not-so-@Public API levels:

  *   First of all, uids are especially needed for operators that hold state and are not so important for operators that don’t hold state primitives; I’m not sure of the implications created by disableAutoGeneratedUIDs
  *   A DataStream is actually a Transformation[] assigned to the StreamEnvironment (see DataStream#getTransformation())
  *   You can assign name() and uid() directly to Transformations
  *   Transformations expose their input transformations: Transformation#getInputs()
  *   This way you can locate the two Map transformations and assign uids (a rough sketch follows below)
  *   However the two maps are stateless and technically don’t need a uid
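A rough sketch of that idea (scala, against the Java API); `coGroupedResult` stands for the DataStream returned by CoGroupedStreams.apply(), and the uid values are purely illustrative:

```scala
import scala.collection.JavaConverters._
import org.apache.flink.api.dag.Transformation

// Walk the transformation graph upstream from the co-grouped result and
// assign uids to the auto-named "Map" (input tagger) transformations.
def uidTaggerMaps(t: Transformation[_]): Unit = {
  if (t.getName == "Map" && t.getUid == null) {
    t.setUid(s"cogroup-input-tagger-${t.getId}")
    t.setName(s"cogroup-input-tagger-${t.getId}")
  }
  t.getInputs.asScala.foreach(uidTaggerMaps)
}

uidTaggerMaps(coGroupedResult.getTransformation)
```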

What do you think?

Thias










From: Dan Hill 
Sent: Montag, 13. Dezember 2021 06:30
To: user 
Subject: CoGroupedStreams and disableAutoGeneratedUIDs

Hi.  I tried to use CoGroupedStreams w/ disableAutoGeneratedUIDs.  
CoGroupedStreams creates two map operators without the ability to set uids on 
them.  These appear as "Map" in my operator graph.  I noticed that the 
CoGroupedStreams.apply function has two map calls without setting uids.  If I 
try to run with disableAutoGeneratedUIDs, I get the following error 
"java.lang.IllegalStateException: Auto generated UIDs have been disabled but no 
UID or hash has been assigned to operator Map".

How can I fix this?  Extend the base CoGroupedStreams class?


```
public DataStream apply(CoGroupFunction function, TypeInformation resultType) {
    function = (CoGroupFunction) this.input1.getExecutionEnvironment().clean(function);
    CoGroupedStreams.UnionTypeInfo unionType =
            new CoGroupedStreams.UnionTypeInfo(this.input1.getType(), this.input2.getType());
    CoGroupedStreams.UnionKeySelector unionKeySelector =
            new CoGroupedStreams.UnionKeySelector(this.keySelector1, this.keySelector2);
    DataStream taggedInput1 = this.input1
            .map(new CoGroupedStreams.Input1Tagger())
            .setParallelism(this.input1.getParallelism())
            .returns(unionType);
    DataStream taggedInput2 = this.input2
            .map(new CoGroupedStreams.Input2Tagger())
            .setParallelism(this.input2.getParallelism())
            .returns(unionType);
    DataStream unionStream = taggedInput1.union(new DataStream[]{taggedInput2});
    this.windowedStream =
            (new KeyedStream(unionStream, unionKeySelector, this.keyType)).window(this.windowAssigner);
    if (this.trigger != null) {
        this.windowedStream.trigger(this.trigger);
    }
    if (this.evictor != null) {
        this.windowedStream.evictor(this.evictor);
    }
    if (this.allowedLateness != null) {
        this.windowedStream.allowedLateness(this.allowedLateness);
    }
    return this.windowedStream.apply(new CoGroupedStreams.CoGroupWindowFunction(function), resultType);
}
```


RE: Any way to require .uid(...) calls?

2021-12-05 Thread Schwalbe Matthias
Hi Dan,

In case you also want to keep automatic UID assignment, we do something like 
this (scala):

override def run(args: ApplicationArguments): Unit = {
  require(jobName != null, "a specific jobName needs to be configured, if hosted in Spring Boot, configure 'flink.job.name' in application.yaml !")

  val graph = env.getStreamGraph(jobName, false)
  val nodes = graph.getStreamNodes
  var missingUid = false
  for (node: StreamNode <- nodes.asScala) {
    if (node.getTransformationUID == null) {
      missingUid = true
      val message = s"Operator[${node.getId}: ${node.getOperatorName}]: Missing uid(...) for state migration]"
      println(message)
      if (forceOperatorUid) {
        if (logger.isErrorEnabled()) logger.error(message)
      }
      else {
        if (logger.isWarnEnabled()) logger.warn(message)
      }
    }
  }

  val exPlan = env.getExecutionPlan
  if (missingUid) {
    val message = s"job execution plan: \n$exPlan"
    if (forceOperatorUid) {
      if (logger.isErrorEnabled()) logger.error(message)
    }
    else {
      if (logger.isWarnEnabled()) logger.warn(message)
    }
  }
  else {
    if (logger.isInfoEnabled()) logger.info(s"job execution plan: \n$exPlan")
  }

  println
  println
  println("job execution plan:")
  println
  println(exPlan)
  println
  println

  if (forceOperatorUid) {
    require(!missingUid, s"Job not executed because of configuration parameter: flink.job.forceOperatorUid: $forceOperatorUid (for state migration)")
  }

  env.execute(jobName)
}


That also gives us a little more explicit diagnostics.

Hope this helps 

Thias



From: Dan Hill 
Sent: Montag, 6. Dezember 2021 05:03
To: Chesnay Schepler 
Cc: user 
Subject: Re: Any way to require .uid(...) calls?

Thanks!

On Sun, Dec 5, 2021 at 1:23 PM Chesnay Schepler 
mailto:ches...@apache.org>> wrote:
I'm not sure if there is a configuration option for doing so, but the 
generation of UIDs can be disabled via 
ExecutionConfig#disableAutoGeneratedUIDs, which would fail a job if not all 
operators have a UID.

StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().disableAutoGeneratedUIDs();

On 05/12/2021 21:43, Dan Hill wrote:
Hi.

I want to make sure we keep consistent uids on my Flink operators.  Is there a 
way to require uids?  It's pretty easy to add operators and not have explicit 
uids on them.

Also, I noticed an issue (Flink v1.12.3) where a filter operator does not chain 
when it's between a ProcessFunction and a cogroup window operator.  I can't get 
a uid set on this map.  I've tried a few variations and haven't been able to 
chain it.



(screenshot of the operator graph omitted)




RE: Windows and data loss.

2021-12-01 Thread Schwalbe Matthias
Hi John,

Sorry for the delay … I’m a little tight on spare time for user@flink currently.
If you are still interested we could pick up the discussion and continue.
However I don’t exactly understand what you want to achieve:

  1.  Would processing time windows be enough for you (and misplacement of events into the wrong window acceptable)?
  2.  Do you want to use event time windows, but cannot afford losing late events? (we can work out a scheme so that this would work)
  3.  How do you currently organize your input events in Kafka?
     *   1 event per log row?
     *   Kafka-event timestamp extracted from/per the log row?
     *   You mentioned shuffling (random assignment) to a Kafka partition:
            i.  Is this per log row, or is this per log file?
           ii.  Do you Kafka-key by log file, or even by log application?
     *   Do you select log files to be collected in file timestamp order?
  4.  I assume your windows are keyed by application, or do you use another keyBy()?
  5.  What watermarking strategy did you configure?
     *   You mentioned that watermarks advance even if file-ingress is blocked
     *   Can you publish/share the 3-odd lines of code for your watermark strategy setup? (a typical shape is sketched below)
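For reference, a typical shape for such a watermark strategy (scala); `kafkaStream` stands for the source stream, and `LogEvent`, its timestamp field and the durations are purely illustrative:

```scala
import java.time.Duration
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}

// Bounded-out-of-orderness watermarks plus an idleness guard for quiet partitions.
val withTimestamps = kafkaStream.assignTimestampsAndWatermarks(
  WatermarkStrategy
    .forBoundedOutOfOrderness[LogEvent](Duration.ofMinutes(5))
    .withTimestampAssigner(new SerializableTimestampAssigner[LogEvent] {
      override def extractTimestamp(e: LogEvent, recordTimestamp: Long): Long = e.eventTimeMillis
    })
    .withIdleness(Duration.ofMinutes(1)))
```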

Just as said before, ignoring late events is the default strategy; it can be 
adjusted by means of a custom window trigger which trades off latency, state 
size, and correctness of the final results.

Thias

From: John Smith 
Sent: Freitag, 26. November 2021 17:17
To: Schwalbe Matthias 
Cc: Caizhi Weng ; user 
Subject: Re: Windows and data loss.

Or, as an example, we have a 5-minute window and a lateness of 5 minutes.

We have the following events in the logs
10:00:01 PM > Already pushed to Kafka
10:00:30 PM > Already pushed to Kafka
10:01:00 PM > Already pushed to Kafka
10:03:45 PM > Already pushed to Kafka
10:04:00 PM > Log agent crashed for 30 minutes, not delivered to Kafka yet
10:05:10 PM > Pushed to Kafka cause I came from a log agent that isn't dead.

Flink window of 10:00:00
10:00:01 PM > Received
10:00:30 PM > Received
10:01:00 PM > Received
10:03:45 PM > Received
10:04:00 PM > Still nothing

Flink window of 10:00:00 5 lateness minutes are up.
10:00:01 PM > Counted
10:00:30 PM > Counted
10:01:00 PM > Counted
10:03:45 PM > Counted
10:04:00 PM > Still nothing

Flink window of 10:05:00 started
10:05:10 PM.> I'm new cause I came from a log agent that isn't dead.
10:04:00 PM > Still nothing

Flink window of 10:05:00 5 lateness minutes are up.
10:05:10 PM.> I have been counted, I'm happy!
10:04:00 PM > Still nothing

And so on...

Flink window of 10:30:00 started
10:04:00 PM > Hi guys, sorry I'm late 30 minutes, I ran into log agent 
problems. Sorry you are late, you missed the Flink bus.

On Fri, 26 Nov 2021 at 10:53, John Smith 
mailto:java.dev@gmail.com>> wrote:
Ok,

So with processing time we get 100% accuracy, because we don't care when the 
event comes; we just count and move along.
As for event time processing, what I meant to say is that if, for example, the 
log shipper is late at pushing events into Kafka, Flink will not notice this; 
the watermarks will keep advancing. Given that, let's say we have a window of 
5 minutes and a lateness of 5 minutes; it means we will see counts on the 
"dashboard" every 10 minutes. But say the log shipper fails/falls behind for 30 
minutes or more: the Flink Kafka consumer will simply not see any events and 
will continue chugging along, and after 30 minutes a late event comes in 2 
windows too late, so that event is discarded.

Or did I miss the point on the last part?


On Fri, 26 Nov 2021 at 09:38, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Actually not, because processing time does not matter at all.
Event-time timers are always compared to watermark-time progress.
If the system happens to be compromised for (say) 4 hours, the watermarks won’t 
progress either, hence the windows do not get evicted and wait for the watermarks 
to pick up from where the system crashed.

Your watermark strategy can decide how strictly you handle time progress:

  *   Super strict: the watermark time indicates that there will be no events 
with an older timestamp
  *   Semi strict: you accept late events and give a time-range when this can 
happen (still processing time put aside)

 *   You need to configure acceptable lateness in your windowing operator
 *   Accepted lateness implies higher overall latency

  *   Custom strategy

 *   Use a combination of accepted lateness and a custom trigger in your 
windowing operator
 *   The trigger decides when and how often window results are emitted
 *   The following operator would then probably implement some 
idempotence/updating scheme for the window values

RE: Windows and data loss.

2021-11-26 Thread Schwalbe Matthias
Actually not, because processing time does not matter at all.
Event-time timers are always compared to watermark-time progress.
If the system happens to be compromised for (say) 4 hours, the watermarks won’t 
progress either, hence the windows do not get evicted and wait for the watermarks 
to pick up from where the system crashed.

Your watermark strategy can decide how strictly you handle time progress:

  *   Super strict: the watermark time indicates that there will be no events 
with an older timestamp
  *   Semi strict: you accept late events and give a time-range when this can 
happen (still processing time put aside)
 *   You need to configure acceptable lateness in your windowing operator
 *   Accepted lateness implies higher overall latency
  *   Custom strategy
 *   Use a combination of accepted lateness and a custom trigger in your 
windowing operator
 *   The trigger decides when and how often window results are emitted
 *   The following operator would then probably implement some 
idempotence/updating scheme for the window values
 *   This way you get immediate low latency results and allow for later 
corrections if late events arrive

My favorite source on this is Tyler Akidau’s book [1] and the excerpt blog: [2] 
[3]
I believe his code uses Beam, but the same ideas can be implemented directly in 
Flink API

[1] https://www.oreilly.com/library/view/streaming-systems/9781491983867/
[2] https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/
[3] https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/

… happy to discuss further 

Thias



From: John Smith 
Sent: Freitag, 26. November 2021 14:09
To: Schwalbe Matthias 
Cc: Caizhi Weng ; user 
Subject: Re: Windows and data loss.

But if we use event time and a failure happens, potentially those events can't 
be delivered in their window; they will be dropped if they come after the 
lateness and watermark settings, no?


On Fri, 26 Nov 2021 at 02:35, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi John,

Going with processing time is perfectly sound if the results meet your 
requirements and you can easily live with events misplaced into the wrong time 
window.
This is also quite a bit cheaper resource-wise.
However you might want to keep in mind situations when things break down 
(network interrupt, datacenter flooded, etc.). With processing time, events 
count into the time window when processed; with event time they count into the 
time window when originally created at the source … even if processed much later 
…

Thias



From: John Smith mailto:java.dev@gmail.com>>
Sent: Freitag, 26. November 2021 02:55
To: Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>>
Cc: Caizhi Weng mailto:tsreape...@gmail.com>>; user 
mailto:user@flink.apache.org>>
Subject: Re: Windows and data loss.

Well, what I'm thinking for 100% accuracy and no data loss is just to base the 
count on processing time. So whatever arrives in that window is counted. If I 
get some events of the "current" window late and they go into another window, it's ok.

My pipeline is like so

browser(user)->REST API-->log file-->Filebeat-->Kafka (18 
partitions)->flink->destination
Filebeat inserts into Kafka; it's kind of a big bucket of "logs" from which I use 
Flink to filter the specific app and do the counts. The logs are round-robined 
into the topic/partitions. Where I FORESEE a delay is if Filebeat can't push fast 
enough into Kafka AND/OR the Flink consumer has not read all events for that 
window from all partitions.

On Thu, 25 Nov 2021 at 11:28, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi John,

… just a short hint:
With datastream API you can

  *   hand-craft a trigger that decides when and how often to emit intermediate, 
punctual and late window results, and when to evict the window and stop 
processing late events
  *   in order to process late events you also need to specify for how long you 
will extend the window processing (or is that done in the trigger … I don’t 
remember right now)
  *   overall window state grows if you extend window processing to after it 
is finished …

Hope this helps 

Thias

From: Caizhi Weng mailto:tsreape...@gmail.com>>
Sent: Donnerstag, 25. November 2021 02:56
To: John Smith mailto:java.dev@gmail.com>>
Cc: user mailto:user@flink.apache.org>>
Subject: Re: Windows and data loss.

Hi!

Are you using the datastream API or the table / SQL API? I don't know if 
datastream API has this functionality, but in table / SQL API we have the 
following configurations [1].

  *   table.exec.emit.late-fire.enabled: Emit window results for late records;
  *   table.exec.emit.late-fire.delay: How often shall we emit results for late 
records (for example, once per 10 minutes or for every record).

[1] 
https://github.com/apache/flink/blob/601ef3b3bce040264daa3aedcb9d98ead8303485/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L214

RE: Windows and data loss.

2021-11-25 Thread Schwalbe Matthias
Hi John,

Going with processing time is perfectly sound if the results meet your 
requirements and you can easily live with events misplaced into the wrong time 
window.
This is also quite a bit cheaper resource-wise.
However you might want to keep in mind situations when things break down 
(network interrupt, datacenter flooded, etc.). With processing time, events 
count into the time window when processed; with event time they count into the 
time window when originally created at the source … even if processed much later 
…

Thias



From: John Smith 
Sent: Freitag, 26. November 2021 02:55
To: Schwalbe Matthias 
Cc: Caizhi Weng ; user 
Subject: Re: Windows and data loss.

Well, what I'm thinking for 100% accuracy and no data loss is just to base the 
count on processing time. So whatever arrives in that window is counted. If I 
get some events of the "current" window late and they go into another window, it's ok.

My pipeline is like so

browser(user)->REST API-->log file-->Filebeat-->Kafka (18 
partitions)->flink->destination
Filebeat inserts into Kafka; it's kind of a big bucket of "logs" from which I use 
Flink to filter the specific app and do the counts. The logs are round-robined 
into the topic/partitions. Where I FORESEE a delay is if Filebeat can't push fast 
enough into Kafka AND/OR the Flink consumer has not read all events for that 
window from all partitions.

On Thu, 25 Nov 2021 at 11:28, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi John,

… just a short hint:
With datastream API you can

  *   hand-craft a trigger that decides when and how often to emit intermediate, 
punctual and late window results, and when to evict the window and stop 
processing late events
  *   in order to process late events you also need to specify for how long you 
will extend the window processing (or is that done in the trigger … I don’t 
remember right now)
  *   overall window state grows if you extend window processing to after it 
is finished …

Hope this helps 

Thias

From: Caizhi Weng mailto:tsreape...@gmail.com>>
Sent: Donnerstag, 25. November 2021 02:56
To: John Smith mailto:java.dev@gmail.com>>
Cc: user mailto:user@flink.apache.org>>
Subject: Re: Windows and data loss.

Hi!

Are you using the datastream API or the table / SQL API? I don't know if 
datastream API has this functionality, but in table / SQL API we have the 
following configurations [1].

  *   table.exec.emit.late-fire.enabled: Emit window results for late records;
  *   table.exec.emit.late-fire.delay: How often shall we emit results for late 
records (for example, once per 10 minutes or for every record).

[1] 
https://github.com/apache/flink/blob/601ef3b3bce040264daa3aedcb9d98ead8303485/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L214

John Smith mailto:java.dev@gmail.com>> wrote on Thursday, 25 November 2021 at 00:45:
Hi, I understand that when using windows with watermarks and lateness configured, 
if an event comes late it is lost and we can output it to a side output.

But I'm wondering, is there a way to do it without the loss?

I'm guessing an "all" window with a custom trigger that just fires every X period, 
and whatever is in that bucket is in that bucket?

RE: Windows and data loss.

2021-11-25 Thread Schwalbe Matthias
Hi John,

… just a short hint:
With datastream API you can

  *   hand-craft a trigger that decides when and how often to emit intermediate, 
punctual and late window results, and when to evict the window and stop 
processing late events
  *   in order to process late events you also need to specify for how long you 
will extend the window processing (or is that done in the trigger … I don’t 
remember right now)
  *   overall window state grows if you extend window processing to after it 
is finished …

Hope this helps 

Thias

From: Caizhi Weng 
Sent: Donnerstag, 25. November 2021 02:56
To: John Smith 
Cc: user 
Subject: Re: Windows and data loss.

Hi!

Are you using the datastream API or the table / SQL API? I don't know if 
datastream API has this functionality, but in table / SQL API we have the 
following configurations [1].

  *   table.exec.emit.late-fire.enabled: Emit window results for late records;
  *   table.exec.emit.late-fire.delay: How often shall we emit results for late 
records (for example, once per 10 minutes or for every record).

[1] 
https://github.com/apache/flink/blob/601ef3b3bce040264daa3aedcb9d98ead8303485/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L214
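If you go the Table / SQL API route, a sketch of how these options are typically set on the TableConfig (scala; `tableEnv` stands for a hypothetical StreamTableEnvironment, and the delay value is illustrative):

```scala
// Keys as named above / in [1]; both are experimental emit-strategy settings.
tableEnv.getConfig.getConfiguration.setString("table.exec.emit.late-fire.enabled", "true")
tableEnv.getConfig.getConfiguration.setString("table.exec.emit.late-fire.delay", "10 min")
```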

John Smith mailto:java.dev@gmail.com>> wrote on Thursday, 25 November 2021 at 00:45:
Hi, I understand that when using windows with watermarks and lateness configured, 
if an event comes late it is lost and we can output it to a side output.

But I'm wondering, is there a way to do it without the loss?

I'm guessing an "all" window with a custom trigger that just fires every X period, 
and whatever is in that bucket is in that bucket?


RE: Custom partitioning of keys with keyBy

2021-11-04 Thread Schwalbe Matthias
Hi Yuval,

… I had to do some guesswork with regard to your use case … still not exactly 
clear what you want to achieve, however I remember having done something 
similar in that area 2 years ago.
Unfortunately I cannot find the implementation anymore ☹


  *   If you tried a combination of .partitionCustom() and 
reinterpretAsKeyedStream(): this will fail, because reinterpretAsKeyedStream() 
forces a ForwardPartitioner.
  *   You could still model your code after the implementation of 
reinterpretAsKeyedStream and use your own partitioner instead [1]
  *   Partitioning is relevant in two places:
 *   The outgoing Transform for selection of the output channel
 *   The incoming Transform for selecting the correct key range for state 
primitives
 *   You need to make sure that both sides agree

… for the last question regarding the more sophisticated scenario … please give 
me a little more time for a sketch … I also want to understand a little better 
your use case

Hope this helps

Thias





[1] 
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamUtils.java#L185-L210

From: Yuval Itzchakov 
Sent: Donnerstag, 4. November 2021 08:25
To: naitong Xiao 
Cc: user 
Subject: Re: Custom partitioning of keys with keyBy

Thank you Schwalbe, David and Naitong for your answers!

David: This is what we're currently doing ATM, and I wanted to know if there's 
any simplified approach to this. This is what we have so far: 
https://gist.github.com/YuvalItzchakov/9441a4a0e80609e534e69804e94cb57b
Naitong: The keyBy internally will rehash the key you provide it. How do you 
make sure that the re-hashed key is still in the desired key group range?
Schwalbe:

  *   Assuming that all your 4 different keys are evenly distributed, and you 
send them to (only) 3 buckets, you would expect at least one bucket to cover 2 
of your keys, hence the 50% - You're right, this is the desire behavior I 
actually want, I don't want them to be really uniformly distributed as I want 
to batch multiple keys together in the same bucket.
  *   With low entropy keys avoiding data skew is quite difficult - I 
understand, and we are well aware of the implications.
  *   But your situation could be worse, all 4 keys could end up in the same 
bucket, if the hash function in use happens to generate collisions for the 4 
keys, in which case 2 of your 3 buckets would not process any events … this 
could also lead to watermarks not progressing … - We take care of this 
internally as we understand there may be skewing to the buckets. I don't care 
about watermarks at this stage.
  *   There are two proposals on how to improve the situation:

 *   Use the same parallelism and max parallelism for the relevant 
operators and implement a manual partitioner

*   A manual partitioner is also good in situations where you want to 
lower the bias and you exactly know the distribution of your key space and 
rearrange keys to even-out numbers - I looked into custom partitioning, but it 
seems to not work with KeyedDataStream, and I need the distribution to be 
performed when keying the stream.

 *   More sophisticated (if possible), divide-and-conquer like - 
Interesting idea, but I'm not sure I follow. Could you possibly provide a 
sketch of the transformations on the stream?

*   Key by your ‘small’ key plus some arbitrary attribute with higher entropy
*   Window aggregate first on that artificial key
*   Aggregate the results on your original ‘small’ key
*   This could be interesting for high-throughput situation where you 
actually want to run in parallelism higher than the number of different ‘small’ 
keys


On Thu, Nov 4, 2021 at 5:48 AM naitong Xiao 
mailto:xiaonait...@gmail.com>> wrote:
I think I had a similar scenario several months ago, here is my related code:

val MAX_PARALLELISM = 16
val KEY_RAND_SALT = “73b46”

logSource.keyBy{ value =>
 val keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(value.deviceUdid, 
MAX_PARALLELISM)
 s"$KEY_RAND_SALT$keyGroup"
}

The keyGroup is just like your bucket id,  and the KEY_RAND_SALT was generated 
by some script to map bucket id evenly to operators under the max parallelism.

Sent with a Spark
On Nov 3, 2021, 9:47 PM +0800, Yuval Itzchakov 
mailto:yuva...@gmail.com>>, wrote:

Hi,
I have a use-case where I'd like to partition a KeyedDataStream a bit 
differently than how Flink's default partitioning works with key groups.


What I'd like to be able to do is take all my data and split it up evenly 
between 3 buckets which will store the data in the state. Using the key above 
works, but splits the data unevenly between the different key groups, as 
usually the key space is very small (0 - 3). What ends up happening is that 
sometimes 50% of the keys end up on the same operator index, where ideally I'd 
like to distribute it evenly 

RE: Custom partitioning of keys with keyBy

2021-11-03 Thread Schwalbe Matthias
Hi Yuval,

Just a couple of comments:


  *   Assuming that all your 4 different keys are evenly distributed, and you 
send them to (only) 3 buckets, you would expect at least one bucket to cover 2 
of your keys, hence the 50%
  *   With low entropy keys avoiding data skew is quite difficult
  *   But your situation could be worse, all 4 keys could end up in the same 
bucket, if the hash function in use happens to generate collisions for the 4 
keys, in which case 2 of your 3 buckets would not process any events … this 
could also lead to watermarks not progressing …
  *   There are two proposals on how to improve the situation:
     *   Use the same parallelism and max parallelism for the relevant operators and implement a manual partitioner (a minimal sketch follows below)
        *   A manual partitioner is also good in situations where you want to lower the bias and you exactly know the distribution of your key space and can rearrange keys to even out the numbers
     *   More sophisticated (if possible), divide-and-conquer like:
        *   Key by your ‘small’ key plus some arbitrary attribute with higher entropy
        *   Window aggregate first on that artificial key
        *   Aggregate the results on your original ‘small’ key
        *   This could be interesting for high-throughput situations where you actually want to run at a parallelism higher than the number of different ‘small’ keys
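A minimal sketch of the manual-partitioner idea (scala); `Event`, `bucketId` and the bucket count are illustrative. Note that this only controls the physical distribution of records; keyed state on top of it needs the extra wiring discussed in the later follow-up on this thread (the reinterpretAsKeyedStream approach):

```scala
import org.apache.flink.api.common.functions.Partitioner
import org.apache.flink.streaming.api.scala._

// Send each low-entropy bucket id straight to a fixed downstream subtask
// (works best when parallelism == number of buckets).
val partitioned = events.partitionCustom(
  new Partitioner[Int] {
    override def partition(key: Int, numPartitions: Int): Int = key % numPartitions
  },
  (e: Event) => e.bucketId)
```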

Hope this helps

Thias


From: Yuval Itzchakov 
Sent: Mittwoch, 3. November 2021 14:41
To: user 
Subject: Custom partitioning of keys with keyBy

Hi,
I have a use-case where I'd like to partition a KeyedDataStream a bit 
differently than how Flink's default partitioning works with key groups.

(inline screenshot of the keyBy snippet omitted)
What I'd like to be able to do is take all my data and split it up evenly 
between 3 buckets which will store the data in the state. Using the key above 
works, but splits the data unevenly between the different key groups, as 
usually the key space is very small (0 - 3). What ends up happening is that 
sometimes 50% of the keys end up on the same operator index, where ideally I'd 
like to distribute it evenly between all operator indexes in the cluster.

Is there any way of doing this?
--
Best Regards,
Yuval Itzchakov.


RE: FlinkKafkaConsumer -> KafkaSource State Migration

2021-11-01 Thread Schwalbe Matthias
Thanks Fabian,



That was the information I was missing.

(Late reply ... same here, FlinkForward  )



Thias



-Original Message-
From: Fabian Paul 
Sent: Donnerstag, 28. Oktober 2021 08:38
To: Schwalbe Matthias 
Cc: Mason Chen ; user 
Subject: Re: FlinkKafkaConsumer -> KafkaSource State Migration



Hi,



Sorry for the late reply, but most of us were involved in the Flink Forward 
conference. The upgrade strategies for the Kafka sink and source are pretty 
similar. Source and sink do not rely on state migration but leverage Kafka as 
the source of truth.



When running with FlinkKafkaConsumer, Mason pointed out correctly that you have 
to stop the job with a savepoint and set `setCommittedOffsetsOnCheckpoint(true)` 
[1].



For the FlinkKafkaProducer it is similar: on a final savepoint the producer will 
finalize all pending transactions and submit them to Kafka. The KafkaSink can 
then start without the need for any state migration because there should not be 
any pending transactions anymore.



I do not think you must use `allowNonRestoredState`, because after stopping with 
a savepoint there shouldn’t be any source or sink state left.



Best,

Fabian



[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#upgrading-to-the-latest-connector-version


RE: FlinkKafkaConsumer -> KafkaSource State Migration

2021-10-27 Thread Schwalbe Matthias
I would also be interested in instructions/discussion on how to state-migrate 
from the pre-unified sources/sinks to the unified ones (Kafka) 

Thias

From: Mason Chen 
Sent: Mittwoch, 27. Oktober 2021 01:52
To: user 
Subject: FlinkKafkaConsumer -> KafkaSource State Migration

Hi all,

I read these instructions for migrating to the KafkaSource:
https://nightlies.apache.org/flink/flink-docs-release-1.14/release-notes/flink-1.14/#deprecate-flinkkafkaconsumer.

Do we need to employ any uid/allowNonRestoredState tricks if our Flink job is 
also stateful outside of the source? Or what is the mechanism that resolves the 
topic/partition/offsets in the stateful upgrade? Will restoring from 
FlinkKafkaConsumer cause an exception due to incompatibility of the union state 
to the current (what is it again)?

Best,
Mason


RE: Duplicate Calls to Cep Filter

2021-10-27 Thread Schwalbe Matthias
Hi Puneet,

…  not able to answer your question, but I would be curious to also print out 
the value with your diagnostic message.
… assuming we’ll see an ‘a’ and a ‘b’ for both filters resp.

… a simple explanation would be that the filters are applied to all input, 
regardless of the pattern matching that follows the input filtering (just 
guessing)

Thias


From: Puneet Duggal 
Sent: Mittwoch, 27. Oktober 2021 11:12
To: user 
Subject: Duplicate Calls to Cep Filter

Hi,

I am facing an issue where a single event is causing execution of a cep filter 
multiple times. I went through this 
video explaining automata 
formation from pattern sequence but it still does not explain the behaviour 
that I am facing. Following is the sample pattern for which I am testing this 
behaviour.

Pattern<String, String> innerPattern =
    Pattern
        .<String>begin("start")
        .where(new SimpleCondition<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                System.out.println("In the beginning");
                return value.equals("a");
            }
        })
        .followedBy("middle")
        .where(new SimpleCondition<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                System.out.println("In the middle");
                return value.equals("b");
            }
        });

On passing events a and b to this pattern, the result is

1> a
In the beginning
1> b
In the middle
In the middle
In the beginning
Matched
3> {start=[a], middle=[b]}

As you can see, on ingestion of b the middle pattern is getting called twice. 
Any ideas about this behaviour?



Thanks and regards,

Puneet Duggal


RE: Huge backpressure when using AggregateFunction with Session Window

2021-10-26 Thread Schwalbe Matthias
Hi Ori, … answering from remote …


  *   If not completely mistaken, Scala Vector is immutable, creating a copy 
whenever you append, but
  *   This is not the main problem: the vectors collected so far get 
deserialized with every incoming event (from state storage) and afterwards 
serialized back into state storage
  *   This won’t matter so much if you only collect 2 or 3 events into a 
session window, but with maybe 1000 such events it does (you didn’t share your 
numbers  )
  *   For the ProcessFunction implementation you could use a Vector builder and 
then assign the result (see the sketch after this list).
  *   Regarding the "without touching the previously stored event" question, 
more detailed (I was in a rush)
 *   Windowing with ProcessFunction collects every event assigned to a 
session window into a list state … iterating/aggregating over the collected 
events only once when the window is triggered (i.e. the session is finished)
 *   While collecting the events into the list state it add()-s the new 
event to the list state
 *   For rocksdb this involves only serializing the single added event and 
appending the binary representation to the list state of the respective (key, 
session window key (namespace in Flink speak)), i.e.
 *   The previously stored events for the session window are not touched 
when a new event is added
  *   Next question: the overhead can easily be the cause of such backpressure, 
depending on the numbers:
 *   Serialized size of your accumulator, proportional to the number of 
aggregated events
 *   Size and entropy, frequency of your key space -> cache hits vs. cache 
misses in RocksDB
  *   Of course there could be additional sources of backpressure
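
As a rough illustration of the ProcessWindowFunction approach discussed above 
(a sketch only; Event, Session, getEnvUrl and buildSession stand in for your own 
types and helpers):

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class CollectUrlsWindowFunction
        extends ProcessWindowFunction<Event, Session, String, TimeWindow> {

    @Override
    public void process(String key, Context ctx, Iterable<Event> events, Collector<Session> out) {
        List<String> urls = new ArrayList<>();
        for (Event e : events) {              // single pass, only when the window fires
            urls.add(e.getEnvUrl());
        }
        out.collect(buildSession(key, urls)); // hypothetical helper that builds the Session
    }

    private Session buildSession(String key, List<String> urls) {
        // placeholder: assemble the Session from the collected values
        return new Session(key, urls);
    }
}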

I hope this helps, … I’ll be back next week

Thias

From: Ori Popowski 
Sent: Donnerstag, 21. Oktober 2021 15:32
To: Schwalbe Matthias 
Cc: user 
Subject: Re: Huge backpressure when using AggregateFunction with Session Window


Thanks for taking the time to answer this.

  *   You're correct that the SimpleAggregator is not used in the job setup. I 
didn't copy the correct piece of code.
  *   I understand the overhead involved. But I do not agree with the O(n^2) 
complexity. Are you implying that Vector append is O(n) by itself?
  *   I understand your points regarding ProcessFunction except for the 
"without touching the previously stored event". Also with AggregateFunction + 
concatenation I don't touch the elements other than the new element. I forgot 
to mention by the way, that the issue reproduces also with Lists which should 
be much faster for appends and concats.
Could overhead by itself account for the backpressure?
From this job the only conclusion is that Flink just cannot do aggregating 
operations which collect values, only simple operations which produce scalar 
values (like sum/avg). It seems weird to me that Flink would be so limited in 
such a way.



On Wed, Oct 20, 2021 at 7:03 PM Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:
Hi Ori,

Just a couple of comments (some code is missing for a concise explanation):

  *   SimpleAggregator is not used in the job setup below (assuming another job 
setup)
  *   SimpleAggregator is called for each event that goes into a specific 
session window, however

 *   The scala vectors will grow ever larger with the number of events that end up 
in a single window, hence
 *   Your BigO complexity will be O(n^2), n: number of events in window (or 
worse)
 *   For each event the accumulator is retrieved from window state and 
stored to window state (and serialized, if on RocksDB Backend)

  *   On the other hand when you use a process function

 *   Flink keeps a list state of events belonging to the session window, and
 *   Only when the window is triggered (on session gap timeout) all events 
are retrieved from window state and processed
 *   On RocksDbBackend the new events added to the window are appended to 
the existing window state key without touching the previously stored events, 
hence
 *   Serialization is only done once per incoming event, and
 *   BigO complexity is around O(n)

… much simplified

When I started with similar questions I spent quite some time in the debugger, 
breaking into the windowing functions and going up the call stack, in order to 
understand how Flink works … time well spent


I hope this helps …

I won’t be able to follow up for the next 1 ½ weeks, unless you try to meet me 
on FlinkForward conference …

Thias

From: Ori Popowski mailto:ori@gmail.com>>
Sent: Mittwoch, 20. Oktober 2021 16:17
To: user mailto:user@flink.apache.org>>
Subject: Huge backpressure when using AggregateFunction with Session Window

I have a simple Flink application with a simple keyBy, a SessionWindow, and I 
use an AggregateFunction to incrementally aggregate a result, and write to a 
Sink.

Some of the requirements involve accumulating lists of fields from the events 
(for example, all URLs)

RE: Huge backpressure when using AggregateFunction with Session Window

2021-10-20 Thread Schwalbe Matthias
Hi Ori,

Just a couple of comments (some code is missing for a concise explanation):

  *   SimpleAggregator is not used in the job setup below (assuming another job 
setup)
  *   SimpleAggregator is called for each event that goes into a specific 
session window, however
 *   The scala vectors will grow ever larger with the number of events that end up 
in a single window, hence
 *   Your BigO complexity will be O(n^2), n: number of events in window (or 
worse)
 *   For each event the accumulator is retrieved from window state and 
stored to window state (and serialized, if on RocksDB Backend)
  *   On the other hand when you use a process function
 *   Flink keeps a list state of events belonging to the session window, and
 *   Only when the window is triggered (on session gap timeout) all events 
are retrieved from window state and processed
 *   On RocksDbBackend the new events added to the window are appended to 
the existing window state key without touching the previously stored events, 
hence
 *   Serialization is only done once per incoming event, and
 *   BigO complexity is around O(n)

… much simplified

When I started with similar questions I spent quite some time in the debugger, 
breaking into the windowing functions and going up the call stack, in order to 
understand how Flink works … time well spent


I hope this helps …

I won’t be able to follow up for the next 1 ½ weeks, unless you try to meet me 
on FlinkForward conference …

Thias

From: Ori Popowski 
Sent: Mittwoch, 20. Oktober 2021 16:17
To: user 
Subject: Huge backpressure when using AggregateFunction with Session Window

I have a simple Flink application with a simple keyBy, a SessionWindow, and I 
use an AggregateFunction to incrementally aggregate a result, and write to a 
Sink.

Some of the requirements involve accumulating lists of fields from the events 
(for example, all URLs), so not all the values in the end should be primitives 
(although some are, like total number of events, and session duration).

This job is experiencing a huge backpressure 40 minutes after launching.

I've found out that the append and concatenate operations in the logic of my 
AggregateFunction's add() and merge() functions are what's ruining the job 
(i.e. causing the backpressure).

I've managed to create a reduced version of my job, where I just append and 
concatenate some of the event values and I can confirm that a backpressure 
starts just 40 minutes after launching the job:


class SimpleAggregator extends AggregateFunction[Event, Accumulator, 
Session] with LazyLogging {

  override def createAccumulator(): Accumulator = (
Vector.empty,
Vector.empty,
Vector.empty,
Vector.empty,
Vector.empty
  )

  override def add(value: Event, accumulator: Accumulator): Accumulator = {
(
  accumulator._1 :+ value.getEnvUrl,
  accumulator._2 :+ value.getCtxVisitId,
  accumulator._3 :+ value.getVisionsSId,
  accumulator._4 :+ value.getTime.longValue(),
  accumulator._5 :+ value.getTime.longValue()
)
  }

  override def merge(a: Accumulator, b: Accumulator): Accumulator = {
(
  a._1 ++ b._1,
  a._2 ++ b._2,
  a._3 ++ b._3,
  a._4 ++ b._4,
  a._5 ++ b._5
)
  }

  override def getResult(accumulator: Accumulator): Session = {
Session.newBuilder()
  .setSessionDuration(1000)
  .setSessionTotalEvents(1000)
  .setSId("-" + UUID.randomUUID().toString)
  .build()
  }
}

This is the job overall (simplified version):


class App(
  source: SourceFunction[Event],
  sink: SinkFunction[Session]
) {

  def run(config: Config): Unit = {
val senv = StreamExecutionEnvironment.getExecutionEnvironment
senv.setMaxParallelism(256)
val dataStream = senv.addSource(source).uid("source")
dataStream
  .assignAscendingTimestamps(_.getTime)
  .keyBy(event => (event.getWmUId, event.getWmEnv, 
event.getSId).toString())
  
.window(EventTimeSessionWindows.withGap(config.sessionGap.asFlinkTime))
  .allowedLateness(0.seconds.asFlinkTime)
  .process(new ProcessFunction).uid("process-session")
  .addSink(sink).uid("sink")

senv.execute("session-aggregation")
  }
}

After 3 weeks of grueling debugging, profiling, checking the serialization and 
more I couldn't solve the backpressure issue.
However, I got an idea and used Flink's ProcessWindowFunction which just 
aggregates all the events behind the scenes and just gives them to me as an 
iterator, where I can then do all my calculations.
Surprisingly, there's no backpressure. So even though the ProcessWindowFunction 
actually aggregates more data, and also does concatenations and appends, for 
some reason there's no backpressure.

To finish this long post, what I'm trying to 

RE: Any issues with reinterpretAsKeyedStream when scaling partitions?

2021-10-15 Thread Schwalbe Matthias
… didn’t mean to hit the send button so soon 

I guess we are getting closer to a solution


Thias



From: Schwalbe Matthias
Sent: Freitag, 15. Oktober 2021 08:49
To: 'Dan Hill' ; user 
Subject: RE: Any issues with reinterpretAsKeyedStream when scaling partitions?

Hi Dan again ,

I had a second look … from what I see from your call stack I conclude that 
indeed you have a network shuffle between your two operators, 
in which case reinterpretAsKeyedStream wouldn’t work

($StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:277 
indicates that the two operators are not chained)


… just as a double-check could you please share both your

  *   Execution plan (json): call println(env.getExecutionPlan) right before your 
call to env.execute, as shown in the snippet below, and
  *   Your job plan (screenshot from flink dashboard)
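
A minimal sketch of that double-check (assuming a StreamExecutionEnvironment 
named env; the job name is a placeholder):

// print the execution plan (JSON) right before submitting the job
System.out.println(env.getExecutionPlan());
env.execute("my-job");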

There are a number of preconditions before two operators get chained, and 
probably one of them fails (see [1]):

  *   The two operators need to allow chaining the resp. other (see [2] … 
chaining strategy)
  *   We need a ForwardPartitioner in between
  *   We need to be in streaming mode
  *   Both operators need the same parallelism
  *   Chaining needs to be enabled for the streaming environment
  *   The second operator needs to be single-input (i.e. no TwoInputOp nor 
union() before)


[1] 
https://github.com/apache/flink/blob/2dabdd95c15ccae2a97a0e898d1acfc958a0f7f3/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L861-L873
[2] 
https://github.com/apache/flink/blob/2dabdd95c15ccae2a97a0e898d1acfc958a0f7f3/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L903-L932


From: Dan Hill mailto:quietgol...@gmail.com>>
Sent: Donnerstag, 14. Oktober 2021 17:50
To: user mailto:user@flink.apache.org>>
Subject: Any issues with reinterpretAsKeyedStream when scaling partitions?

I have a job that uses reinterpretAsKeyedStream across a simple map to avoid a 
shuffle.  When changing the number of partitions, I'm hitting an issue with 
registerEventTimeTimer complaining that "key group from 110 to 119 does not 
contain 186".  I'm using Flink v1.12.3.

Any thoughts on this?  I don't know if there is a known issue with 
reinterpretAsKeyedStream.

Rough steps:
1. I have a raw input stream of View records.  I keyBy the View using 
Tuple2(platform_id, log_user_id).
2. I do a small transformation of View to a TinyView.  I 
reinterpretAsKeyedStream the TinyView as a KeyedStream with the same key.  The 
keys are the same.
3. I use the TinyView in a KeyedCoProcessFunction.

When I savepoint and start again with a different number of partitions, my 
KeyedCoProcessFunction hits an issue with registerEventTimeTimer and complains 
that "key group from 110 to 119 does not contain 186".  I verified that the key 
does not change and that we use Tuple2 with primitives Long and String.



2021-10-14 08:17:07
java.lang.IllegalArgumentException: view x insertion issue with 
registerEventTimeTimer for key=(120,3bfd5b19-9d86-4455-a5a1-480f8596a174), 
flat=platform_id: 120
log_user_id: "3bfd5b19-9d86-4455-a5a1-480f8596a174"
log_timestamp: 1634224329606
view_id: "8fcdf922-7c79-4902-9778-3f20f39b0bc2"

at 
ai.promoted.metrics.logprocessor.common.functions.inferred.BaseInferred.processElement1(BaseInferred.java:318)
at 
ai.promoted.metrics.logprocessor.common.functions.inferred.BaseInferred.processElement1(BaseInferred.java:59)
at 
ai.promoted.metrics.logprocessor.common.functions.LogSlowOnTimer.processElement1(LogSlowOnTimer.java:36)
at 
org.apache.flink.streaming.api.operators.co.KeyedCoProcessOperator.processElement1(KeyedCoProcessOperator.java:78)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.processRecord1(StreamTwoInputProcessorFactory.java:199)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.lambda$create$0(StreamTwoInputProcessorFactory.java:164)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:277)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:95)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.Mailb

RE: Any issues with reinterpretAsKeyedStream when scaling partitions?

2021-10-15 Thread Schwalbe Matthias
Hi Dan again ,

I had a second look … from what I see from your call stack I conclude that 
indeed you have a network shuffle between your two operators, 
in which case reinterpretAsKeyedStream wouldn’t work

($StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:277 
indicates that the two operators are not chained)


… just as a double-check could you please share both your

  *   Execution plan (call println(env.getExecutionPlan) right before your call 
env.execute) (json), and
  *   Your job plan (screenshot from flink dashboard)

There are a number of preconditions before two operators get chained, and 
probably one of them fails (see [1]):

  *   The two operators need to allow chaining the resp. other (see [2] … 
chaining strategy)
  *   We need a ForwardPartitioner in between
  *   We need to be in streaming mode
  *   Both operators need the same parallelism
  *   Chaining needs to be enabled for the streaming environment
  *   The second operator needs to be single-input (i.e. no TwoInputOp nor 
union() before)


[1] 
https://github.com/apache/flink/blob/2dabdd95c15ccae2a97a0e898d1acfc958a0f7f3/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L861-L873
[2] 
https://github.com/apache/flink/blob/2dabdd95c15ccae2a97a0e898d1acfc958a0f7f3/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L903-L932


From: Dan Hill 
Sent: Donnerstag, 14. Oktober 2021 17:50
To: user 
Subject: Any issues with reinterpretAsKeyedStream when scaling partitions?

I have a job that uses reinterpretAsKeyedStream across a simple map to avoid a 
shuffle.  When changing the number of partitions, I'm hitting an issue with 
registerEventTimeTimer complaining that "key group from 110 to 119 does not 
contain 186".  I'm using Flink v1.12.3.

Any thoughts on this?  I don't know if there is a known issue with 
reinterpretAsKeyedStream.

Rough steps:
1. I have a raw input stream of View records.  I keyBy the View using 
Tuple2(platform_id, log_user_id).
2. I do a small transformation of View to a TinyView.  I 
reinterpretAsKeyedStream the TinyView as a KeyedStream with the same key.  The 
keys are the same.
3. I use the TinyView in a KeyedCoProcessFunction.

When I savepoint and start again with a different number of partitions, my 
KeyedCoProcessFunction hits an issue with registerEventTimeTimer and complains 
that "key group from 110 to 119 does not contain 186".  I verified that the key 
does not change and that we use Tuple2 with primitives Long and String.



2021-10-14 08:17:07
java.lang.IllegalArgumentException: view x insertion issue with 
registerEventTimeTimer for key=(120,3bfd5b19-9d86-4455-a5a1-480f8596a174), 
flat=platform_id: 120
log_user_id: "3bfd5b19-9d86-4455-a5a1-480f8596a174"
log_timestamp: 1634224329606
view_id: "8fcdf922-7c79-4902-9778-3f20f39b0bc2"

at 
ai.promoted.metrics.logprocessor.common.functions.inferred.BaseInferred.processElement1(BaseInferred.java:318)
at 
ai.promoted.metrics.logprocessor.common.functions.inferred.BaseInferred.processElement1(BaseInferred.java:59)
at 
ai.promoted.metrics.logprocessor.common.functions.LogSlowOnTimer.processElement1(LogSlowOnTimer.java:36)
at 
org.apache.flink.streaming.api.operators.co.KeyedCoProcessOperator.processElement1(KeyedCoProcessOperator.java:78)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.processRecord1(StreamTwoInputProcessorFactory.java:199)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.lambda$create$0(StreamTwoInputProcessorFactory.java:164)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:277)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:95)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
at 
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
at 

RE: Helper methods for catching unexpected key changes?

2021-10-08 Thread Schwalbe Matthias
Good morning Dan,

Being short of information on how you arranged your job, I can only make 
general comments:

ReinterpretAsKeyedStream only applies to data streams that are in fact 
partitioned by the same key, i.e. your job would look somewhat like this:

DataStreamUtils.reinterpretAsKeyedStream(
        stream
            .keyBy(keyExtractor1)
            .process(keyedProcessFunction1), // or any of the other keyed operators
        keyExtractor2 …)
    .process(keyedProcessFunction2)          // or any of the other keyed operators

keyExtractor1 and keyExtractor2 need to come to the same result for related 
events (input/output of keyedProcessFunction1 resp.)

I assume your exception happens in keyedProcessFunction2?

reinterpretAsKeyedStream makes sense if you want to chain keyedProcessFunction1 
and keyedProcessFunction2, otherwise keyBy() will do …
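
Regarding the wrapper-operator idea: below is a rough sketch of a debugging-only 
check one could chain in front of the second keyed operator. It recomputes the 
key group for each element and verifies that it belongs to this subtask. Note 
that it uses an internal Flink class (KeyGroupRangeAssignment), so treat it as a 
diagnostic aid rather than production code; the element and key types are 
placeholders:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.runtime.state.KeyGroupRange;
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupCheck<T, K> extends RichMapFunction<T, T> {
    private final KeySelector<T, K> keySelector;

    public KeyGroupCheck(KeySelector<T, K> keySelector) {
        this.keySelector = keySelector;
    }

    @Override
    public T map(T value) throws Exception {
        int maxParallelism = getRuntimeContext().getMaxNumberOfParallelSubtasks();
        int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
        int subtask = getRuntimeContext().getIndexOfThisSubtask();

        // key groups owned by this subtask vs. the key group the element's key maps to
        KeyGroupRange owned = KeyGroupRangeAssignment
                .computeKeyGroupRangeForOperatorIndex(maxParallelism, parallelism, subtask);
        int keyGroup = KeyGroupRangeAssignment
                .assignToKeyGroup(keySelector.getKey(value), maxParallelism);

        if (!owned.contains(keyGroup)) {
            throw new IllegalStateException(
                    "key group " + keyGroup + " not in " + owned + " for element " + value);
        }
        return value;
    }
}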

I hope these hints help, otherwise feel free to get back to the mailing list 
with a more detailed description of your arrangement 

Cheers

Thias





From: Dan Hill 
Sent: Freitag, 8. Oktober 2021 06:49
To: user 
Subject: Helper methods for catching unexpected key changes?

Hi.  I'm getting the following errors when using reinterpretAsKeyedStream.  I 
don't expect the key to change for rows in reinterpretAsKeyedStream.  Are there 
any utilities that I can use with reinterpretAsKeyedStream to verify that the 
key doesn't change?  E.g. some wrapper operator?



2021-10-02 16:38:46
java.lang.IllegalArgumentException: key group from 154 to 156 does not contain 
213
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
at 
org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue.globalKeyGroupToLocalIndex(KeyGroupPartitionedPriorityQueue.java:191)
at 
org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue.computeKeyGroupIndex(KeyGroupPartitionedPriorityQueue.java:186)
at 
org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue.getKeyGroupSubHeapForElement(KeyGroupPartitionedPriorityQueue.java:179)
at 
org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue.add(KeyGroupPartitionedPriorityQueue.java:114)
at 
org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.registerEventTimeTimer(InternalTimerServiceImpl.java:233)
at 
org.apache.flink.streaming.api.SimpleTimerService.registerEventTimeTimer(SimpleTimerService.java:52)


RE: Upgrading from 1.11.3 -> 1.13.1 - random jobs stays in "CREATED"state, then fails with Slot request bulk is not fulfillable!

2021-09-29 Thread Schwalbe Matthias
Hi Tobias,

If your number of pipelines equals the number of Flink jobs then this is exactly 
what you should observe:
it takes one slot per Flink job and parallelism, hence with parallelism 1 you 
would have to provide at least 40 slots.

… independent of Flink version

… for Beam on Flink I’m not sure, assuming similar matters


Thias


From: Kaymak, Tobias 
Sent: Freitag, 24. September 2021 14:53
To: user 
Subject: Upgrading from 1.11.3 -> 1.13.1 - random jobs stays in "CREATED"state, 
then fails with Slot request bulk is not fulfillable!

Hi,

I am trying to upgrade our Flink cluster from version 1.11.3 -> 1.13.1
We use it to execute over 40 pipelines written in Apache Beam 2.32.0.

While moving the pipelines one-by-one over to the new cluster I noticed at some 
point that it did not start a new pipeline after I moved about 20.

4 TM with 8 slots are running, giving 32 slots to run things.

When I kill the jobmanager pod to make it reload the config, a random pipeline 
is then stuck in the CREATED state. No log is shown but after some minutes it's 
visible that:

Slot request bulk is not fulfillable! Could not allocate the required slot 
within slot request timeout

I found this post: 
http://mail-archives.apache.org/mod_mbox/flink-issues/202106.mbox/%3cjira.13382840.162321628.576520.1623246960...@atlassian.jira%3E

However, I am running the official Docker images of Flink, TM and JM are in 
sync.

I checked that there is no memory pressure on TM and JM:
[two screenshots: TaskManager and JobManager memory usage graphs]

Any advice on how to debug this situation?

jobmanager.memory.heap.size: 3500m
jobmanager.memory.jvm-overhead.max: 1536m
jobmanager.memory.process.size: 5gb
jobmanager.memory.off-heap.size: 512m
jobmanager.memory.jvm-metaspace.size: 512m

taskmanager.memory.process.size: 54gb
taskmanager.memory.jvm-metaspace.size: 2gb
taskmanager.memory.task.off-heap.size: 2gb

Best,
Tobi


RE: stream processing savepoints and watermarks question

2021-09-24 Thread Schwalbe Matthias
Hi all,

The way I understand the matter is that the described behavior is intentional 
for event time timers:


  *   When called, an event time timer handler can register new timers
  *   The timestamp parameter of the handler (override def onTimer(timestamp: Long, ctx: 
KeyedProcessFunction[K, I, O]#OnTimerContext, out: Collector[O]): Unit = { …)
     *   is set to the timeout time of the timer, not to the watermark that caused 
the timeout!
  *   onTimer, as said, can register new timers; the timeout time can even be in 
the past,
  *   in that case the timeout is handled immediately after the current onTimer(…) 
call
  *   as long as onTimer keeps registering new timers it will iterate through all 
these timers

When an operator receives a watermark it fires all registered timers with 
timeout <= watermark, in timeout order, also the ones registered in onTimer().

This is especially the case for a  MAX_WATERMARK watermark, but will be the 
same for any watermark that lies in the future.

For your case, @Marco, you could break this pattern by comparing the timeout to 
be registered with the current processing time, and if it lies safely far enough 
in the future, not register the timer.
That would break the infinite iteration over timers …
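
A minimal sketch of such a guard (Java; the interval, the safety margin and the 
types are made-up placeholders):

import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// re-registers its timer on every firing, but stops once the next timeout would
// lie far beyond current processing time (as happens when MAX_WATERMARK arrives)
public class GuardedTimerFunction extends KeyedProcessFunction<String, String, String> {
    private static final long INTERVAL_MS = 60_000L;          // made-up interval
    private static final long SAFETY_MARGIN_MS = 3_600_000L;  // made-up margin: 1 hour

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        ctx.timerService().registerEventTimeTimer(ctx.timestamp() + INTERVAL_MS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        out.collect("fired at " + timestamp);
        long next = timestamp + INTERVAL_MS;
        long now = ctx.timerService().currentProcessingTime();
        if (next <= now + SAFETY_MARGIN_MS) {   // skip re-registration when draining
            ctx.timerService().registerEventTimeTimer(next);
        }
    }
}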

I believe the behavior exhibited by flink is intentional and no defect!

What do you think?

Thias


From: JING ZHANG 
Sent: Freitag, 24. September 2021 12:25
To: Guowei Ma 
Cc: Marco Villalobos ; user 
Subject: Re: stream processing savepoints and watermarks question

Hi Guowei,
Thanks for the quick response, maybe I didn't express it clearly in the last email.
In fact, the above case happened in reality, not something I imagined.
When MAX_WATERMARK is received, the operator will try to fire all registered 
event-time timers. However, in the above case new timers are continuously being 
registered.
I would try to reproduce the problem in an ITCase, and once completed I would 
provide the code.

Best,
JING ZHANG

Guowei Ma mailto:guowei@gmail.com>> wrote on Friday, 24 September 2021 at 
5:16 PM:
Hi, JING

Thanks for the case.
But I am not sure this would happen. As far as I know the event timer could 
only be triggered when there is a watermark (except the "quiesce phase").
I think it could not advance any watermarks after MAX_WATERMARK is received.

Best,
Guowei


On Fri, Sep 24, 2021 at 4:31 PM JING ZHANG 
mailto:beyond1...@gmail.com>> wrote:
Hi Guowei,
I can describe a case I have encountered where timers fire indefinitely when 
stopping with a drained savepoint.
After an event timer is triggered, it registers another event timer whose value 
equals the value of the triggered timer plus an interval.
If a MAX_WATERMARK comes, the timer is triggered, then registers another timer, 
and so on forever.
I'm not sure whether Macro meets a similar problem.

Best,
JING ZHANG



Guowei Ma mailto:guowei@gmail.com>> wrote on Friday, 24 September 2021 at 
4:01 PM:
Hi Marco

Indeed, as mentioned by JING, if you want to drain when triggering a savepoint, 
you will encounter this MAX_WATERMARK.
But one question remains: in theory, even with MAX_WATERMARK, there will not be 
an infinite number of timers, and these timers should be generated by the 
application code.
You can share your code if that is convenient for you.

Best,
Guowei


On Fri, Sep 24, 2021 at 2:02 PM JING ZHANG 
mailto:beyond1...@gmail.com>> wrote:
Hi Marco,
Did you specify the --drain flag when stopping the job with a savepoint?
If the --drain flag is specified, then a MAX_WATERMARK will be emitted before 
the last checkpoint barrier.

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint

Best,
JING ZHANG

Marco Villalobos mailto:mvillalo...@kineteque.com>> wrote on Friday, 24 September 
2021 at 12:54 PM:
Something strange happened today.
When we tried to shut down a job with a savepoint, the watermarks became equal 
to 2^63 - 1.

This caused timers to fire indefinitely and crash downstream systems with 
overloaded untrue data.

We are using event time processing with Kafka as our source.

It seems impossible for a watermark to be that large.

I know it's possible to stream with a batch execution mode.  But this was stream 
processing.

What can cause this?  Is this normal behavior when creating a savepoint?
RE: byte array as keys in Flink

2021-09-24 Thread Schwalbe Matthias
Hi Dan,

Did you consider using java.util.UUID as the key type? It consists of two longs 
which should perform well for use as a key.
TypeInformation will map to GenericTypeInfo, i.e. it uses KryoSerializer, 
unless you register a specific TypeInformation for this class …

I didn’t give it a try … keep us posted if that works 
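
A rough sketch of what that could look like (Java; converting a 16-byte UUID 
byte[] into a java.util.UUID before keying — the event type and getter in the 
usage line are made up):

import java.nio.ByteBuffer;
import java.util.UUID;

public final class UuidKeys {
    private UuidKeys() {}

    // interpret a 16-byte array as a java.util.UUID (two longs)
    public static UUID toUuid(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}

// usage sketch:
// stream.keyBy(e -> UuidKeys.toUuid(e.getIdBytes()))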

Thias


From: Guowei Ma 
Sent: Freitag, 24. September 2021 09:34
To: Caizhi Weng 
Cc: Dan Hill ; user 
Subject: Re: byte array as keys in Flink

Hi Hill

As far as I know you cannot use byte[] as a keyBy key. You can find more 
information in [1].

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/operators/overview/#keyby

Best,
Guowei


On Fri, Sep 24, 2021 at 3:15 PM Caizhi Weng 
mailto:tsreape...@gmail.com>> wrote:
Hi!

It depends on the state backend you use. For example if you use a heap memory 
state backend it is backed by a hash map and it uses the hash code of byte[] to 
compare the two byte[] (see HeapMapState#put). However for rocksdb state 
backend it uses the serialized bytes (that is to say, the content of byte[]) to 
compare with the records and thus two byte[] with the same content can match 
(see RocksDBMapState#put).

Dan Hill mailto:quietgol...@gmail.com>> wrote on Friday, 24 September 2021 at 
7:43 AM:
Context
I want to perform joins based on UUIDs.  The String version is less efficient so I 
figured I should use the byte[] version.  I did a shallow dive into the Flink 
code and I'm not sure it's safe to use byte[] as a key (since it uses object 
equals/hashcode).

Request
What do other Flink devs do for byte[] keys? I want to use byte[] as a key in a 
MapState.




RE: Unbounded Kafka Source

2021-09-22 Thread Schwalbe Matthias
Hi,

If I remember right, this is actually the intended behaviour:

In batch mode: .setBounded(…)
In streaming mode: source that finishes anyway at set offset: use 
.setUnbounded(…)
In streaming mode: source that never finishes: don’t set a final offset (don’t 
.setUnbounded(…))
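
A sketch of the never-finishing streaming variant, based on the snippet further 
below but without any boundedness setting (the record type, deserializer and 
group id are placeholders taken from that snippet):

import java.util.Arrays;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

KafkaSource<FluentdRecord> dataSource = KafkaSource
        .<FluentdRecord>builder()
        .setBootstrapServers(kafkaServer)
        .setTopics(Arrays.asList("fluentd"))
        .setGroupId("my-group")
        .setDeserializer(new FluentdRecordDeserializer())
        .setStartingOffsets(OffsetsInitializer.latest())
        // no setBounded()/setUnbounded(): the source stays unbounded and never finishes
        .build();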

I might be mistaken …

Thias


From: Robert Metzger 
Sent: Mittwoch, 22. September 2021 17:51
To: Robert Cullen 
Cc: user 
Subject: Re: Unbounded Kafka Source

Hi,

What happens if you do not set any boundedness on the KafkaSource?
For a DataStream job in streaming mode, the Kafka source should be unbounded.

From reading the code, it seems that setting unbounded(latest) should not 
trigger the behavior you mention ... but the Flink docs are not clearly written 
[1], as it says that you can make a Kafka source bounded by calling 
"setUnbounded" ... which is weird, because "setUnbounded" should not make 
something bounded?!

Are there any log messages from the Source that can give us any hints?

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#boundedness

On Wed, Sep 22, 2021 at 5:37 PM Robert Cullen 
mailto:cinquate...@gmail.com>> wrote:
I have an unbounded kafka source that has records written to it every second.  
Instead of the job waiting to process the new messages it closes.  How do I 
keep the stream open?


KafkaSource dataSource = KafkaSource
.builder()
.setBootstrapServers(kafkaServer)
.setTopics(Arrays.asList("fluentd"))
.setGroupId("")
.setDeserializer(new FluentdRecordDeserializer())
//.setStartingOffsets(OffsetsInitializer.earliest())
//.setBounded(OffsetsInitializer.latest())
.setUnbounded(OffsetsInitializer.latest())
.build();



--
Robert Cullen
240-475-4490


RE: Flink operator stuck on created

2021-09-20 Thread Schwalbe Matthias
Hi Dave,

In batch mode an operator/task only starts running once all input tasks are 
finished. So without further detail this is perfectly in line with what you 
describe.

Thias


From: Dave Maughan 
Sent: Montag, 20. September 2021 13:15
To: user@flink.apache.org
Subject: Re: Flink operator stuck on created

I should note - this job is being run in batch mode. Could there be a deadlock 
related to FLINK-16430?

On Mon, 20 Sept 2021 at 11:26, Dave Maughan 
mailto:davidamaug...@gmail.com>> wrote:
Hi,

I have a Flink job on EMR with an operator stuck on CREATED. The subtasks are 
not being assigned to task manager slots. The previous operator is running and 
has non-zero Bytes Sent and Records Sent. When the job started the Job manager 
requested new workers to start a bunch of the operators but it's not requesting 
any more so the available slots is 0 and the job just seems to have stalled. 
Any pointers on what I might be doing wrong?

I'm specifying parallelism 24 and that's how many task slots are being 
requested by the job manager but also how many subtasks are being created each 
operator. Should I (if so how) specify these two numbers separately?

Thanks,
Dave


RE: Flink restarts on Checkpoint failure

2021-09-02 Thread Schwalbe Matthias
Good morning Daniel,

Another reason could be backpressure with aligned checkpoints:

  *   Flink processes checkpoints by sending checkpoint markers through the job 
graph, beginning with source operators towards the sink operators
  *   These checkpoint markers are sort of a meta event that is sent along with 
your custom events (much like watermarks and latency markers)
  *   These checkpoint markers cannot pass by (i.e. go faster than) your custom 
events
  *   In your situation, because it happens right after you start the job,
 *   it might be a source that forwards many events (e.g. for backfilling) 
while a later operator cannot process these events at the same speed
 *   therefore the events queue up in front of that operator, as well as the 
checkpoint markers, which consequently have a hard time aligning for 
longer than the checkpoint timeout
  *   how to fix this situation:
 *   diagnostics: the Flink dashboard has a checkpoints tab that shows how 
long checkpoint progress and alignment take for each task/subtask
 *   which version of Flink are you using?
 *   Depending on the version of Flink you can enable unaligned checkpoints 
(having some other implications)
 *   You could also increase the scale-out factor for the backfill phase and 
then lower it again …


  *   FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold: 
this depends on what recovery strategy you have configured …

I might be mistaken, however this is what I look into when I run into similar 
situations
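
For reference, a sketch of how the unaligned-checkpoint and tolerable-failure 
settings mentioned above could be enabled (Java; the interval, timeout and 
failure count are placeholders, and availability depends on your Flink version):

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000L);              // placeholder interval
CheckpointConfig cc = env.getCheckpointConfig();
cc.enableUnalignedCheckpoints();               // barriers may overtake in-flight data
cc.setCheckpointTimeout(10 * 60_000L);         // placeholder timeout
cc.setTolerableCheckpointFailureNumber(3);     // don't fail the job on the first expired checkpoint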


Feel free to get back to the mailing list for further clarifications …

Thias


From: Caizhi Weng 
Sent: Donnerstag, 2. September 2021 04:24
To: Daniel Vol 
Cc: user 
Subject: Re: Flink restarts on Checkpoint failure

Hi!

There are a ton of possible reasons for a checkpoint failure. The most possible 
reasons might be
* The JVM is busy with garbage collecting when performing the checkpoints. This 
can be checked by looking into the GC logs of a task manager.
* The state suddenly becomes quite large due to some specific data pattern. 
This can be checked by looking at the state size for the completed portion of 
that checkpoint.

You might also want to profile the CPU usage when the checkpoint is happening.

Daniel Vol mailto:vold...@gmail.com>> wrote on Wednesday, 1 September 2021 at 7:08 PM:
Hello,

I see the following error in my jobmanager log (Flink on EMR):
Checking cluster logs I see :
2021-08-21 17:17:30,489 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering 
checkpoint 1 (type=CHECKPOINT) @ 1629566250303 for job 
c513e9ebbea4ab72d80b1338896ca5c2.
2021-08-21 17:17:33,572 [jobmanager-future-thread-5] INFO  
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream
  - close closed:false s3://***/_metadata
2021-08-21 17:17:33,800 [jobmanager-future-thread-5] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Completed 
checkpoint 1 for job c513e9ebbea4ab72d80b1338896ca5c2 (737859873 bytes in 3496 
ms).
2021-08-21 17:27:30,474 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering 
checkpoint 2 (type=CHECKPOINT) @ 1629566850302 for job 
c513e9ebbea4ab72d80b1338896ca5c2.
2021-08-21 17:27:46,012 [jobmanager-future-thread-3] INFO  
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream
  - close closed:false s3://***/_metadata
2021-08-21 17:27:46,158 [jobmanager-future-thread-3] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Completed 
checkpoint 2 for job c513e9ebbea4ab72d80b1338896ca5c2 (1210889410 bytes in 
15856 ms).
2021-08-21 17:37:30,468 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Triggering 
checkpoint 3 (type=CHECKPOINT) @ 1629567450302 for job 
c513e9ebbea4ab72d80b1338896ca5c2.
2021-08-21 17:47:30,469 [Checkpoint Timer] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Checkpoint 3 of 
job c513e9ebbea4ab72d80b1338896ca5c2 expired before completing.
2021-08-21 17:47:30,476 [flink-akka.actor.default-dispatcher-34] INFO 
org.apache.flink.runtime.jobmaster.JobMaster - Trying to recover from a global 
failure.
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable 
failure threshold.
at 
org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:66)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1673)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1650)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:91)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1783)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at 

RE: Unrecoverable apps due to timeouts on transaction state initialization

2021-09-01 Thread Schwalbe Matthias
Hi Chohan,



Which Kafka client version are you using? ... considering that this started 
today, did you recently change the Kafka client version?



Giving a little more context (exception call stack/more log) might help finding 
out what is going on ... 



Regards



Thias



-Original Message-
From: Shahid Chohan 
Sent: Mittwoch, 1. September 2021 05:05
To: user@flink.apache.org
Subject: Unrecoverable apps due to timeouts on transaction state initialization



Today I started seeing the following exception across all of the exactly-once 
kafka sink apps I have deployed



org.apache.kafka.common.errors.TimeoutException: 
org.apache.kafka.common.errors.TimeoutException: Timeout expired while 
initializing transactional state in 6ms.

Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired 
while initializing transactional state in 6ms.



The apps are all on Flink v1.10.2



I tried the following workarounds sequentially for a single app but I still 
continued to get the same exception

- changing the sink uid and restoring with allowing non-restored-state

- changing the kafka producer id and restoring with allowing non-restored-state

- changing the output kafka topic to a new one and restoring with allowing 
non-restored-state

- deploying from scratch (no previous checkpoint/savepoint)

- doubling the timeout for state initialization from 60s to 120s



My mental model is that we have completely disassociated the flink app from any 
pending transactions on the kafka side (by changing the uid, producer id, and 
output topic) and so it should be able to recover from scratch. The kafka 
clusters are otherwise healthy and accepting writes for non-exactly-once flink 
apps and all other kafka producers.



On the kafka side, we have the following configs set.



transaction.max.timeout.ms=360

transaction.remove.expired.transaction.cleanup.interval.ms=8640



I'm considering changing the cleanup to something shorter so that if there are 
hanging transactions on the kafka side then maybe they can get garbage 
collected sooner. Or I might just wait it out and accept the downtime.



But otherwise, I am out of ideas and unsure how to proceed. Any help would be 
much appreciated.


RE: Flink sql CLI parsing avro format data error (failed to parse avro data)

2021-08-27 Thread Schwalbe Matthias
Hi Wayne,

The only obvious difference between your AVRO schema and the table schema is 
that your AVRO ccc is nullable while your SQL ‘ccc’ is not nullable.
Please adjust one of the two.
Also (not entirely sure) in order to correctly map a nullable AVRO field to SQL it 
needs to have a default value like this:


  }, {
"name" : "ccc",
"type" : [ "null", "string" ],
"default" : null
  },


Try if that makes any difference 

Thias

From: Wayne <1...@163.com>
Sent: Freitag, 27. August 2021 02:29
To: user@flink.apache.org
Subject: Flink sql CLI parsing avro format data error (failed to parse avro data)




I have the following Apache Avro schema:

{

  "type" : "record",

  "name" : "KafkaAvroMessage",

  "namespace" : "xxx",

  "fields" : [ {

"name" : "aaa",

"type" : "string"

  }, {

"name" : "bbb",

"type" : [ "null", "string" ],

"default" : null



  },{

"name" : "ccc",

"type" : [ "null", "string" ]



  },

{

  "name" : "ddd",

  "type" : "string",

  "default" : ""

} ]

}
The SQL I wrote is like this:

CREATE TABLE xxx (

`aaa` STRING NOT NULL,

`bbb` STRING ,

`ccc` STRING NOT NULL,

`ddd` STRING NOT NULL

) WITH(

...

'format' = 'avro'

);
The SQL can parse the aaa, bbb and ddd fields correctly, but cannot parse the ccc 
field. What is the problem with my SQL, and what is the correct way to write it 
so that Flink SQL can parse this schema?












Re: Looking for suggestions about multithreaded CEP to be used with flink

2021-08-23 Thread Schwalbe Matthias
Hi Tejas,



I had your question sit in my mind for a while before I realized I had 
something to say about it 





Although not related to CEP, we had had a very similar problem with too many 
threads/tasks in an overwhelming split-join-pattern of about 1600 concurrent 
paths.



A colleague of mine worked on this for his master's thesis [1] ... we came to 
the conclusion to

- radically reduce fine grained parallelism,  i.e.

- use it (almost) only for Flink scale out (partitioning for specific key 
attributes)

- transform our algorithms to run multiple cases sequentially instead of in 
parallel

- unify multiple key domains by generalizing the key and duplicating incoming 
events per key domain together with the unified key (ideas taken from e.g. [2])

- try to unify all state primitives that work on the same key into list or map 
state primitives, and iterate on these (this works especially well for RocksDB)

- patch Flink task chaining to create longer chains and allow chaining for 
operator with multiple inputs (only mentioned in [1]) ... (= fewer 
tasks/threads)



In your specific case with the CEP rules it is probably best

- to implement the patterns yourself or integrate some external library, but

- to make the CEP rules 'data' and broadcast them into the respective operators 
(ideally a single operator only) that iterate over the rules for each incoming 
event (see the sketch further below)

- for the stateless rules, I once integrated Spring Boot SpEL for a similar 
rules system (precompiled when initially loaded, rules are actually quite fast)

- for the stateful rules

  - you could either integrate some proper library (which leaves you with the 
problem of integrating intermediate state into the Flink TypeInformation 
serialization system)

  - or implement it yourself e.g. by means of a regular expressions library 
that exposes its state transition tables generated for specific regular 
expressions



This way you could use Flink for what it is excellent (low-latency 
high-throughput stream processing with consistent state over restarts/crashes 
(e.g.)) and optimize in areas that are not optimal for your use case.





[1] https://www.merlin.uzh.ch/publication/show/21108

[2] https://www.youtube.com/watch?v=tHStmoj8WbQ

[3] 
https://docs.spring.io/spring-integration/docs/5.3.0.RELEASE/reference/html/spel.html
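
As a rough sketch of the broadcast-rules idea mentioned above (Rule and Event are 
made-up placeholder types, and matches()/getId() are hypothetical methods — e.g. 
a Rule could wrap a precompiled SpEL expression):

import java.util.Map;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class RuleEngine {

    static final MapStateDescriptor<String, Rule> RULES_DESCRIPTOR =
            new MapStateDescriptor<>("rules", Types.STRING, Types.POJO(Rule.class));

    public static DataStream<String> apply(DataStream<Event> events, DataStream<Rule> rules) {
        BroadcastStream<Rule> broadcastRules = rules.broadcast(RULES_DESCRIPTOR);

        return events
                .connect(broadcastRules)
                .process(new BroadcastProcessFunction<Event, Rule, String>() {

                    @Override
                    public void processElement(Event event, ReadOnlyContext ctx,
                                               Collector<String> out) throws Exception {
                        // iterate over all currently known rules for every incoming event
                        for (Map.Entry<String, Rule> e :
                                ctx.getBroadcastState(RULES_DESCRIPTOR).immutableEntries()) {
                            if (e.getValue().matches(event)) {    // hypothetical evaluation
                                out.collect(e.getKey());          // emit the id of the matched rule
                            }
                        }
                    }

                    @Override
                    public void processBroadcastElement(Rule rule, Context ctx,
                                                        Collector<String> out) throws Exception {
                        // rules arrive as data and are stored/updated in broadcast state
                        ctx.getBroadcastState(RULES_DESCRIPTOR).put(rule.getId(), rule);
                    }
                });
    }
}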



Feel free to get back to me for further discussion (on the user list)



Thias







On 2021/08/19 23:39:18, Tejas B  wrote:

> Hi,>

> Here's our use case :>

> We are planning to build a rule based engine on top of flink with huge number 
> of rules(1000s). the rules could be stateless or stateful. >

> Example stateless rule is : A.id = 3 && A.name = 'abc' || A.color = red. >

> Example stateful rule is : A is event.id =3, B is event.name = 'abc', C is 
> event.color = red and we are looking for pattern AB*C over time window of 1 
> hour.>

>

> Now we have tried to use flink CEP for this purpose and program crashed 
> because of lot of threads. The explanation is : every application of 
> CEP.pattern creates a new operator in the graph and flink can't support that 
> many vertices in a graph.>

>

> Other approach could be to use processFunction in flink, but still to run the 
> rules on events stream you'd have to use some sort of CEP or write your own.>

>

> My question is, does anybody have any other suggestions on how to achieve 
> this ? Any other CEPs that integrate and work better with flink (siddhi, 
> jasper, drools) ? Any experience would be helpful.>

>


RE: Bug with PojoSerializer? java.lang.IllegalArgumentException: Can not set final double field Event.rating to java.lang.Integer

2021-08-13 Thread Schwalbe Matthias
Good Morning Nathan,

The exception stack does not give enough information yet to come to a solution, 
the way I would continue is this:

  *   Given that you run in a local environment, you can probably run your job 
in a debugger and
  *   Place an exception breakpoint for java.lang.IllegalArgumentException
  *   Once you trap into the problem (after the 10^6+ events), walk up the call 
stack to 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer#initializeFields and
  *   Try to find out why this PojoSerializer instance has an int serializer 
instead of a double serializer at the respective field-serializer index:

protected void initializeFields(T t) {
for (int i = 0; i < numFields; i++) {
if (fields[i] != null) {
try {
fields[i].set(t, fieldSerializers[i].createInstance()); // it all happens here
} catch (IllegalAccessException e) {
throw new RuntimeException("Cannot initialize fields.", e);
}
}
}
}

I hope you come closer to a solution if you poke around a little.
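
If it helps, a small sketch for inspecting the serializer Flink derives for the 
POJO outside of the job (Java; Event stands for your POJO class, and the comment 
only describes what one would expect to see, not verified output):

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;

public class InspectEventSerializer {
    public static void main(String[] args) {
        // shows whether Event is analyzed as a POJO and in which order its fields appear
        TypeInformation<Event> typeInfo = TypeInformation.of(Event.class);
        System.out.println(typeInfo);

        TypeSerializer<Event> serializer = typeInfo.createSerializer(new ExecutionConfig());
        System.out.println(serializer); // expected: a PojoSerializer, not a KryoSerializer
    }
}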

Feel free to get back to me for further clarifications.

Thias


From: Nathan Yu 
Sent: Freitag, 13. August 2021 05:27
To: JING ZHANG 
Cc: user@flink.apache.org
Subject: RE: Bug with PojoSerializer? java.lang.IllegalArgumentException: Can 
not set final double field Event.rating to java.lang.Integer

Yea, it goes through many events from this input before the exception is 
thrown. I don’t know how the input schema can change though, as the input is 
always producing objects from the same class.


From: JING ZHANG mailto:beyond1...@gmail.com>>
Sent: Thursday, August 12, 2021 10:22 PM
To: Nathan Yu mailto:nuonathan...@twosigma.com>>
Cc: user@flink.apache.org
Subject: Re: Bug with PojoSerializer? java.lang.IllegalArgumentException: Can 
not set final double field Event.rating to java.lang.Integer

Hi Yu,
The exception is thrown after processing some input data, not right at the
beginning of the input, right?
Is it possible that the input schema has been updated?

Nathan Yu mailto:nuonathan...@twosigma.com>> wrote on Friday, August 13, 2021 at 8:38 AM:

  •   Using local environment: StreamExecutionEnvironment.createLocalEnvironment()
  •   Event is a POJO class, with int, double, enum, and String fields
  •   Unfortunately it’s hard for me to reproduce in a small example, as it seems to occur after 10e6+ events.
  •   Using flink-core-1.12.4

Stack:
Caused by: java.lang.IllegalArgumentException: Can not set final double field 
com.twosigma.research.options.optticks.core.types.Event.askPrice to 
java.lang.Integer
at 
java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
at 
java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
at 
java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:58)
at 
java.base/jdk.internal.reflect.UnsafeQualifiedDoubleFieldAccessorImpl.set(UnsafeQualifiedDoubleFieldAccessorImpl.java:77)
at java.base/java.lang.reflect.Field.set(Field.java:780)
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.initializeFields(PojoSerializer.java:205)
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:388)
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:409)
at 
org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:191)
at 
org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:46)
at 
org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:53)


RE: Odd Serialization exception

2021-08-12 Thread Schwalbe Matthias
Good morning Daniel,

… so my guess was not the cause of your problem; anyway, it seems like you
always want to use your LogSessionProducer with Session?
In that case you could drop the generics from the class like this:

class LogSessionProducer(schema: SerializationSchema[Session], props: Properties)
  extends FlinkKinesisProducer[Session](schema, props) with LazyLogging {
  ...
  override def invoke(value: Session, context: SinkFunction.Context[_]): Unit = {
  ...

As to your assumption that the problem could be in your override def open() …
… I don’t see you invoking the super.open(…) function, which would leave the
producer only half initialized.
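
For illustration only – a minimal sketch (class, field, and metric names are assumptions
based on the snippets in this thread, not your actual code) of an open() override that
delegates to super.open() first, so the Kinesis producer is fully initialized before the
metric is registered:

import java.util.Properties

import com.typesafe.scalalogging.LazyLogging
import org.apache.flink.api.common.serialization.SerializationSchema
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer

class LogSessionProducer(schema: SerializationSchema[Session], props: Properties)
  extends FlinkKinesisProducer[Session](schema, props) with LazyLogging {

  @transient private var sessionsWritten: Counter = _

  override def open(parameters: Configuration): Unit = {
    super.open(parameters) // without this the producer stays only half initialized
    sessionsWritten = getRuntimeContext.getMetricGroup.counter("sessionsWritten")
  }

  override def invoke(value: Session, context: SinkFunction.Context[_]): Unit = {
    // read fields of value here for metrics if needed – reading does not modify the event
    super.invoke(value, context) // forward the original value untouched
    sessionsWritten.inc()
  }
}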

Thias




From: Daniel Vol 
Sent: Thursday, 12 August 2021 08:01
To: Guowei Ma 
Cc: user 
Subject: Re: Odd Serialization exception

Hi Guowei,

I am running on EMR 5.32.0 with Flink 1.11.2

In the meantime I did some tests and commented out part of the new code:

override def invoke(value: OUT, context: SinkFunction.Context[_]): Unit = {
  try {
    //  val session = value.asInstanceOf[Session]
    //  sessionDuration = 17L // session.getSessionDuration
    //  sessionSize = 19 // session.getSessionTotalEvents
    super.invoke(value, context)
    sessionsWritten.inc()
  }
Though I still get Caused by: org.apache.flink.util.SerializedThrowable: null.
So my assumption is that something is wrong with the "override def open()" method.

Thanks!

On Thu, Aug 12, 2021 at 8:44 AM Guowei Ma 
mailto:guowei@gmail.com>> wrote:
Hi, Daniel
Could you tell me the version of Flink you use? I want to look at the 
corresponding code.
Best,
Guowei


On Wed, Aug 11, 2021 at 11:23 PM Daniel Vol 
mailto:vold...@gmail.com>> wrote:
Hi Matthias,

First, thanks for a fast reply.
I am new to Flink, so I probably miss a lot in terms of the flow and the objects passed.

The motivation is to get internal data from the transferred OUT object in order to send
metrics. So I do downcast it, but from my perspective it is not forwarded
(super is called with the original value) nor expected to be used in later steps (it is
expected to be a local-scope variable).
As I suspect that you are right - can you point me to how I can get internal
data from OUT without changing it or affecting the next steps?
Also, when I create the object I specify the OUT type (which is Session):

val flinkKinesisProducer = new LogSessionProducer[Session](new KinesisEventSerializer[Session], producerConfig)

"… but of course I might be completely mistaken due to incomplete information."
What kind of information can I supply?

Thanks a lot!

Daniel


On 11 Aug 2021, at 17:28, Schwalbe Matthias 
mailto:matthias.schwa...@viseca.ch>> wrote:

Hi Daniel,

At first look there is one thing that catches my eye:
In the line ‘val session = value.asInstanceOf[Session]' it looks like you are
downcasting the event from OUT to Session.
In Flink this is a dangerous thing to do … DataStream[OUT] uses a specific
serializer[OUT] to transport events from one operator to the next (or at least
from one task to the next, if configured this way).
These serializers usually only understand one type, OUT in your case. Only in
certain circumstances is the Java object (the event) transported directly from
one operator to the next.

I guess this is what happened: your serializer that only understands OUT cannot
cope with a Session object …

… but of course I might be completely mistaken due to incomplete information.
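
If you want to keep the producer generic, one hedged alternative (the trait and its methods
are made up for illustration, not an existing API) is to bound OUT to a small trait that
exposes the fields needed for logging/metrics, so the compiler guarantees the access and no
runtime cast is needed:

import java.util.Properties

import com.typesafe.scalalogging.LazyLogging
import org.apache.flink.api.common.serialization.SerializationSchema
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer

// Hypothetical trait; Session (or any other event type) would have to implement it
trait HasSessionStats {
  def getSessionDuration: Long
  def getSessionTotalEvents: Int
}

class StatsLoggingProducer[OUT <: HasSessionStats](schema: SerializationSchema[OUT], props: Properties)
  extends FlinkKinesisProducer[OUT](schema, props) with LazyLogging {

  override def invoke(value: OUT, context: SinkFunction.Context[_]): Unit = {
    // read-only access for logging/metrics, no asInstanceOf needed
    logger.debug(s"session duration=${value.getSessionDuration}, events=${value.getSessionTotalEvents}")
    super.invoke(value, context) // the original value is forwarded unchanged
  }
}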

I hope this helps 

Feel free to get back to me for clarifications (on the mailing list)

Cheers

Thias




From: Daniel Vol mailto:vold...@gmail.com>>
Sent: Wednesday, 11 August 2021 14:47
To: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Odd Serialization exception

I started to get the following exception:

2021-08-11 09:45:30,299 [Window(EventTimeSessionWindows(180), 
EventTimeTrigger, SessionAggregator, PassThroughWindowFunction) -> Sink: 
Unnamed (1/8)] INFO  o.a.f.s.runtime.tasks.SubtaskCheckpointCoordinatorImpl  - 
Could not complete snapshot 134 for operator 
Window(EventTimeSessionWindows(180), EventTimeTrigger, SessionAggregator, 
PassThroughWindowFunction) -> Sink: Unnamed (1/8). Failure reason: Checkpoint 
was declined.
org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete 
snapshot 134 for operator Window(EventTimeSessionWindows(180), 
EventTimeTrigger, SessionAggregator, PassThroughWindowFunction) -> Sink: 
Unnamed (1/8). Failure reason: Checkpoint was declined.
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:215)
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:156)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:314)
at 
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.che
