Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-03-30 Thread Piotr Nowojski
Hey,

> 1. The Flink community agrees that we upgrade Kryo to a later version,
which means breaking all checkpoint/savepoint compatibility and releasing a
Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
dropped. This is probably the quickest way, but would still mean that we
expose Kryo in the Flink APIs, which is the main reason why we haven't been
able to upgrade Kryo at all.

This sounds pretty bad to me.

Has anyone looked into what it would take to provide a smooth migration
from Kryo2 -> Kryo5?

Best,
Piotrek

czw., 30 mar 2023 o 16:54 Alexis Sarda-Espinosa 
napisał(a):

> Hi Martijn,
>
> just to be sure, if all state-related classes use a POJO serializer, Kryo
> will never come into play, right? Given FLINK-16686 [1], I wonder how many
> users actually have jobs with Kryo and RocksDB, but even if there aren't
> many, that still leaves those who don't use RocksDB for
> checkpoints/savepoints.
>
> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
> users choose between v2/v5 jars by separating them like log4j2 jars?
>
> [1] https://issues.apache.org/jira/browse/FLINK-16686
>
> Regards,
> Alexis.
>
> Am Do., 30. März 2023 um 14:26 Uhr schrieb Martijn Visser <
> martijnvis...@apache.org>:
>
>> Hi all,
>>
>> I also saw a thread on this topic from Clayton Wohl [1], which I'm
>> including in this discussion thread so that it doesn't get lost.
>>
>> From my perspective, there are two main ways to get to Java 17:
>>
>> 1. The Flink community agrees that we upgrade Kryo to a later version,
>> which means breaking all checkpoint/savepoint compatibility and releasing a
>> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
>> dropped. This is probably the quickest way, but would still mean that we
>> expose Kryo in the Flink APIs, which is the main reason why we haven't been
>> able to upgrade Kryo at all.
>> 2. A contributor makes a contribution that bumps Kryo and either a)
>> automagically reads in all old checkpoints/savepoints using Kryo v2 and
>> writes them to new snapshots using Kryo v5 (as mentioned in the Kryo
>> migration guide [2][3]), or b) provides an offline tool that allows
>> interested users to migrate their snapshots manually before starting from a
>> newer version. That could potentially avoid the need to introduce a new
>> Flink major version. In both scenarios, the contributor would ideally also
>> help avoid exposing Kryo so that we will be in better shape in the future.
>>
>> It would be good to get the opinion of the community for either of these
>> two options, or potentially for another one that I haven't mentioned. If it
>> appears that there's an overall agreement on the direction, I would propose
>> that a FLIP gets created which describes the entire process.
>>
>> Looking forward to the thoughts of others, including the Users (therefore
>> including the User ML).
>>
>> Best regards,
>>
>> Martijn
>>
>> [1]  https://lists.apache.org/thread/qcw8wy9dv8szxx9bh49nz7jnth22p1v2
>> [2] https://lists.apache.org/thread/gv49jfkhmbshxdvzzozh017ntkst3sgq
>> [3] https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5
>>
>> On Sun, Mar 19, 2023 at 8:16 AM Tamir Sagi 
>> wrote:
>>
>>> I agree, there are several options to mitigate the migration from v2 to
>>> v5.
>>> Yet, Oracle's roadmap is to end JDK 11 support in September this year.
>>>
>>>
>>>
>>> 
>>> From: ConradJam 
>>> Sent: Thursday, March 16, 2023 4:36 AM
>>> To: d...@flink.apache.org 
>>> Subject: Re: [Discussion] - Release major Flink version to support JDK
>>> 17 (LTS)
>>>
>>> Thanks for starting this discussion.
>>>
>>>
>>> I have been tracking this problem for a long time. A few days ago I saw a
>>> conversation in an issue and learned that the Kryo version problem affects
>>> snapshots on JDK 17 [1] FLINK-24998.
>>>
>>> As @cherry said, it ruined our whole effort towards JDK 17.
>>>
>>> I am in favor of providing an external tool that migrates checkpoints from
>>> the old Kryo version to the new one in a single pass (maybe this tool could
>>> start in Flink 2.0?). Are there currently any plans or ideas for such a
>>> tool worth discussing?
>>>
>>>
>>> I think it should not be difficult to stay compatible with both JDK 11 and
>>> JDK 17. We should indeed abandon JDK 8 in 2.0.0; the docs already mark it
>>> as deprecated [2].
>>>
>>>
>>> I would add that we also need to pay attention to the Scala version in
>>> combination with JDK 17.
>>>
>>>
>>> [1] FLINK-24998  SIGSEGV in Kryo / C2 CompilerThread on Java 17
>>> https://issues.apache.org/jira/browse/FLINK-24998
>>>
>>> [2] FLINK-30501 Update Flink build instruction to deprecate Java 8
>>> instead
>>> of requiring Java 11  https://issues.apache.org/jira/browse/FLINK-30501
>>>
>>> Tamir Sagi  wrote on Thu, Mar 16, 2023 at 00:54:
>>>
>>> > Hey dev community,
>>> >
>>> > I'm writing 

Re: Kafka transactions drastically limit usability of Flink savepoints

2022-11-17 Thread Piotr Nowojski
Hi Yordan,

Indeed it looks like a missing feature. Probably someone implementing the
new KafkaSink didn't realize how important this is. I've created a ticket
to work on this issue [1], but I don't know when or who could fix it.

I think a workaround might be to create a new `KafkaSink` instance that
will have a new, different operator uid, and simply drop/ignore the old
instance and its state (by using the `allowNonRestoredState` option [2]).
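
To illustrate the workaround, here is a minimal sketch, assuming the standard
KafkaSink builder API; the bootstrap servers, topic, uid and transactional id
prefix are placeholders:

```
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;

public class KafkaSinkUidWorkaround {

    // Attach a brand-new KafkaSink under a new uid; the state of the old sink
    // (including its pending transactions) is dropped when the job is restored
    // with non-restored state allowed.
    public static void attachNewSink(DataStream<String> stream) {
        KafkaSink<String> sink =
                KafkaSink.<String>builder()
                        .setBootstrapServers("broker:9092") // placeholder
                        .setRecordSerializer(
                                KafkaRecordSerializationSchema.builder()
                                        .setTopic("output-topic") // placeholder
                                        .setValueSerializationSchema(new SimpleStringSchema())
                                        .build())
                        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                        .setTransactionalIdPrefix("my-app-tx") // placeholder
                        .build();

        stream.sinkTo(sink)
                .uid("kafka-sink-v2") // new uid, different from the previous sink operator
                .name("kafka-sink-v2");
    }
}
```

The job would then be restored with non-restored state allowed, e.g.
`bin/flink run -s <savepointPath> --allowNonRestoredState ...`, so that the
state registered under the old sink's uid is simply ignored.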

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-30068
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#allowing-non-restored-state


śr., 16 lis 2022 o 11:36 Yordan Pavlov  napisał(a):

> Hi Piotr,
>
> the option you mention is applicable only for the deprecated
> KafkaProducer, is there an equivalent to the modern KafkaSink? I found
> this article comparing the behavior of the two:
>
> https://ververica.zendesk.com/hc/en-us/articles/360013269680-Best-Practices-for-Using-Kafka-Sources-Sinks-in-Flink-Jobs
>
> it suggests that the default behavior of KafkaSink would be: "The
> recovery continues with an ERROR message like the following is
> logged:"; however, this is not what I observe - instead, the job fails. I
> am attaching the relevant part of the log. This error happens upon
> trying to recover from a one-month-old savepoint.
>
> Regards,
> Yordan
>
> On Tue, 15 Nov 2022 at 18:53, Piotr Nowojski  wrote:
> >
> > Hi Yordan,
> >
> > I don't understand where the problem is, why do you think savepoints are
> unusable? If you recover with `ignoreFailuresAfterTransactionTimeout`
> enabled, the current Flink behaviour shouldn't cause any problems (except
> for maybe some logged errors).
> >
> > Best,
> > Piotrek
> >
> > wt., 15 lis 2022 o 15:36 Yordan Pavlov 
> napisał(a):
> >>
> >> Hi,
> >> we are using Kafka savepoints as a recovery tool and want to store
> >> multiple ones for the past months. However as we use Kafka
> >> transactions for our KafkaSink this puts expiration time on our
> >> savepoints. We can use a savepoint only as old as our Kafka
> >> transaction timeout. The problem is explained in this issue:
> >> https://issues.apache.org/jira/browse/FLINK-16419
> the relevant comment being this one:
> >> "FlinkKafkaProducer or KafkaSink do not know during recovery if they
> >> have to recover and commit or if it has already happened. Due to that,
> >> they are always attempting to recover and commit transactions during
> >> startup."
> >> I'm surprised that more people are not hitting this problem as this
> >> makes Savepoints pretty much unusable as a recovery mechanism.
>


Re: Kafka transactions drastically limit usability of Flink savepoints

2022-11-15 Thread Piotr Nowojski
Hi Yordan,

I don't understand where the problem is, why do you think savepoints are
unusable? If you recover with `ignoreFailuresAfterTransactionTimeout`
enabled, the current Flink behaviour shouldn't cause any problems (except
for maybe some logged errors).

Best,
Piotrek

wt., 15 lis 2022 o 15:36 Yordan Pavlov  napisał(a):

> Hi,
> we are using Kafka savepoints as a recovery tool and want to store
> multiple ones for the past months. However as we use Kafka
> transactions for our KafkaSink this puts expiration time on our
> savepoints. We can use a savepoint only as old as our Kafka
> transaction timeout. The problem is explained in this issue:
> https://issues.apache.org/jira/browse/FLINK-16419
> the relevant comment being this one:
> "FlinkKafkaProducer or KafkaSink do not know during recovery if they
> have to recover and commit or if it has already happened. Due to that,
> they are always attempting to recover and commit transactions during
> startup."
> I'm surprised that more people are not hitting this problem as this
> makes Savepoints pretty much unusable as a recovery mechanism.
>


Re: Modify savepoints in Flink

2022-10-21 Thread Piotr Nowojski
Hi,

Yes and no. StateProcessor API can read any Flink state, but you have to
describe the state you want it to access. Take a look at the example in the
docs [1].

First you have an example of a theoretical production function
`StatefulFunctionWithTime`, whose state you want to modify. Note the
`ValueState` and `ListState` fields and their descriptors. That's the state
of that particular function. Descriptors determine how the state is
serialised. Usually they are pretty simple.
Below that is the `ReaderFunction` that you use to access/modify the
state via the StateProcessor API. To do so, you have to specify the state
you want to access and effectively mimic/copy-paste the state descriptors
from the production code.

If you want to modify the state of a source/sink function, you would have
to first take a look into the source code of such a connector to know what
to modify and copy its descriptors. Also note that for source/sink the
state is most likely non-keyed.
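
To make the descriptor mirroring concrete, here is a minimal sketch of such a
reader function, modelled on the docs example [1]; the state names, types and
the output POJO are placeholders that have to match your actual production
function:

```
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

// Output POJO - a placeholder for whatever you want to extract per key.
class KeyedStateRow {
    public Integer key;
    public Integer value;
    public List<Long> updateTimes;
}

// The descriptors below must mirror (names, types, serializers) the ones used
// by the production function whose state is being read.
class ReaderFunction extends KeyedStateReaderFunction<Integer, KeyedStateRow> {

    private transient ValueState<Integer> state;
    private transient ListState<Long> updateTimes;

    @Override
    public void open(Configuration parameters) {
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<>("state", Types.INT));
        updateTimes = getRuntimeContext().getListState(
                new ListStateDescriptor<>("update-times", Types.LONG));
    }

    @Override
    public void readKey(Integer key, Context ctx, Collector<KeyedStateRow> out)
            throws Exception {
        KeyedStateRow row = new KeyedStateRow();
        row.key = key;
        row.value = state.value();
        row.updateTimes = new ArrayList<>();
        Iterable<Long> times = updateTimes.get();
        if (times != null) {
            times.forEach(row.updateTimes::add);
        }
        out.collect(row);
    }
}
```

The reader is then handed to the savepoint-reading entry point described in
[1] (e.g. something like `readKeyedState("my-uid", new ReaderFunction())`),
where the uid identifies the operator whose state you want to read.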

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/libs/state_processor_api/#keyed-state

pt., 21 paź 2022 o 14:37 Sriram Ganesh  napisał(a):

> I have a question on this. Different connectors can have different
> serialisation and de-serialisation techniques, right? Wouldn't that have an
> impact? If I use the StateProcessor API, would it be agnostic to all the
> sources and sinks?
>
> On Fri, Oct 21, 2022, 18:00 Piotr Nowojski  wrote:
>
>> Oops, one correction:
>>
>> > Alternatively, you can modify a code of your function/operator for
>> which you want to modify the state. For example in the
>> `org.apache.flink.streaming.api.checkpoint.CheckpointedFunction#initializeState`
>> method you could add some code that would do a migration of your old state
>> to a new one.
>> > And you can drop such code later, in the next savepoint.
>>
>> That was not entirely true. This would work for the non-keyed state. For
>> the keyed state there is no easy alternative (you would have to iterate
>> through all of the keys, which I think is not exposed via Public API) -
>> best to use StateProcessor API.
>>
>> Best,
>> Piotrek
>>
>> pt., 21 paź 2022 o 10:54 Sriram Ganesh  napisał(a):
>>
>>> Thanks !. Will try this.
>>>
>>> On Fri, Oct 21, 2022 at 2:22 PM Piotr Nowojski 
>>> wrote:
>>>
>>>> Hi Sriram,
>>>>
>>>> You can read and modify savepoints using StateProcessor API [1].
>>>>
>>>> Alternatively, you can modify a code of your function/operator for
>>>> which you want to modify the state. For example in the
>>>> `org.apache.flink.streaming.api.checkpoint.CheckpointedFunction#initializeState`
>>>> method you could add some code that would do a migration of your old state
>>>> to a new one.
>>>>
>>>> ```
>>>> // OldStateType/NewStateType are placeholders for your actual state types
>>>> private transient ValueState<OldStateType> oldState;
>>>> private transient ValueState<NewStateType> newState;
>>>> (...)
>>>> initializeState(...) {
>>>>   (...)
>>>>   // migrate only once: if the new state is still empty and the old one is set
>>>>   if (newState.value() == null && oldState.value() != null) {
>>>>     newState.update(migrate(oldState.value()));
>>>>     oldState.clear();
>>>>   }
>>>> }
>>>> ```
>>>>
>>>> And you can drop such code later, in the next savepoint.
>>>>
>>>> Best,
>>>> Piotrek
>>>>
>>>> [1]
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/libs/state_processor_api/
>>>>
>>>> pt., 21 paź 2022 o 10:05 Sriram Ganesh 
>>>> napisał(a):
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am working on a scenario where I need to modify the existing
>>>>> savepoint operator state. Ex: Wanted to remove some offset of the
>>>>> savepoint.
>>>>>
>>>>> What is the better practice for these scenarios?. Could you please
>>>>> help me with any example as such?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> --
>>>>> *Sriram G*
>>>>> *Tech*
>>>>>
>>>>>
>>>
>>> --
>>> *Sriram G*
>>> *Tech*
>>>
>>>


Re: Modify savepoints in Flink

2022-10-21 Thread Piotr Nowojski
Oops, one correction:

> Alternatively, you can modify a code of your function/operator for which
you want to modify the state. For example in the
`org.apache.flink.streaming.api.checkpoint.CheckpointedFunction#initializeState`
method you could add some code that would do a migration of your old state
to a new one.
> And you can drop such code later, in the next savepoint.

That was not entirely true. This would work for the non-keyed state. For
the keyed state there is no easy alternative (you would have to iterate
through all of the keys, which I think is not exposed via Public API) -
best to use StateProcessor API.

Best,
Piotrek

pt., 21 paź 2022 o 10:54 Sriram Ganesh  napisał(a):

> Thanks !. Will try this.
>
> On Fri, Oct 21, 2022 at 2:22 PM Piotr Nowojski 
> wrote:
>
>> Hi Sriram,
>>
>> You can read and modify savepoints using StateProcessor API [1].
>>
>> Alternatively, you can modify a code of your function/operator for which
>> you want to modify the state. For example in the
>> `org.apache.flink.streaming.api.checkpoint.CheckpointedFunction#initializeState`
>> method you could add some code that would do a migration of your old state
>> to a new one.
>>
>> ```
>> // OldStateType/NewStateType are placeholders for your actual state types
>> private transient ValueState<OldStateType> oldState;
>> private transient ValueState<NewStateType> newState;
>> (...)
>> initializeState(...) {
>>   (...)
>>   // migrate only once: if the new state is still empty and the old one is set
>>   if (newState.value() == null && oldState.value() != null) {
>>     newState.update(migrate(oldState.value()));
>>     oldState.clear();
>>   }
>> }
>> ```
>>
>> And you can drop such code later, in the next savepoint.
>>
>> Best,
>> Piotrek
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/libs/state_processor_api/
>>
>> pt., 21 paź 2022 o 10:05 Sriram Ganesh  napisał(a):
>>
>>> Hi All,
>>>
>>> I am working on a scenario where I need to modify the existing savepoint
>>> operator state. Ex: Wanted to remove some offset of the savepoint.
>>>
>>> What is the better practice for these scenarios?. Could you please help
>>> me with any example as such?
>>>
>>> Thanks in advance.
>>>
>>> --
>>> *Sriram G*
>>> *Tech*
>>>
>>>
>
> --
> *Sriram G*
> *Tech*
>
>


Re: Modify savepoints in Flink

2022-10-21 Thread Piotr Nowojski
Hi Sriram,

You can read and modify savepoints using StateProcessor API [1].

Alternatively, you can modify the code of the function/operator whose state
you want to modify. For example, in the
`org.apache.flink.streaming.api.checkpoint.CheckpointedFunction#initializeState`
method you could add some code that would do a migration of your old state
to a new one.

```
// OldStateType/NewStateType are placeholders for your actual state types
private transient ValueState<OldStateType> oldState;
private transient ValueState<NewStateType> newState;
(...)
initializeState(...) {
  (...)
  // migrate only once: if the new state is still empty and the old one is set
  if (newState.value() == null && oldState.value() != null) {
    newState.update(migrate(oldState.value()));
    oldState.clear();
  }
}
```

And you can drop such code later, in the next savepoint.

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/libs/state_processor_api/

pt., 21 paź 2022 o 10:05 Sriram Ganesh  napisał(a):

> Hi All,
>
> I am working on a scenario where I need to modify the existing savepoint
> operator state. Ex: Wanted to remove some offset of the savepoint.
>
> What is the better practice for these scenarios?. Could you please help me
> with any example as such?
>
> Thanks in advance.
>
> --
> *Sriram G*
> *Tech*
>
>


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread Piotr Nowojski
Hi Xintong,

I'm not sure if Slack is the right tool for the job. IMO it works great as
an ad hoc tool for discussion between developers, but it's not searchable
and it's not persistent. Between devs it works fine, as long as the result
of the ad hoc discussions is backported to JIRA/mailing list/design docs.
For users, that would simply be extremely difficult to achieve. As a
result, I would be afraid we would be answering the same questions over
and over again, without even a way to provide a link to the previous
thread, because nobody can search for it.

I'm +1 for having an open and shared slack space/channel for the
contributors, but I think I would be -1 for such channels for the users.

For users, I would prefer to focus more on, for example, Stack Overflow.
With upvoting and clever sorting of the answers (not just the oldest/newest
at the top), it's easily searchable - those features make it fit our use
case much better IMO.

Best,
Piotrek



pt., 6 maj 2022 o 11:08 Xintong Song  napisał(a):

> Thank you~
>
> Xintong Song
>
>
>
> -- Forwarded message -
> From: Xintong Song 
> Date: Fri, May 6, 2022 at 5:07 PM
> Subject: Re: [Discuss] Creating an Apache Flink slack workspace
> To: private 
> Cc: Chesnay Schepler 
>
>
> Hi Chesnay,
>
> Correct me if I'm wrong, but I don't find this being *repeatedly* discussed
> on the ML. The only discussions I can find are [1] & [2], which are from 4
> years ago. On the other hand, I do find many users asking whether
> Slack should be supported [2][3][4]. Besides, I also found a recent
> discussion thread from ComDev [5], where alternative communication channels
> are being discussed. It seems to me the ASF is quite open to having such
> additional channels, and they have worked well for many projects
> already.
>
> I see two reasons for bringing this discussion up again:
> 1. There are indeed many things that have changed during the past 4 years.
> We have more contributors, including committers and PMC members, and even
> more users from various organizations and timezones. That also means more
> discussions and Q&A are happening.
> 2. The proposal here is different from the previous discussion. Instead of
> maintaining a channel for Flink in the ASF workspace, here we are proposing
> to create a dedicated Apache Flink Slack workspace. And instead of *moving*
> the discussion to Slack, we are proposing to add a Slack workspace as an
> addition to the ML.
>
> Below are the opinions I found in your previous -1 [1]. IIUR, these
> are all about using the ASF Slack workspace. If I overlooked anything,
> please let me know.
>
> > 1. According to INFRA-14292 <
> > https://issues.apache.org/jira/browse/INFRA-14292> the ASF Slack isn't
> > run by the ASF. This alone puts this service into rather questionable
> > territory as it /looks/ like an official ASF service. If anyone can
> provide
> > information to the contrary, please do so.
>
> 2. We already discuss things on the mailing lists, JIRA and GitHub. All of
> > these are available to the public, whereas the slack channel requires an
> > @apache mail address, i.e. you have to be a committer. This minimizes the
> > target audience rather significantly. I would much rather prefer
> something
> > that is also available to contributors.
>
>
> I do agree this should be decided by the whole community. I'll forward this
> to dev@ and user@ ML.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://lists.apache.org/thread/gxwv49ssq82g06dbhy339x6rdxtlcv3d
> [2] https://lists.apache.org/thread/kcym1sozkrtwxw1fjbnwk1nqrrlzolcc
> [3] https://lists.apache.org/thread/7rmd3ov6sv3wwhflp97n4czz25hvmqm6
> [4] https://lists.apache.org/thread/n5y1kzv50bkkbl3ys494dglyxl45bmts
> [5] https://lists.apache.org/thread/fzwd3lj0x53hkq3od5ot0y719dn3kj1j
>
> On Fri, May 6, 2022 at 3:05 PM Chesnay Schepler 
> wrote:
>
> > This has been repeatedly discussed on the ML over the years and was
> > rejected every time.
> >
> > I don't see that anything has changed that would invalidate the
> previously
> > raised arguments against it, so I'm still -1 on it.
> >
> > This is also not something the PMC should decide anyway, but the project
> > as a whole.
> >
> > On 06/05/2022 06:48, Jark Wu wrote:
> >
> > Thanks Xintong for starting this exciting topic.
> >
> > I think Slack would be an essential addition to the mailing list.
> > I have talked with some Flink users, and they are surprised
> > Flink doesn't have Slack yet, and they would love to use Slack.
> > We can also see a trend that new open-source communities
> > are using Slack as the community base camp.
> >
> > Slack is also helpful for brainstorming and asking people for opinions and
> > use cases.
> > I think Slack is not only another place for Q&A but also a connection to
> > the Flink users.
> > We can create more channels to make the community have more social
> > attributes, for example,
> >  - Share ideas, projects, integrations, articles, and presentations
> > related to Flink in 

Re: Avro deserialization issue

2022-04-13 Thread Piotr Nowojski
Hey,

Could you be more specific about how it is not working? A compiler error
that there is no such class as RuntimeContextInitializationContextAdapters?
This class has been introduced in Flink 1.12 in FLINK-18363 [1]. I don't
know this code and I also don't know where it's documented, but:
a) maybe you should just mimic in reverse the changes done in the pull
request from this issue [2]? `deserializer.open(() ->
getRuntimeContext().getMetricGroup().addGroup("something"))`?
b) RuntimeContextInitializationContextAdapters is an `@Internal` class that is
not part of the public API, so even in 1.13.x you should not be using it. You
should probably just implement your
own DeserializationSchema.InitializationContext (a rough sketch follows below).
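
For option b), here is a rough sketch of such a hand-rolled context, assuming
the 1.11 interface only declares `getMetricGroup()` (adjust to whatever your
version of the interface requires); the wrapping flat map and the metric group
name are made up for illustration:

```
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.util.Collector;

// Wraps a DeserializationSchema and opens it with a hand-rolled
// InitializationContext instead of the internal
// RuntimeContextInitializationContextAdapters.
public class DeserializingFlatMap<T> extends RichFlatMapFunction<byte[], T> {

    private final DeserializationSchema<T> deserializer;

    public DeserializingFlatMap(DeserializationSchema<T> deserializer) {
        this.deserializer = deserializer;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // Minimal InitializationContext backed by this operator's runtime context.
        deserializer.open(new DeserializationSchema.InitializationContext() {
            @Override
            public MetricGroup getMetricGroup() {
                return getRuntimeContext().getMetricGroup().addGroup("deserializer");
            }
        });
    }

    @Override
    public void flatMap(byte[] message, Collector<T> out) throws Exception {
        T record = deserializer.deserialize(message);
        if (record != null) {
            out.collect(record);
        }
    }
}
```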

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-18363
[2] https://github.com/apache/flink/pull/13844/files

pon., 11 kwi 2022 o 15:42 Anitha Thankappan 
napisał(a):

>
> Hi,
>
> I developed a Flink connector to read data from BigQuery. The BigQuery
> read rows are in AVRO format.
> I tried it with 1.13.1 and it's working fine. But my requirement is 1.11.0,
> and in that case the code:
> deserializer.open(RuntimeContextInitializationContextAdapters.deserializationAdapter(getRuntimeContext()))
> is not working.
>
> What could be the alternative for this in 1.11.0?
>
> Thanks and Regards,
> Anitha Thankappan
>
>


Re: Low Watermark

2022-02-25 Thread Piotr Nowojski
Hi,

It's the minimal watermark among all 10 parallel instances of that Task.

Using the `currentInputWatermark` metric [1] you can access the watermark of
each of those 10 subtasks individually.

Best,
Piotrek

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/

pt., 25 lut 2022 o 14:10 Isidoros Ioannou  napisał(a):

> Hello, could someone please explain what the Low Watermark indicates in
> the Flink UI in the attached image?
> I have event time enabled with a boundOutOfOrdernessStrategy of 3s for the
> incoming events and I use CEP with a within window of 5 minutes.
>


Re: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Piotr Nowojski
> PipelineOptions once you build that object from args. I've never used the
> Flink libs, just the runner, but from [1] and [3] it looks like you can
> configure things in code if you prefer that.
>
> Hope it helps,
> Cristian
>
> [1] https://beam.apache.org/documentation/runners/flink/
> [2]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/task_failure_recovery/
> [3]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/savepoints/#configuration
>
>
> On Wed, Feb 16, 2022 at 12:28 PM Sandys-Lumsdaine, James <
> james.sandys-lumsda...@systematica.com> wrote:
>
>> Thanks for your reply, Piotr.
>>
>>
>>
>> Some follow on questions:
>>
>> >". Nevertheless you might consider enabling them as this allows you to
>> manually cancel the job if it enters an endless recovery/failure loop, fix
>> the underlying issue, and restart the job from the externalised checkpoint.
>>
>>
>>
>> How is this done? Are you saying the retained checkpoint (i.e. the last
>> checkpoint that isn’t deleted) can somehow be used when restarting the
>> Flink application? If I am running in my IDE and just using the local
>> streaming environment, how can I test my recovery code either with a
>> retained checkpoint? All my attempts so far just say “No checkpoint found
>> during restore.” Do I copy the checkpoint into a savepoint directory and
>> treat it like a savepoint?
>>
>>
>>
>> On the topic of savepoints, that web page [1] says I need to use
>> “bin/flink savepoint” or “bin/flink stop --savepointPath” – but again, if
>> I’m currently not running in a real cluster how else can I create and
>> recover from the save points?
>>
>>
>>
>> From what I’ve read there is state, checkpoints and save points – all of
>> them hold state - and currently I can’t get any of these to restore when
>> developing in an IDE and the program builds up all state from scratch. So
>> what else do I need to do in my Java code to tell Flink to load a savepoint?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> James.
>>
>>
>>
>>
>>
>> *From:* Piotr Nowojski 
>> *Sent:* 16 February 2022 16:36
>> *To:* James Sandys-Lumsdaine 
>> *Cc:* user@flink.apache.org
>> *Subject:* Re: Basic questions about resuming stateful Flink jobs
>>
>>
>>
>> Hi James,
>>
>>
>>
>> Sure! The basic idea of checkpoints is that they are fully owned by the
>> running job and used for failure recovery. Thus by default if you stopped
>> the job, checkpoints are being removed. If you want to stop a job and then
>> later resume working from the same point that it has previously stopped,
>> you most likely want to use savepoints [1]. You can stop the job with a
>> savepoint and later you can restart another job from that savepoint.
>>
>>
>>
>> Regarding the externalised checkpoints. Technically you could use them in
>> the similar way, but there is no command like "take a checkpoint and stop
>> the job". Nevertheless you might consider enabling them as this allows you
>> to manually cancel the job if it enters an endless recovery/failure
>> loop, fix the underlying issue, and restart the job from the externalised
>> checkpoint.
>>
>>
>>
>> Best,
>>
>> Piotrek
>>
>>
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/savepoints/
>>
>>
>>
>> śr., 16 lut 2022 o 16:44 James Sandys-Lumsdaine 
>> napisał(a):
>>
>> Hi all,
>>
>>
>>
>> I have a 1.14 Flink streaming workflow with many stateful functions that
>> has a FsStateBackend and checkpointed enabled, although I haven't set a
>> location for the checkpointed state.
>>
>>
>>
>> I've really struggled to understand how I can stop my Flink job and
>> restart it and ensure it carries off exactly where is left off by using the
>> state or checkpoints or savepoints. This is not clearly explained in the
>> book or the web documentation.
>>
>>
>>
>> Since I have no control over my Flink job id I assume I can not force
>> Flink to pick up the state recorded under the jobId directory for the
>> FsStateBackend. Therefore I *think*​ Flink should read back in the last
>> checkpointed data but I don't understand how to force my program to read
>> this in? Do I use retained checkpoints o

Re: Basic questions about resuming stateful Flink jobs

2022-02-16 Thread Piotr Nowojski
Hi James,

Sure! The basic idea of checkpoints is that they are fully owned by the
running job and used for failure recovery. Thus, by default, if you stop
the job, the checkpoints are removed. If you want to stop a job and then
later resume working from the same point at which it previously stopped,
you most likely want to use savepoints [1]. You can stop the job with a
savepoint and later restart another job from that savepoint.

Regarding externalised checkpoints: technically you could use them in
a similar way, but there is no command like "take a checkpoint and stop
the job". Nevertheless, you might consider enabling them, as this allows you
to manually cancel the job if it enters an endless recovery/failure
loop, fix the underlying issue, and restart the job from the externalised
checkpoint.
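
For example, when running from the IDE (as discussed below), a rough sketch of
pointing a job at a savepoint - or at a retained checkpoint path - could look
like the following. It assumes Flink 1.14's `execution.savepoint.path` option
and that the local mini-cluster executor honours it (worth verifying for your
exact setup); the paths and job name are placeholders:

```
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ResumeFromSavepointInIde {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Path of a savepoint, or of a retained checkpoint (e.g. ".../chk-42").
        conf.setString("execution.savepoint.path",
                "file:///tmp/savepoints/savepoint-abc123");

        // In the IDE this spins up a local mini-cluster that picks up the
        // configuration above and restores the job from the given path.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // ... build the same pipeline, with stable uid()s on stateful operators ...

        env.execute("resumed-job");
    }
}
```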

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/savepoints/

śr., 16 lut 2022 o 16:44 James Sandys-Lumsdaine 
napisał(a):

> Hi all,
>
> I have a 1.14 Flink streaming workflow with many stateful functions that
> has a FsStateBackend and checkpointed enabled, although I haven't set a
> location for the checkpointed state.
>
> I've really struggled to understand how I can stop my Flink job and
> restart it and ensure it carries off exactly where is left off by using the
> state or checkpoints or savepoints. This is not clearly explained in the
> book or the web documentation.
>
> Since I have no control over my Flink job id I assume I can not force
> Flink to pick up the state recorded under the jobId directory for the
> FsStateBackend. Therefore I *think*​ Flink should read back in the last
> checkpointed data but I don't understand how to force my program to read
> this in? Do I use retained checkpoints or not? How can I force my program
> either use the last checkpointed state (e.g. when running from my IDE,
> starting and stopping the program) or maybe force it *not *to read in the
> state and start completely fresh?
>
> The web documentation talks about bin/flink but I am running from my IDE
> so I want my Java code to control this progress using the Flink API in Java.
>
> Can anyone give me some basic pointers as I'm obviously missing something
> fundamental on how to allow my program to be stopped and started without
> losing all the state.
>
> Many thanks,
>
> James.
>
>


Re: getting "original" ingestion timestamp after using a TimestampAssigner

2022-02-16 Thread Piotr Nowojski
Hi Frank,

I'm not sure exactly what you are trying to accomplish, but yes. In
the TimestampAssigner you can only return what should be the new timestamp
for the given record.

If you want to use "ingestion time" - "true event time" as some kind of
delay metric, you will indeed need to have both of them calculated
somewhere. You could:
1. As you described, use first ingestion time assigner, a mapper function
to extract this to a separate field, re assign the true event time, and
calculate the delay
2. Or you could simply assign the correct event time and in a simple single
mapper, chained directly to the source, use for example
`System.currentTimeMillis() - eventTime` to calculate this delay in a
single step. After all, that's more or less what Flink is doing to
calculate the ingestion time [1]
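
As an illustration of option 2, here is a minimal sketch of such a
delay-measuring step chained right after the source; the class name, metric
name and the gauge-based reporting are made up for illustration, and it
assumes event timestamps have already been assigned at the source:

```
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Measures "wall clock now - event timestamp" right next to the source, which
// approximates the delivery delay as long as this operator stays chained to it.
public class DeliveryDelayMeasurer<T> extends ProcessFunction<T, T> {

    private transient volatile long lastObservedDelayMs;

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext()
                .getMetricGroup()
                .gauge("deliveryDelayMs", (Gauge<Long>) () -> lastObservedDelayMs);
    }

    @Override
    public void processElement(T value, Context ctx, Collector<T> out) {
        Long eventTime = ctx.timestamp(); // set by the timestamp assigner at the source
        if (eventTime != null) {
            lastObservedDelayMs = System.currentTimeMillis() - eventTime;
        }
        out.collect(value); // pass the record through unchanged
    }
}
```

It would be applied as `stream.process(new DeliveryDelayMeasurer<>())`
immediately after the source (before any keyBy/rebalance) so that it stays
chained to it.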

Best, Piotrek

[1]
https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-core/src/main/java/org/apache/flink/api/common/eventtime/IngestionTimeAssigner.java

śr., 16 lut 2022 o 09:46 Frank Dekervel  napisał(a):

> Hello,
>
> I'm getting messages from a kafka stream. The messages are JSON records
> with a "timestamp" key in the json. This timestamp key contains the time
> at which the message was generated. Now i'd like if these messages had a
> delivery delay (eg delay between message generation and arrival in
> kafka). So i don't want to have the "full" delay (eg difference between
> generation time and processing time), just de delivery delay.
>
> In my timestamp assigner i get a "long" with the original timestamp as
> an argument, but i cannot yield an updated record from the timestamp
> assigner (eg with an extra field "deliveryDelay" or so).
>
> So i guess my only option is to not specify the timestamp/watermark
> extractor in the env.fromSource, then first mapping the stream to add a
> lateness field and only after that reassign timestamps/watermarks ... is
> that right ?
>
> Thanks!
>
> Greetings,
> Frank
>
>
>
>


Re: Python Function for Datastream Transformation in Flink Java Job

2022-02-16 Thread Piotr Nowojski
Hi,

As far as I can tell the answer is unfortunately no. With Table API (SQL)
things are much simpler, as you have a restricted number of types of
columns that you need to support and you don't need to support arbitrary
Java classes as the records.

I'm shooting blindly here, but maybe you can use your Python UDF in Table
API and then convert a Table to DataStream? [1]
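
For example, something along these lines might work - a sketch only, assuming
the flink-python dependency is available and a Python environment containing
the UDF is configured; the function and module names are placeholders:

```
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class PythonUdfViaTableApi {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Register the Python UDF by its fully qualified name (module.function).
        tEnv.executeSql(
                "CREATE TEMPORARY SYSTEM FUNCTION my_py_udf "
                        + "AS 'my_module.my_py_udf' LANGUAGE PYTHON");

        DataStream<String> input = env.fromElements("a", "b", "c");
        Table inputTable = tEnv.fromDataStream(input).as("word");

        // Apply the Python UDF in SQL, then go back to a DataStream for the
        // rest of the Java pipeline.
        Table transformed = tEnv.sqlQuery("SELECT my_py_udf(word) FROM " + inputTable);
        DataStream<Row> result = tEnv.toDataStream(transformed);

        result.print();
        env.execute("python-udf-via-table-api");
    }
}
```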

Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/data_stream_api/

śr., 16 lut 2022 o 09:46 Jesry Pandawa 
napisał(a):

> Hello,
>
> Currently, Flink already supports adding Python UDF and using that on
> Flink Java job. It can be used on Table API. Can we do the same for
> creating custom python function for Datastream transformation and use that
> on Flink Java job?
>
> Regards,
>
> Jesry
>


Re: Job manager slots are in bad state.

2022-02-16 Thread Piotr Nowojski
Hi Josson,

Would you be able to reproduce this issue on a more recent version of
Flink? I'm afraid that we won't be able to help with this issue, as it
affects a Flink version that has not been supported for quite some time and,
moreover, `SlotSharingManager` has been completely removed in Flink 1.13.

Can you upgrade to a more recent Flink version and try it out? I would
assume the bug should be gone in 1.13.x or 1.14.x branches. If not, you can
also try out Flink 1.11.4, as maybe it has fixed this issue as well.

Best,
Piotrek

śr., 16 lut 2022 o 08:16 Josson Paul  napisał(a):

> We are using Flink version 1.11.2.
> At times if task managers are restarted for some reason, the job managers
> throw the exception that I attached here. It is an illegal state exception.
> We never had this issue with Flink 1.8. It started happening after
> upgrading to Flink 1.11.2.
>
> Why are the slots not released if they are in a bad state? The issue doesn't
> get resolved even if I restart all the task managers. It only gets resolved
> if I restart the job manager.
>
> java.util.concurrent.CompletionException: java.util.concurrent.
> CompletionException: java.lang.IllegalStateException
> at org.apache.flink.runtime.jobmaster.slotpool.
> SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:433)
> at java.base/java.util.concurrent.CompletableFuture.uniHandle(
> CompletableFuture.java:930)
> at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(
> CompletableFuture.java:907)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(
> CompletableFuture.java:506)
> at java.base/java.util.concurrent.CompletableFuture
> .completeExceptionally(CompletableFuture.java:2088)
> at org.apache.flink.runtime.concurrent.FutureUtils
> .lambda$forwardTo$21(FutureUtils.java:1132)
> at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(
> CompletableFuture.java:859)
> at java.base/java.util.concurrent.CompletableFuture
> .uniWhenCompleteStage(CompletableFuture.java:883)
> at java.base/java.util.concurrent.CompletableFuture.whenComplete(
> CompletableFuture.java:2251)
> at org.apache.flink.runtime.concurrent.FutureUtils.forward(FutureUtils
> .java:1100)
> at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager
> .createRootSlot(SlotSharingManager.java:155)
> at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl
> .allocateMultiTaskSlot(SchedulerImpl.java:477)
> at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl
> .allocateSharedSlot(SchedulerImpl.java:311)
> at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl
> .internalAllocateSlot(SchedulerImpl.java:160)
> at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl
> .allocateSlotInternal(SchedulerImpl.java:143)
> at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl
> .allocateSlot(SchedulerImpl.java:113)
> at org.apache.flink.runtime.executiongraph.
> SlotProviderStrategy$NormalSlotProviderStrategy.allocateSlot(
> SlotProviderStrategy.java:115)
> at org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator
> .lambda$allocateSlotsFor$0(DefaultExecutionSlotAllocator.java:104)
> at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(
> CompletableFuture.java:1106)
> at java.base/java.util.concurrent.CompletableFuture.thenCompose(
> CompletableFuture.java:2235)
> at org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator
> .allocateSlotsFor(DefaultExecutionSlotAllocator.java:102)
> at org.apache.flink.runtime.scheduler.DefaultScheduler.allocateSlots(
> DefaultScheduler.java:339)
> at org.apache.flink.runtime.scheduler.DefaultScheduler
> .allocateSlotsAndDeploy(DefaultScheduler.java:312)
> at org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy
> .allocateSlotsAndDeploy(EagerSchedulingStrategy.java:76)
> at org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy
> .restartTasks(EagerSchedulingStrategy.java:57)
> at org.apache.flink.runtime.scheduler.DefaultScheduler
> .lambda$restartTasks$2(DefaultScheduler.java:265)
> at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(
> CompletableFuture.java:783)
> at java.base/java.util.concurrent.CompletableFuture$Completion.run(
> CompletableFuture.java:478)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(
> AkkaRpcActor.java:402)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(
> AkkaRpcActor.java:195)
> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor
> .handleRpcMessage(FencedAkkaRpcActor.java:74)
> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(
> AkkaRpcActor.java:152)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> at 

Re: Performance Issues in Source Operator while migrating to Flink-1.14 from 1.9

2022-02-16 Thread Piotr Nowojski
Hi,

Unfortunately the new KafkaSource was contributed without good benchmarks,
and so far you are the first one who has noticed and reported this issue.
Without a more direct comparison (as Martijn suggested), it's hard for us to
help right away. It would be a tremendous help for us if you could, for
example, provide us with steps to reproduce this exact issue. Another thing
you could do is attach a code profiler to both the Flink 1.9 and 1.14
versions and compare the results of the source task threads from both (Flink
task threads are named after the task name, so they are easy to
distinguish).

Also have you observed some degradation in metrics reported by Flink? Like
the records processing rate between those two versions?

Best,
Piotrek

śr., 16 lut 2022 o 13:24 Arujit Pradhan 
napisał(a):

> Hey Martijn,
>
> Thanks a lot for getting back to us. To give you a little bit more
> context, we do maintain an open-source project around flink dagger
>  which is a wrapper for proto processing.
> As part of the upgrade to the latest version, we did some refactoring and
> moved to KafkaSource since the older FlinkKafkaConsumer was getting
> deprecated.
>
> So we currently do not have any set up to test the hypothesis. Also just
> increasing the resources by a bit fixes it and it does happen with a small
> set of jobs during high traffic.
>
> We would love to get some input from the community as it might cause
> errors in some of the jobs in production.
>
> Thanks and regards,
> //arujit
>
> On Tue, Feb 15, 2022 at 8:48 PM Martijn Visser 
> wrote:
>
>> Hi Arujit,
>>
>> I'm also looping in some contributors from the connector and runtime
>> perspective in this thread. Did you also test the upgrade first by only
>> upgrading to Flink 1.14 and keeping the FlinkKafkaConsumer? That would
>> offer a better way to determine if a regression is caused by the upgrade of
>> Flink or because of the change in connector.
>>
>> Best regards,
>>
>> Martijn Visser
>> https://twitter.com/MartijnVisser82
>>
>>
>> On Tue, 15 Feb 2022 at 13:07, Arujit Pradhan 
>> wrote:
>>
>>> Hey team,
>>>
>>> We are migrating our Flink codebase and a bunch of jobs from Flink-1.9
>>> to Flink-1.14. To ensure uniformity in performance we ran a bunch of jobs
>>> for a week both in 1.9 and 1.14 simultaneously with the same resources and
>>> configurations and monitored them.
>>>
>>> Though most of the jobs are running fine, we have significant
>>> performance degradation in some of the high throughput jobs during peak
>>> hours. As a result, we can see high lag and data drops while processing
>>> messages from Kafka in some of the jobs in 1.14 while in 1.9 they are
>>> working just fine.
>>> Now we are debugging and trying to understand the potential reason for
>>> it.
>>>
>>> One of the hypotheses that we can think of is the change in the sequence
>>> of processing in the source-operator. To explain this, adding screenshots
>>> for the problematic tasks below.
>>> The first one is for 1.14 and the second is for 1.9. Upon inspection, it
>>> can be seen the sequence of processing 1.14 is -
>>>
>>> data_streams_0 -> Timestamps/Watermarks -> Filter -> Select.
>>>
>>> While in 1.9 it was,
>>>
>>> data_streams_0 -> Filter -> Timestamps/Watermarks -> Select.
>>>
>>> In 1.14 we are using KafkaSource API while in the older version it was
>>> FlinkKafkaConsumer API. Wanted to understand if it can cause potential
>>> performance decline as all other configurations/resources for both of the
>>> jobs are identical and if so then how to avoid it. Also, we can not see any
>>> unusual behaviour for the CPU/Memory while monitoring the affected jobs.
>>>
>>> Source Operator in 1.14 :
>>> [image: image.png]
>>> Source Operator in 1.9 :
>>> [image: image.png]
>>> Thanks in advance,
>>> //arujit
>>>
>>>
>>>
>>>
>>>
>>>
>>>


Re: Buffering when connecting streams

2022-01-18 Thread Piotr Nowojski
Hi Alexis,

I believe you should be able to use the `ConnectedStreams#transform()`
method.
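
For reference, here is a rough sketch of such a custom operator wired in via
`transform()`. The class is made up for illustration, it keeps the buffer on
the heap (so it is not fault tolerant - for that the buffer would have to go
into managed state), and it assumes timestamped records that arrive roughly
ordered per input:

```
import java.util.ArrayDeque;

import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.TwoInputStreamOperator;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Buffers elements of the first input until the watermark of the second input
// has passed their timestamps, then emits them unchanged.
public class BufferingTwoInputOperator<IN1, IN2>
        extends AbstractStreamOperator<IN1>
        implements TwoInputStreamOperator<IN1, IN2, IN1> {

    private final ArrayDeque<StreamRecord<IN1>> buffer = new ArrayDeque<>();
    private long input2Watermark = Long.MIN_VALUE;

    @Override
    public void processElement1(StreamRecord<IN1> element) {
        if (element.getTimestamp() <= input2Watermark) {
            output.collect(element);
        } else {
            buffer.add(element);
        }
    }

    @Override
    public void processElement2(StreamRecord<IN2> element) {
        // "side" input: update whatever lookup structures you need here
    }

    @Override
    public void processWatermark2(Watermark mark) throws Exception {
        input2Watermark = mark.getTimestamp();
        // release everything the second input has caught up with
        while (!buffer.isEmpty() && buffer.peek().getTimestamp() <= input2Watermark) {
            output.collect(buffer.poll());
        }
        super.processWatermark2(mark);
    }
}
```

It would then be wired in with something like
`first.connect(second).transform("buffering", outputTypeInfo, new
BufferingTwoInputOperator<>())`.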

Best, Piotrek

wt., 18 sty 2022 o 14:20 Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> napisał(a):

> Hi again everyone,
>
>
>
> It’s been a while, so first of all happy new year :)
>
>
>
> I was revisiting this discussion and started looking at the code. However,
> it seems that all of the overloads of ConnectedStreams#process expect a
> CoProcessFunction or the Keyed counterpart, so I don’t think I can inject a
> custom TwoInputStreamOperator.
>
>
>
> After a quick glance at the joining documentation, I wonder if I could
> accomplish what I want with a window/interval join of streams. If so, I
> might be able to avoid using state in the join function, but if I can’t
> avoid it, is it possible to use managed state in a (Process)JoinFunction?
> The join needs keys, but I don’t know if the resulting stream counts as
> keyed from the state’s point of view.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Piotr Nowojski 
> *Sent:* Montag, 6. Dezember 2021 08:43
> *To:* David Morávek 
> *Cc:* Alexis Sarda-Espinosa ;
> user@flink.apache.org
> *Subject:* Re: Buffering when connecting streams
>
>
>
> Hi Alexis and David,
>
>
>
> This actually can not happen. There are mechanisms in the code to make
> sure none of the input is starved IF there is some data to be read.
>
>
>
> The only time when input can be blocked is during the alignment phase of
> aligned checkpoints under back pressure. If there was a back pressure in
> your job it could have easily happened that checkpoint barriers would flow
> through the job graph to the CoProcessKeyedCoProcessFunction on one of the
> paths much quicker than the other, causing this faster path to be blocked
> until the other side catched up. But that would happen only during the
> alignment phase of the checkpoint, so without a backpressure for a very
> short period of time.
>
>
>
> Piotrek
>
>
>
> czw., 2 gru 2021 o 18:23 David Morávek  napisał(a):
>
> I think this could happen, but I have a very limited knowledge about how
> the input gates work internally. @Piotr could definitely provide some more
> insight here.
>
>
>
> D.
>
>
>
> On Thu, Dec 2, 2021 at 5:54 PM Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> wrote:
>
> I do have some logic with timers today, but it’s indeed not ideal. I guess
> I’ll have a look at TwoInputStreamOperator, but I do have related
> questions. You mentioned a sample scenario of "processing backlog" where
> windows fire very quickly; could it happen that, in such a situation, the
> framework calls the operator’s processElement1 continuously (even for
> several minutes) before calling processElement2 a single time? How does the
> framework decide when to switch the stream processing when the streams are
> connected?
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* David Morávek 
> *Sent:* Donnerstag, 2. Dezember 2021 17:18
> *To:* Alexis Sarda-Espinosa 
> *Cc:* user@flink.apache.org
> *Subject:* Re: Buffering when connecting streams
>
>
>
> Even with the TwoInputStreamOperator you can not "halt" the processing.
> You need to buffer these elements for example in the ListState for later
> processing. At the time the watermark of the second stream arrives, you can
> process all buffered elements that satisfy the condition.
>
>
>
> You could probably also implement a similar (less optimized) solution with
> KeyedCoProcessFunction using event time timers.
>
>
>
> Best,
>
> D.
>
>
>
> On Thu, Dec 2, 2021 at 5:12 PM Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> wrote:
>
> Yes, that sounds right, but with my current KeyedCoProcessFunction I can’t
> tell Flink to "halt" processElement1 and switch to the other stream
> depending on watermarks. I could look into TwoInputStreamOperator if you
> think that’s the best approach.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* David Morávek 
> *Sent:* Donnerstag, 2. Dezember 2021 16:59
> *To:* Alexis Sarda-Espinosa 
> *Cc:* user@flink.apache.org
> *Subject:* Re: Buffering when connecting streams
>
>
>
> I think this would require using lower level API and implementing a custom
> `TwoInputStreamOperator`. Then you can hook to `processWatemark{1,2}`
> methods.
>
>
>
> Let's also make sure we're on the same page on what the watermark is. You
> can think of the watermark as event time clock. It basically gives you an
> information, that *no more events with timestamp lower

Re: unaligned checkpoint for job with large start delay

2022-01-11 Thread Piotr Nowojski
Hi Thias and Mason,

> state.backend.rocksdb.metrics.estimate-num-keys

Indeed that can be a good indicator. However keep in mind that, depending
on your logic, there might be many existing windows for each key.

>  However, it’s not so clear how to count the windows that have been
registered since the window assigner does not expose the run time
context—is this even the right place to count?

Yes, I think you are unfortunately right. I've looked at the code, and it
wouldn't even be that easy to add such a metric. Sorry for misleading you.
But a spike in triggered windows is a strong indication that they were
triggered all at once.
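
At the user level, the kind of counting you describe could look roughly like
the sketch below - a delegating trigger that counts firings via a lazily
registered counter. The class and metric names are made up, and merging
windows (e.g. session windows) are not handled:

```
import org.apache.flink.metrics.Counter;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

// Delegates to an existing trigger and counts how often it fires.
public class CountingTrigger<T, W extends Window> extends Trigger<T, W> {

    private final Trigger<T, W> inner;
    private transient Counter windowFirings;

    public CountingTrigger(Trigger<T, W> inner) {
        this.inner = inner;
    }

    private TriggerResult count(TriggerResult result, TriggerContext ctx) {
        if (windowFirings == null) {
            // registered once per subtask on first use
            windowFirings = ctx.getMetricGroup().counter("numWindowFirings");
        }
        if (result.isFire()) {
            windowFirings.inc();
        }
        return result;
    }

    @Override
    public TriggerResult onElement(T element, long timestamp, W window, TriggerContext ctx)
            throws Exception {
        return count(inner.onElement(element, timestamp, window, ctx), ctx);
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx)
            throws Exception {
        return count(inner.onProcessingTime(time, window, ctx), ctx);
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
        return count(inner.onEventTime(time, window, ctx), ctx);
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        inner.clear(window, ctx);
    }
}
```

It would be plugged in as `.window(...).trigger(new
CountingTrigger<>(EventTimeTrigger.create()))`, and a spike of that counter
over a short period is exactly the symptom discussed above.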

> Perhaps, it can be an opt-in feature? I do see it being really useful
since most users aren’t really familiar with windows and these metrics can
help easily identify the common problem of too many windows firing.
> The additional metrics certainly help in diagnosing some of the symptoms
of the root problem.

I will think about how to solve it. I would be against an opt-in metric, as
it would complicate code and configuration for the users while barely
anyone would use it.

Note that a huge checkpoint start delay with unaligned checkpoints already
confirms that the system has been blocked by something. As I mentioned
before, there are a number of reasons why: record size larger than buffer
size, flatMap functions/operators multiplying the number of records, or a
large number of timers fired at once. Summing up everything that you have
reported so far, we ruled out the former two options, and the spike in the
number of triggered windows almost confirms that this is the issue at hand.

Best,
Piotrek

śr., 12 sty 2022 o 08:32 Schwalbe Matthias 
napisał(a):

> Hi Mason,
>
>
>
> Since you are using RocksDB, you could enable the metric [1]
> state.backend.rocksdb.metrics.estimate-num-keys, which gives (afaik) a good
> indication of the number of active windows.
>
> I’ve never seen (despite the warning) negative effect on the runtime.
>
>
>
> Hope this help …
>
>
>
> Thias
>
>
>
>
>
>
>
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#state-backend-rocksdb-metrics-estimate-num-keys
>
>
>
> *From:* Mason Chen 
> *Sent:* Dienstag, 11. Januar 2022 19:20
> *To:* Piotr Nowojski 
> *Cc:* Mason Chen ; user 
> *Subject:* Re: unaligned checkpoint for job with large start delay
>
>
>
> Hi Piotrek,
>
>
>
> No worries—I hope you had a good break.
>
>
>
> Counting how many windows have been registered/fired and plotting that
> over time.
>
> It’s straightforward to count windows that are fired (the trigger exposes
> the run time context and we can collect the information in that code path).
> However, it’s not so clear how to count the windows that have been
> registered since the window assigner does not expose the run time
> context—is this even the right place to count? It’s not necessarily the
> case that an assignment results in a new window registered. Am I missing
> anything else relevant from the user facing interface perspective?
>
>
>
>  Unfortunately at the moment I don't know how to implement such a metric
> without affecting performance on the critical path, so I don't see this
> happening soon :(
>
> Perhaps, it can be an opt-in feature? I do see it being really useful
> since most users aren’t really familiar with windows and these metrics can
> help easily identify the common problem of too many windows firing.
>
>
>
> The additional metrics certainly help in diagnosing some of the symptoms
> of the root problem.
>
>
>
> Best,
>
> Mason
>
>
>
> On Jan 10, 2022, at 1:00 AM, Piotr Nowojski  wrote:
>
>
>
> Hi Mason,
>
>
>
> Sorry for a late reply, but I was OoO.
>
>
>
> I think you could confirm it with more custom metrics. Counting how many
> windows have been registered/fired and plotting that over time.
>
>
>
> I think it would be more helpful in this case to check how long a task has
> been blocked being "busy" processing for example timers. FLINK-25414 shows
> only blocked on being hard/soft backpressure. Unfortunately at the moment I
> don't know how to implement such a metric without affecting performance on
> the critical path, so I don't see this happening soon :(
>
>
>
> Best,
>
> Piotrek
>
>
>
> wt., 4 sty 2022 o 18:02 Mason Chen  napisał(a):
>
> Hi Piotrek,
>
>
>
> In other words, something (presumably a watermark) has fired more than 151
> 200 windows at once, which is taking ~1h 10minutes to process and during
> this time the checkpoint can not make any progress. Is this number of
> triggered windows plausible in your scenario?
>
>
>
> It see

Re: RichMapFunction to convert tuple of strings to DataStream[(String,String)]

2022-01-10 Thread Piotr Nowojski
Glad to hear it.

Best,
Piotrek

pon., 10 sty 2022 o 20:08 Siddhesh Kalgaonkar 
napisał(a):

> Hi Piotr,
>
> Thanks for the reply. I was looking for how to create a DataStream under a
> process function since using that I had to call something else but I came
> across one of Fabian's posts where he mentioned that this way of creating
> DS is not "encouraged and tested". So, I figured out an alternate way of
> using side output and now I can do what I was aiming for.
>
> Thanks,
> Sid.
>
> On Mon, Jan 10, 2022 at 5:29 PM Piotr Nowojski 
> wrote:
>
>> Hi Sid,
>>
>> I don't see in the Stack Overflow post an explanation of what you are trying
>> to do here (no mention of MapFunction or a tuple).
>>
>> If you want to create a `DataStream<String>` from some pre-existing/static
>> Tuple of Strings, the easiest thing would be to convert the tuple to a
>> collection/iterator and use
>> `StreamExecutionEnvironment#fromCollection(...)`.
>> If you already have a `DataStream<Tuple>` (for example your
>> source produces a tuple) and you want to flatten it to
>> `DataStream<String>`, then you need a simple
>> `FlatMapFunction<Tuple, String>` (or
>> `RichFlatMapFunction<Tuple, String>`) that would do the flattening
>> via:
>>
>> public void flatMap(Tuple value, Collector<String> out) throws
>> Exception {
>>   out.collect(value.f0);
>>   out.collect(value.f1);
>>   ...;
>>   out.collect(value.fN);
>> }
>>
>> Best,
>> Piotrek
>>
>> pt., 7 sty 2022 o 07:05 Siddhesh Kalgaonkar 
>> napisał(a):
>>
>>> Hi Francis,
>>>
>>> What I am trying to do is you can see over here
>>> https://stackoverflow.com/questions/70592174/richsinkfunction-for-cassandra-in-flink/70593375?noredirect=1#comment124796734_70593375
>>>
>>>
>>> On Fri, Jan 7, 2022 at 5:07 AM Francis Conroy <
>>> francis.con...@switchdin.com> wrote:
>>>
>>>> Hi Siddhesh,
>>>>
>>>> How are you getting this tuple of strings into the system? I think this
>>>> is the important question, you can create a DataStream in many ways, from a
>>>> collection, from a source, etc but all of these rely on the
>>>> ExecutionEnvironment you're using.
>>>> A RichMapFunction doesn't produce a datastream directly, it's used in
>>>> the context of the StreamExecutionEnvironment to create a stream i.e.
>>>> DataStream.map([YourRichMapFunction]) this implies that you already need a
>>>> datastream to transform a datastream using a mapFunction
>>>> (MapFunction/RichMapFunction)
>>>> Francis
>>>>
>>>> On Fri, 7 Jan 2022 at 01:48, Siddhesh Kalgaonkar <
>>>> kalgaonkarsiddh...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> As I am new and I am facing one issue so I came
>>>>> across RichMapFunction. How can I use RichMapFunction to convert a tuple 
>>>>> of
>>>>> strings to datastream? If not how can I do it apart from using
>>>>> StreamExecutionEnvironment?
>>>>>
>>>>> Thanks,
>>>>> Sid
>>>>>
>>>>
>>>> Australia
>>>>
>>>


Re: Custom Kafka Keystore on Amazon Kinesis Analytics

2022-01-10 Thread Piotr Nowojski
Ah, I see. Pity. You could always use reflection if you really had to, but
that's of course not a long-term solution.

I will raise this issue to the KafkaSource/AWS contributors.

Best,
Piotr Nowojski

pon., 10 sty 2022 o 16:55 Clayton Wohl  napisał(a):

> Custom code can create subclasses of FlinkKafkaConsumer, because the
> constructors are public. Custom code can't create subclasses of KafkaSource
> because the constructors are package private. So the same solution of
> creating code subclasses won't work for KafkaSource.
>
> Thank you for the response :)
>
>
> On Mon, Jan 10, 2022 at 6:22 AM Piotr Nowojski 
> wrote:
>
>> Hi Clayton,
>>
>> I think in principle this example should be still valid, however instead
>> of providing a `CustomFlinkKafkaConsumer` and overriding its `open`
>> method, you would probably need to override
>> `org.apache.flink.connector.kafka.source.reader.KafkaSourceReader#start`.
>> So you would most likely need both at the very least a custom
>> `KafkaSourceReader` and `KafkaSource` to instantiate your custom
>> `KafkaSourceReader`. But I'm not sure if anyone has ever tried this so far.
>>
>> Best,
>> Piotrek
>>
>> pt., 7 sty 2022 o 21:18 Clayton Wohl  napisał(a):
>>
>>> If I want to migrate from FlinkKafkaConsumer to KafkaSource, does the
>>> latter support this:
>>>
>>>
>>> https://docs.aws.amazon.com/kinesisanalytics/latest/java/example-keystore.html
>>>
>>> Basically, I'm running my Flink app in Amazon's Kinesis Analytics hosted
>>> Flink environment. I don't have reliable access to the local file system.
>>> At the documentation link above, Amazon recommends adding a hook to copy
>>> the keystore files from the classpath to a /tmp directory at runtime. Can
>>> KafkaSource do something similar?
>>>
>>


Re: Is there a way to know how long a Flink app takes to finish resuming from Savepoint?

2022-01-10 Thread Piotr Nowojski
Hi,

Unfortunately there is no such metric. Regarding the logs, I'm not sure
what Flink version you are using, but since Flink 1.13.0 [1][2] you could
rely on the tasks/subtasks switching from `INITIALIZING` to `RUNNING` to
check when a task/subtask has finished recovering its state.

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-17012
[2] https://issues.apache.org/jira/browse/FLINK-22215

pon., 10 sty 2022 o 09:34 Chen-Che Huang  napisał(a):

> Hi all,
>
> I'm trying to speed up the process of resuming from a savepoint by
> adjusting some configuration.
> I wonder whether there exists a way to know how much time our Flink app
> spends resuming from a savepoint?
> From the logs, I can see only the starting time of the resuming (as shown
> below) but couldn't find the end time of the resuming.
> If there exists some metrics or information about the resuming time, it'd
> be very helpful for the tuning.
> Any comment is appreciated.
>
> timestamp-1: Starting job  from savepoint
> timestamp-2: Restoring job  from Savepoint
>
> Best wishes,
> Chen-Che Huang
>


Re: Uploading jar to s3 for persistence

2022-01-10 Thread Piotr Nowojski
Hi Puneet,

Have you seen this thread before? [1]. It looks like the same issue and
especially this part might be the key:

> Be aware that the filesystem used by the FileUploadHandler
> is java.nio.file.FileSystem and not
> Flink's org.apache.flink.core.fs.FileSystem for which we provide different
> FileSystem implementations.

Best,
Piotrek

[1] https://www.mail-archive.com/user@flink.apache.org/msg38043.html



pon., 10 sty 2022 o 08:19 Puneet Duggal 
napisał(a):

> Hi,
>
> Currently I am working with a Flink HA cluster with 3 job managers and 3
> zookeeper nodes. I am also persisting my checkpoints to S3 and have hence
> already configured the required flink-s3 jars during the Flink job manager and task
> manager process startup. Now I have configured a variable
>
> web.upload.dir: s3p://d11-flink-job-manager-load/jars
>
> The expectation is that a jar uploaded via the REST APIs will be uploaded to this
> location and hence be accessible to all 3 job managers (which eventually
> will help in job submission, as all 3 job managers will have a record of
> the jar uploaded to this location). But while uploading the jar, I am facing the
> following IllegalArgumentException, and I am not sure why. Also, the above
> provided S3 location was created before the job manager process was even
> started.
>
> *2022-01-09 18:12:46,790 WARN
>  org.apache.flink.runtime.rest.FileUploadHandler  [] - File
> upload failed.*
> *java.lang.IllegalArgumentException: UploadDirectory is not absolute.*
> at
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.runtime.rest.handler.FileUploads.<init>(FileUploads.java:59)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:186)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:69)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
> at
> org.apache.flink.shaded.netty4.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
> ~[flink-dist_2.12-1.13.1.jar:1.13.1]
>
>
>
>


Re: Custom Kafka Keystore on Amazon Kinesis Analytics

2022-01-10 Thread Piotr Nowojski
Hi Clayton,

I think in principle this example should still be valid; however, instead of
providing a `CustomFlinkKafkaConsumer` and overriding its `open` method,
you would probably need to override
`org.apache.flink.connector.kafka.source.reader.KafkaSourceReader#start`.
So you would most likely need both at the very least a custom
`KafkaSourceReader` and `KafkaSource` to instantiate your custom
`KafkaSourceReader`. But I'm not sure if anyone has ever tried this so far.

Best,
Piotrek

pt., 7 sty 2022 o 21:18 Clayton Wohl  napisał(a):

> If I want to migrate from FlinkKafkaConsumer to KafkaSource, does the
> latter support this:
>
>
> https://docs.aws.amazon.com/kinesisanalytics/latest/java/example-keystore.html
>
> Basically, I'm running my Flink app in Amazon's Kinesis Analytics hosted
> Flink environment. I don't have reliable access to the local file system.
> At the documentation link above, Amazon recommends adding a hook to copy
> the keystore files from the classpath to a /tmp directory at runtime. Can
> KafkaSource do something similar?
>


Re: RichMapFunction to convert tuple of strings to DataStream[(String,String)]

2022-01-10 Thread Piotr Nowojski
Hi Sid,

I don't see an explanation on Stack Overflow of what you are trying to do
here (no mention of a MapFunction or a tuple).

If you want to create a `DataStream<String>` from a pre-existing/static
Tuple of Strings, the easiest thing would be to convert the
tuple to a collection/iterator and use
`StreamExecutionEnvironment#fromCollection(...)`.
If you already have a `DataStream<Tuple>` (for example your source
produces a tuple) and you want to flatten it to `DataStream<String>`, then
you need a simple `FlatMapFunction<Tuple, String>` (or
`RichFlatMapFunction<Tuple, String>`) that would do the flattening
via:

public void flatMap(Tuple value, Collector<String> out) throws Exception {
  // emit every field of the tuple as a separate String record
  for (int i = 0; i < value.getArity(); i++) {
    out.collect(value.getField(i));
  }
}
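
For completeness, here is a minimal, self-contained sketch of the first option
mentioned above (turning a static tuple into a collection and using
`fromCollection`). The class and job names are made up for this example:

import java.util.Arrays;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TupleToStreamExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // a pre-existing/static tuple of strings
        Tuple2<String, String> tuple = Tuple2.of("first", "second");

        // convert the tuple's fields into a collection and build the stream from it
        DataStream<String> stream = env.fromCollection(Arrays.asList(tuple.f0, tuple.f1));

        stream.print();
        env.execute("tuple-to-stream");
    }
}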

Best,
Piotrek

pt., 7 sty 2022 o 07:05 Siddhesh Kalgaonkar 
napisał(a):

> Hi Francis,
>
> What I am trying to do is you can see over here
> https://stackoverflow.com/questions/70592174/richsinkfunction-for-cassandra-in-flink/70593375?noredirect=1#comment124796734_70593375
>
>
> On Fri, Jan 7, 2022 at 5:07 AM Francis Conroy <
> francis.con...@switchdin.com> wrote:
>
>> Hi Siddhesh,
>>
>> How are you getting this tuple of strings into the system? I think this
>> is the important question; you can create a DataStream in many ways (from a
>> collection, from a source, etc.), but all of these rely on the
>> ExecutionEnvironment you're using.
>> A RichMapFunction doesn't produce a DataStream directly; it's used in the
>> context of the StreamExecutionEnvironment to create a stream, i.e.
>> DataStream.map([YourRichMapFunction]). This implies that you already need a
>> DataStream in order to transform it with a map function
>> (MapFunction/RichMapFunction).
>> Francis
>>
>> On Fri, 7 Jan 2022 at 01:48, Siddhesh Kalgaonkar <
>> kalgaonkarsiddh...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> As I am new and ran into an issue, I came across RichMapFunction.
>>> How can I use RichMapFunction to convert a tuple of strings to a DataStream?
>>> If not, how can I do it apart from using StreamExecutionEnvironment?
>>>
>>> Thanks,
>>> Sid
>>>
>>
>>
>


Re: Job stuck in savePoint - entire topic replayed on restart.

2022-01-10 Thread Piotr Nowojski
Hi Basil,

1. What do you mean by:
> The only way we could stop these stuck jobs was to patch the finalizers.
?
2. Do you mean that your job is stuck when doing stop-with-savepoint?
3. What Flink version are you using? Have you tried upgrading to the most
recent version, or at least the most recent minor release? There have been
some bugs in the past with stop-with-savepoint, that have been fixed over
time. For example [1], [2] or [3]. Note that some of them might not be
related to your use case (Kinesis consumer or FLIP-27 sources).
4. If upgrading won't help, can you post stack traces of task managers that
contain the stuck operators/tasks?
5. If you are working on a version that has fixed all of those bugs, are
you using some custom operators/sources/sinks? If your code is either
capturing interrupts, or doing some blocking calls, it might be prone to
bugs similar to [2] (please check the discussion in the ticket for more
information).

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-21028
[2] https://issues.apache.org/jira/browse/FLINK-17170
[3] https://issues.apache.org/jira/browse/FLINK-21133

czw., 6 sty 2022 o 16:23 Basil Bibi  napisał(a):

> Hi,
> We experienced a problem in production during a release.
> Our application is deployed to kubernetes using argocd and uses the Lyft
> flink operator.
> We tried to do a release and found that on deleting the application some
> of the jobs became stuck in "savepointing" phase.
> The only way we could stop these stuck jobs was to patch the finalizers.
> We deployed the new release and on startup our application had lost its
> offsets, so all of the messages in Kafka were replayed.
> Has anyone got any ideas how and why this happened and how we avoid it in
> the future?
> Sincerely Basil Bibi
>


Re: unaligned checkpoint for job with large start delay

2022-01-10 Thread Piotr Nowojski
Hi Mason,

Sorry for a late reply, but I was OoO.

I think you could confirm it with more custom metrics. Counting how many
windows have been registered/fired and plotting that over time.
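
To sketch what such a custom metric could look like (this is purely illustrative
and not something that exists in Flink; the class and metric names are made up):
a delegating trigger that wraps the actual trigger and counts FIRE decisions.

import org.apache.flink.metrics.Counter;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

// Wraps any trigger and counts FIRE decisions in a "windowFires" counter.
public class CountingTrigger<T, W extends Window> extends Trigger<T, W> {

    private final Trigger<T, W> delegate;
    private transient Counter fires;

    public CountingTrigger(Trigger<T, W> delegate) {
        this.delegate = delegate;
    }

    private TriggerResult count(TriggerResult result, TriggerContext ctx) {
        if (fires == null) {
            // lazily register the counter once per (restored) task
            fires = ctx.getMetricGroup().counter("windowFires");
        }
        if (result.isFire()) {
            fires.inc();
        }
        return result;
    }

    @Override
    public TriggerResult onElement(T element, long timestamp, W window, TriggerContext ctx) throws Exception {
        return count(delegate.onElement(element, timestamp, window, ctx), ctx);
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
        return count(delegate.onProcessingTime(time, window, ctx), ctx);
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
        return count(delegate.onEventTime(time, window, ctx), ctx);
    }

    @Override
    public boolean canMerge() {
        return delegate.canMerge();
    }

    @Override
    public void onMerge(W window, OnMergeContext ctx) throws Exception {
        delegate.onMerge(window, ctx);
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        delegate.clear(window, ctx);
    }
}

It could be attached to the window with something like
`.window(...).trigger(new CountingTrigger<>(EventTimeTrigger.create()))`.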

I think it would be more helpful in this case to check how long a task has
been blocked being "busy" processing, for example, timers. FLINK-25414 only
covers being blocked on hard/soft backpressure. Unfortunately, at the moment I
don't know how to implement such a metric without affecting performance on
the critical path, so I don't see this happening soon :(

Best,
Piotrek

wt., 4 sty 2022 o 18:02 Mason Chen  napisał(a):

> Hi Piotrek,
>
> In other words, something (presumably a watermark) has fired more than 151
> 200 windows at once, which is taking ~1h 10minutes to process and during
> this time the checkpoint can not make any progress. Is this number of
> triggered windows plausible in your scenario?
>
>
> It seems plausible—there are potentially many keys (and many windows). Is
> there a way to confirm with metrics? We can add a window fire counter to
> the window operator that only gets incremented at the end of windows
> evaluation, in order to see the huge jumps in window fires. I can see this
> benefiting other users who troubleshoot the problem of a large number of
> windows firing.
>
> Best,
> Mason
>
> On Dec 29, 2021, at 2:56 AM, Piotr Nowojski  wrote:
>
> Hi Mason,
>
> > and it has to finish processing this output before checkpoint can
> begin—is this right?
>
> Yes. Checkpoint will be only executed once all triggered windows will be
> fully processed.
>
> But from what you have posted it looks like all of that delay is
> coming from hundreds of thousands of windows firing all at the same time.
> Between 20:30 and ~21:40 there must have been a bit more than 36 triggers/s
> * 60s/min * 70min = 151 200 triggers fired at once (or in a very short
> interval). In other words, something (presumably a watermark) has fired
> more than 151 200 windows at once, which is taking ~1h 10 minutes to process
> and during this time the checkpoint can not make any progress. Is this
> number of triggered windows plausible in your scenario?
>
> Best,
> Piotrek
>
>
> czw., 23 gru 2021 o 12:12 Mason Chen  napisał(a):
>
>> Hi Piotr,
>>
>> Thanks for the thorough response and the PR—will review later.
>>
>> Clarifications:
>> 1. The flat map you refer to produces at most 1 record.
>> 2. The session window operator’s *window process function* emits at
>> least 1 record.
>> 3. The 25 ms sleep is at the beginning of the window process function.
>>
>> Your explanation about how records being bigger than the buffer size can
>> cause blockage makes sense to me. However, my average record size is around
>> 770 bytes coming out of the source and 960 bytes coming out of the window.
>> Also, we don’t override the default `taskmanager.memory.segment-size`. My
>> Flink job memory config is as follows:
>>
>> ```
>> taskmanager.memory.jvm-metaspace.size: 512 mb
>> taskmanager.memory.jvm-overhead.max: 2Gb
>> taskmanager.memory.jvm-overhead.min: 512Mb
>> taskmanager.memory.managed.fraction: '0.4'
>> taskmanager.memory.network.fraction: '0.2'
>> taskmanager.memory.network.max: 2Gb
>> taskmanager.memory.network.min: 200Mb
>> taskmanager.memory.process.size: 16Gb
>> taskmanager.numberOfTaskSlots: '4'
>> ```
>>
>>  Are you sure your job is making any progress? Are records being
>> processed? Hasn't your job simply deadlocked on something?
>>
>>
>> To distinguish task blockage vs graceful backpressure, I have checked the
>> operator throughput metrics and have confirmed that during window *task*
>> buffer blockage, the window *operator* DOES emit records. Tasks look
>> like they aren’t doing anything but the window is emitting records.
>>
>> 
>>
>>
>> Furthermore, I created a custom trigger to wrap a metric counter for
>> FIRED counts to get a estimation of how many windows are fired at the same
>> time. I ran a separate job with the same configs—the results look as
>> follows:
>> 
>>
>> On average, when the buffers are blocked, there are 36 FIREs per second.
>> Since each of these fires invokes the window process function, 25 ms * 36 =
>> 900 ms means we sleep almost a second cumulatively, per second—which is
>> pretty severe. Combined with the fact that the window process function can
>> emit many records, the task takes even longer to checkpoint since the
>> flatmap/kafka sink is chained with the window operator—and it has to finish
>> processing this output before checkpoint can begin—*is this right?*

Re: unaligned checkpoint for job with large start delay

2021-12-20 Thread Piotr Nowojski
Hi Mason,

Have you already observed those checkpoint timeouts (30 minutes) with the
alignment timeout set to 0ms? Or were you previously running with the 1s
alignment timeout?

If the latter, it might be because unaligned checkpoints are failing to
kick in in the first place. Setting the timeout to 0ms should solve the
problem.

If the former, have you checked why the checkpoints are timing out? What
part of the checkpointing process is taking a long time? For example can
you post a screenshot from the WebUI of checkpoint stats for each task? The
only explanation I could think of is this sleep time that you added. 25ms
per record is really a lot. I mean really a lot. 30 minutes / 25 ms/record
= 72 000 records. One of the unaligned checkpoints limitations is that
Flink can not snapshot a state of an operator in the middle of processing a
record. In your particular case, Flink will not be able to snapshot the
state of the session window operator in the middle of the windows being
fired. If your window operator is firing a lot of windows at the same time,
or a single window is producing 72k of records (which would be an
unusual but not unimaginable amount), this could block checkpointing of the
window operator for 30 minutes due to this 25ms sleep down the stream.

Piotrek

pt., 17 gru 2021 o 19:19 Mason Chen  napisał(a):

> Hi Piotr,
>
> Thanks for the link to the JIRA ticket, we actually don’t see much state
> size overhead between checkpoints in aligned vs unaligned, so we will go
> with your recommendation of using unaligned checkpoints with 0s alignment
> timeout.
>
> For context, we are testing unaligned checkpoints with our application
> with these tasks: [kafka source, map, filter] -> keyby -> [session window]
> -> [various kafka sinks]. The first task has parallelism 40 and the rest of
> the tasks have parallelism 240. This is the FLIP 27 Kafka source.
>
> We added an artificial sleep (25 ms per invocation of the process function)
> to the session window task to simulate backpressure; however, we still see
> checkpoints failing due to task acknowledgement doesn’t complete within our
> checkpoint timeout (30 minutes).
>
> I am able to correlate that the input buffers of the *window* and the output
> buffers of the *source* being at 100% usage corresponds to the checkpoint
> failures. When they are not full (input can drop to as low as 60% usage and
> output can drop to as low as 55% usage), the checkpoints succeed within
> less than 2 ms. In all cases, it is the session window task or source task
> failing to 100% acknowledge the barriers within timeout. I do see the
> *source* task acknowledgement taking long in some of the failures (e.g.
> 20 minutes, 30 minutes, 50 minutes, 1 hour, 2 hours) and source is idle and
> not busy at this time.
>
> All other input buffers are low usage (mostly 0). For output buffer, the
> usage is around 50% for window--everything else is near 0% all the time
> except the source mentioned before (makes sense since rest are just sinks).
>
> We are also running a parallel Flink job with the same configurations,
> except with unaligned checkpoints disabled. Here we observe the same
> behavior except now some of the checkpoints are failing due to the source
> task not acknowledging everything within timeout—however, most failures are
> still due to session window acknowledgement.
>
> All the data seems to points an issue with the source? Now, I don’t know
> how to explain this behavior since unaligned checkpoints should overtake
> records in the buffers (once seen at the input buffer, forward immediately
> downstream to output buffer).
>
> Just to confirm, this is our checkpoint configuration:
> ```
> Checkpointing Mode: Exactly Once
> Checkpoint Storage: FileSystemCheckpointStorage
> State Backend: EmbeddedRocksDBStateBackend
> Interval: 5m 0s
> Timeout: 30m 0s
> Minimum Pause Between Checkpoints: 2m 0s
> Maximum Concurrent Checkpoints: 1
> Unaligned Checkpoints: Enabled
> Persist Checkpoints Externally: Enabled (retain on cancellation)
> Tolerable Failed Checkpoints: 10
> ```
>
> Are there other metrics I should look at—why else should tasks fail
> acknowledgement in unaligned mode? Is it something about the implementation
> details of window function that I am not considering? My main hunch is
> something to do with the source.
>
> Best,
> Mason
>
> On Dec 16, 2021, at 12:25 AM, Piotr Nowojski  wrote:
>
> Hi Mason,
>
> In Flink 1.14 we have also changed the timeout behavior from checking
> against the alignment duration, to simply checking how old the
> checkpoint barrier is (so it would also account for the start delay) [1]. It
> was done in order to solve problems such as you are describing. Unfortunately we
> can not backport this change to 1.13.x as it's a breaking change.

Re: Prometheus labels in metrics / counters

2021-12-17 Thread Piotr Nowojski
Hi,

In principle you can register metrics/metric groups dynamically and it should
work just fine. However, your code probably won't work, because for every
record you are creating a new group and a new counter, which most likely will
collide with the old one. So every time you define a new group or a
new counter, you should remember it in some field.
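
A minimal sketch of that caching, reusing the names from your example
(getKind/getLatencyMicros are placeholders for your own parsing logic):

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;

public class MyMapper extends RichMapFunction<String, String> {

    // one pair of counters per observed "kind" value
    private transient Map<String, Counter[]> countersByKind;

    @Override
    public void open(Configuration config) {
        countersByKind = new HashMap<>();
    }

    @Override
    public String map(String value) throws Exception {
        Counter[] counters = countersByKind.computeIfAbsent(getKind(value), kind -> {
            // register the group and its counters only the first time this kind is seen
            MetricGroup group = getRuntimeContext().getMetricGroup().addGroup("kind", kind);
            return new Counter[] {group.counter("latency_sum"), group.counter("latency_count")};
        });
        counters[0].inc(getLatencyMicros(value));
        counters[1].inc();
        return value;
    }

    // placeholder parsing helpers, assumed to exist in your job
    private String getKind(String value) {
        return value.split(",")[0];
    }

    private long getLatencyMicros(String value) {
        return Long.parseLong(value.split(",")[1]);
    }
}

Since map() is invoked by a single task thread per parallel instance, a plain
HashMap should be enough here; no extra locking is needed.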

Best,
Piotrek

pt., 17 gru 2021 o 14:03 Witold Baryluk 
napisał(a):

> Hi,
>
> I want to track and increment some monitoring counters from `map`,
> but have them broken down by dynamically defined values of a label.
> The set of values is unknown at creation time, but it is bounded
> (less than 100 different values during a 30 day period, usually ~5).
>
> There is no easy way of doing this, compared to other Prometheus
> native systems (i.e. in Go, or C++), but it looks like it might be possible
> using some workarounds:
>
>
> public class MyMapper extends RichMapFunction<String, String> {
>   private transient MetricGroup metric_group;
>
>   @Override
>   public void open(Configuration config) {
>     this.metric_group = getRuntimeContext().getMetricGroup();
>   }
>
>   @Override
>   public String map(String value) throws Exception {
>     MetricGroup group = this.metric_group.addGroup("kind", getKind(value));
>     group.counter("latency_sum").inc(getLatencyMicros(value));
>     group.counter("latency_count").inc();
>     return value;
>   }
> }
>
> Will this work? Is there a better way?
>
> But, this does not look nice at all compared to how
> other projects handle Prometheus labels. Flink metrics are not well
> mapped into Prometheus metrics here.
>
> Second question, is it ok to call addGroup and counter, dynamically like
> this,
> or should it be cached? Do I need any such cache (which would be map
> of string to Counter), be protected by some mutex when I lookup or add
> to it?
>
> Cheers,
> Witold
>


Re: fraud detection example fails

2021-12-17 Thread Piotr Nowojski
Hi,

It might be simply because the binary artifacts are not yet
published/visible. The blog post [1] mentions that they should be visible
within 24h (from yesterday), so please try again later/tomorrow. This is
also mentioned in the dev mailing list thread [2].

Best,
Piotrek

[1] https://flink.apache.org/news/2021/12/16/log4j-patch-releases.html
[2]
https://mail-archives.apache.org/mod_mbox/flink-dev/202112.mbox/%3C7ce89912-69fb-cf99-f815-10b87cece03b%40apache.org%3E

pt., 17 gru 2021 o 13:04 HG  napisał(a):

> Hello all
>
> I am a flink newbie and trying to do the fraud detection example.
> The maven command however fails for version 1.14.2 since it cannot find 
> flink-walkthrough-datastream-java
> for that version
>
> mvn archetype:generate -DarchetypeGroupId=org.apache.flink
> -DarchetypeArtifactId=flink-walkthrough-datastream-java
> -DarchetypeVersion=1.14.2 -DgroupId=frauddetection
> -DartifactId=frauddetection -Dversion=0.1 -Dpackage=spendreport
> -DinteractiveMode=false
>
> this succeeds however.
>
> mvn archetype:generate -DarchetypeGroupId=org.apache.flink
> -DarchetypeArtifactId=flink-walkthrough-datastream-java
> -DarchetypeVersion=1.14.1 -DgroupId=frauddetection
> -DartifactId=frauddetection -Dversion=0.1 -Dpackage=spendreport
> -DinteractiveMode=false
>
> Perhaps an omission caused by the need to fix the log4j issues?
>
> Can it be solved?
>
> Regards Hans-Peter
>


Re: Read parquet data from S3 with Flink 1.12

2021-12-17 Thread Piotr Nowojski
Hi,

Reading Parquet files in the DataStream API (which is what I assume you are
doing) is officially supported and documented only since 1.14 [1].
Before that it was only supported for the Table API. As far as I can tell,
the basic classes (`FileSource` and `ParquetColumnarRowInputFormat`) have
already been in the code base since 1.12.x, but I don't know how stable they
were and how well they were working. I would suggest upgrading to Flink 1.14.1.
As a last resort, you can at the very least try using the latest version of the
1.12.x branch following the 1.14 documentation, but I can not guarantee that it
will work.
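
For reference, this is roughly what the 1.14 documentation shows for reading
Parquet into the DataStream API. Treat it as a sketch: the exact
`ParquetColumnarRowInputFormat` constructor arguments may differ between
versions, and the column names, types and S3 path below are made up.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class ParquetFileSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // project only the columns you need (names/types are illustrative)
        RowType rowType = RowType.of(
                new LogicalType[] {new VarCharType(), new IntType()},
                new String[] {"name", "count"});

        ParquetColumnarRowInputFormat<FileSourceSplit> format =
                new ParquetColumnarRowInputFormat<>(
                        new org.apache.hadoop.conf.Configuration(), // hadoop config
                        rowType,
                        500,    // batch size
                        false,  // utcTimestamp
                        true);  // case sensitive

        FileSource<RowData> source = FileSource
                .forBulkFileFormat(format, new Path("s3://my-bucket/parquet-data/"))
                .build();

        DataStream<RowData> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");
        stream.print();
        env.execute("parquet-file-source-sketch");
    }
}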

Regarding the S3 issue, have you followed the documentation? [2][3]

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins
[3]
https://nightlies.apache.org/flink/flink-docs-release-1.12/deployment/filesystems/s3.html


pt., 17 gru 2021 o 10:10 Alexandre Montecucco <
alexandre.montecu...@grabtaxi.com> napisał(a):

> Hello everyone,
> I am struggling to read parquet files from S3 with Flink Streaming
> 1.12.2
> I had some difficulty simply reading from local parquet files. I finally
> managed that part, though the solution feels dirty:
> - I use the readFile function + ParquetInputFormat abstract class (that is
> protected) (as I could not find a way to use the public
> ParquetRowInputFormat).
> - the open function in ParquetInputFormat is
> using org.apache.hadoop.conf.Configuration. I am not sure which import to
> add. It seems the flink-parquet library is importing the dependency from
> hadoop-common but the dep is marked as provided. The doc only shows usage
> of flink-parquet from Flink SQL. So I am under the impression that this
> might not work in the streaming case without extra code. I 'solved' this by
> adding a dependency to hadoop-common. We did something similar to write
> parquet data to S3.
>
> Now, when trying to run the application to read from S3, I get an
> exception with root cause:
> ```
> Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No
> FileSystem for scheme "s3"
> ```
> I guess there are some issues with hadoop-common not knowing about the
> flink-s3-hadoop plugin setup. But I ran out of ideas on how to solve this.
>
>
> I also noticed there were some changes with flink-parquet in Flink 1.14,
> but I had some issues with simply reading data (but I did not investigate
> so deeply for that version).
>
> Many thanks for any help.
> --
>
>
> Alexandre Montecucco / Grab, Software Developer
> alexandre.montecu...@grab.com  / 8782 0937
>
> Grab
> 138 Cecil Street, Cecil Court #01-01Singapore 069538
> https://www.grab.com/ 
>
>
>


Re: Confusion about rebalance bytes sent metric in Flink UI

2021-12-16 Thread Piotr Nowojski
Hi Tao,

Could you prepare a minimalistic example that would reproduce this issue?
Also what Flink version are you using?

Best,
Piotrek

czw., 16 gru 2021 o 09:44 tao xiao  napisał(a):

> >Your upstream is not inflating the record size?
> No, this is simply a dedup function
>
> On Thu, Dec 16, 2021 at 2:49 PM Arvid Heise  wrote:
>
>> Ah yes, I see it now as well. Yes, you are right, each record should be
>> replicated 9 times, sent to one instance of each downstream operator. Your upstream is
>> not inflating the record size? The number of records seems to work
>> decently. @pnowojski  FYI.
>>
>> On Thu, Dec 16, 2021 at 2:20 AM tao xiao  wrote:
>>
>>> Hi Arvid
>>>
>>> The second picture shows the metrics of the upstream operator. The
>>> upstream has a parallelism of 150 as you can see in the first picture. I expect
>>> the bytes sent to be about 9 * bytes received, as we have 9 downstream
>>> operators connecting.
>>>
>>> Hi Caizhi,
>>> Let me create a minimal reproducible DAG and update here
>>>
>>> On Thu, Dec 16, 2021 at 4:03 AM Arvid Heise  wrote:
>>>
 Hi,

 Could you please clarify which operator we see in the second picture?

 If you are showing the upstream operator, then this has only
 parallelism 1, so there shouldn't be multiple subtasks.
 If you are showing the downstream operator, then the metric would refer
 to the HASH and not REBALANCE.

 On Tue, Dec 14, 2021 at 2:55 AM Caizhi Weng 
 wrote:

> Hi!
>
> This doesn't seem to be the expected behavior. Rebalance shuffle
> should send each record to one of the parallel instances, not to all of them.
>
> If possible could you please explain what your Flink job is doing and
> preferably share your user code so that others can look into this case?
>
> tao xiao  于2021年12月11日周六 01:11写道:
>
>> Hi team,
>>
>> I have one operator that is connected to another 9 downstream
>> operators using rebalance. Each operator has 150 parallelisms[1]. I 
>> assume
>> each message in the upstream operation is sent to one of the parallel
>> instances of the 9 receiving operators so the total bytes sent should be
>> roughly 9 times of bytes received in the upstream operator metric. 
>> However
>> the Flink UI shows the bytes sent is much higher than 9 times. It is 
>> about
>> 150 * 9 * bytes received[2]. This looks to me like every message is
>> duplicated to each parallel instance of all receiving operators like what
>> broadcast does.  Is this correct?
>>
>>
>>
>> [1] https://imgur.com/cGyb0QO
>> [2] https://imgur.com/SFqPiJA
>> --
>> Regards,
>> Tao
>>
>
>>>
>>> --
>>> Regards,
>>> Tao
>>>
>>
>
> --
> Regards,
> Tao
>


Re: unaligned checkpoint for job with large start delay

2021-12-16 Thread Piotr Nowojski
Hi Mason,

In Flink 1.14 we have also changed the timeout behavior from checking
against the alignment duration, to simply checking how old the
checkpoint barrier is (so it would also account for the start delay) [1]. It
was done in order to solve problems such as you are describing. Unfortunately we
can not backport this change to 1.13.x as it's a breaking change.

Anyway, from our experience I would recommend going all in with the
unaligned checkpoints, so setting the timeout back to the default value of
0ms. With timeouts you are gaining very little (a tiny bit smaller state
size if there is no backpressure - tiny bit because without backpressure,
even with timeout set to 0ms, the amount of captured inflight data is
basically insignificant), while in practice you slow down checkpoint
barrier propagation by quite a lot.
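
As a sketch, that configuration would look roughly like this. Note the setter
name: in 1.13 it is setAlignmentTimeout, while 1.14+ renames it to
setAlignedCheckpointTimeout; the interval below is just an example.

import java.time.Duration;

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointConfigSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5 * 60 * 1000, CheckpointingMode.EXACTLY_ONCE); // every 5 minutes

        CheckpointConfig conf = env.getCheckpointConfig();
        conf.enableUnalignedCheckpoints();
        conf.setAlignmentTimeout(Duration.ZERO); // 0 ms: never fall back to aligned barriers
    }
}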

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-23041

wt., 14 gru 2021 o 22:04 Mason Chen  napisał(a):

> Hi all,
>
> I'm using Flink 1.13 and my job is experiencing high start delay, more so
> than high alignment time. (our flip 27 kafka source is heavily
> backpressured). Since our alignment timeout is set to 1s, the unaligned
> checkpoint never triggers since alignment delay is always below the
> threshold.
>
> It's seems there is only a configuration for alignment timeout but should
> there also be one for start delay timeout:
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpointing_under_backpressure/#aligned-checkpoint-timeout
>
> I'm interested to know the reasoning why there isn't a timeout for start
> delay as well--was it because it was deemed too complex for the user to
> configure two parameters for unaligned checkpoints?
>
> I'm aware of buffer debloating in 1.14 that could help but I'm trying to
> see how far unaligned checkpointing can take me.
>
> Best,
> Mason
>


Re: Buffering when connecting streams

2021-12-05 Thread Piotr Nowojski
Hi Alexis and David,

This actually can not happen. There are mechanisms in the code to make sure
none of the inputs is starved IF there is some data to be read.

The only time when an input can be blocked is during the alignment phase of
aligned checkpoints under back pressure. If there was back pressure in
your job, it could easily have happened that checkpoint barriers would flow
through the job graph to the KeyedCoProcessFunction on one of the
paths much quicker than on the other, causing this faster path to be blocked
until the other side caught up. But that would happen only during the
alignment phase of the checkpoint, so, without back pressure, only for a very
short period of time.

Piotrek
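
(For reference, a minimal sketch of the buffering approach David describes
further down in this thread: elements from the first input are held in
ListState and released from an event-time timer, which only fires once the
combined watermark -- the minimum of both connected inputs -- has passed their
timestamp. All type, key and state names are illustrative, and event-time
timestamps/watermarks are assumed to be assigned on both streams.)

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class BufferingCoProcess
        extends KeyedCoProcessFunction<String, String, String, String> {

    private transient ListState<Tuple2<Long, String>> buffer;

    @Override
    public void open(Configuration parameters) {
        buffer = getRuntimeContext().getListState(
                new ListStateDescriptor<>(
                        "buffered-elements",
                        TypeInformation.of(new TypeHint<Tuple2<Long, String>>() {})));
    }

    @Override
    public void processElement1(String value, Context ctx, Collector<String> out) throws Exception {
        buffer.add(Tuple2.of(ctx.timestamp(), value));
        // the timer fires only once the watermarks of BOTH inputs have passed this timestamp
        ctx.timerService().registerEventTimeTimer(ctx.timestamp());
    }

    @Override
    public void processElement2(String value, Context ctx, Collector<String> out) {
        // update whatever per-key context the second (side) stream provides (omitted here)
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        List<Tuple2<Long, String>> stillPending = new ArrayList<>();
        for (Tuple2<Long, String> e : buffer.get()) {
            if (e.f0 <= timestamp) {
                out.collect(e.f1); // the side stream has caught up to this element's time
            } else {
                stillPending.add(e);
            }
        }
        buffer.update(stillPending);
    }
}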

czw., 2 gru 2021 o 18:23 David Morávek  napisał(a):

> I think this could happen, but I have a very limited knowledge about how
> the input gates work internally. @Piotr could definitely provide some more
> insight here.
>
> D.
>
> On Thu, Dec 2, 2021 at 5:54 PM Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> wrote:
>
>> I do have some logic with timers today, but it’s indeed not ideal. I
>> guess I’ll have a look at TwoInputStreamOperator, but I do have related
>> questions. You mentioned a sample scenario of "processing backlog" where
>> windows fire very quickly; could it happen that, in such a situation, the
>> framework calls the operator’s processElement1 continuously (even for
>> several minutes) before calling processElement2 a single time? How does the
>> framework decide when to switch the stream processing when the streams are
>> connected?
>>
>>
>>
>> Regards,
>>
>> Alexis.
>>
>>
>>
>> *From:* David Morávek 
>> *Sent:* Donnerstag, 2. Dezember 2021 17:18
>> *To:* Alexis Sarda-Espinosa 
>> *Cc:* user@flink.apache.org
>> *Subject:* Re: Buffering when connecting streams
>>
>>
>>
>> Even with the TwoInputStreamOperator you can not "halt" the processing.
>> You need to buffer these elements for example in the ListState for later
>> processing. At the time the watermark of the second stream arrives, you can
>> process all buffered elements that satisfy the condition.
>>
>>
>>
>> You could probably also implement a similar (less optimized) solution
>> with KeyedCoProcessFunction using event time timers.
>>
>>
>>
>> Best,
>>
>> D.
>>
>>
>>
>> On Thu, Dec 2, 2021 at 5:12 PM Alexis Sarda-Espinosa <
>> alexis.sarda-espin...@microfocus.com> wrote:
>>
>> Yes, that sounds right, but with my current KeyedCoProcessFunction I
>> can’t tell Flink to "halt" processElement1 and switch to the other stream
>> depending on watermarks. I could look into TwoInputStreamOperator if you
>> think that’s the best approach.
>>
>>
>>
>> Regards,
>>
>> Alexis.
>>
>>
>>
>> *From:* David Morávek 
>> *Sent:* Donnerstag, 2. Dezember 2021 16:59
>> *To:* Alexis Sarda-Espinosa 
>> *Cc:* user@flink.apache.org
>> *Subject:* Re: Buffering when connecting streams
>>
>>
>>
>> I think this would require using lower level API and implementing a
>> custom `TwoInputStreamOperator`. Then you can hook to
>> `processWatemark{1,2}` methods.
>>
>>
>>
>> Let's also make sure we're on the same page on what the watermark is. You
>> can think of the watermark as event time clock. It basically gives you an
>> information, that *no more events with timestamp lower than the
>> watermark should appear in your stream*.
>>
>>
>>
>> You simply delay emitting the window result from your "connect"
>> operator until the watermark from the second (side output) stream passes the
>> window's max timestamp (maximum timestamp that is included in the window).
>>
>>
>>
>> Does that make sense?
>>
>>
>>
>> Best,
>>
>> D.
>>
>>
>>
>> On Thu, Dec 2, 2021 at 4:25 PM Alexis Sarda-Espinosa <
>> alexis.sarda-espin...@microfocus.com> wrote:
>>
>> Could you elaborate on what you mean with synchronize? Buffering in the
>> state would be fine, but I haven’t been able to come up with a good way of
>> ensuring that all data from the side stream for a given minute is processed
>> by processElement2 before all data for the same (windowed) minute reaches
>> processElement1, even when considering watermarks.
>>
>>
>>
>> Regards,
>>
>> Alexis.
>>
>>
>>
>> *From:* David Morávek 
>> *Sent:* Donnerstag, 2. Dezember 2021 15:45
>> *To:* Alexis Sarda-Espinosa 
>> *Cc:* user@flink.apache.org
>> *Subject:* Re: Buffering when connecting streams
>>
>>
>>
>> You can not rely on order of the two streams that easily. In case you are
>> for example processing backlog and the windows fire quickly, it can happen
>> that it's actually faster than the second branch which has less work to do.
>> This will make the pipeline non-deterministic.
>>
>>
>>
>> What you can do is to "synchronize" watermarks of both streams in your
>> "connect" operator, but that of course involves buffering events in the
>> state.
>>
>>
>>
>> Best,
>>
>> D.
>>
>>
>>
>> On Thu, Dec 2, 2021 at 3:02 PM Alexis Sarda-Espinosa <
>> alexis.sarda-espin...@microfocus.com> wrote:
>>
>> Hi David,
>>
>>
>>
>> A watermark step 

Re: Input Selectable & Checkpointing

2021-11-25 Thread Piotr Nowojski
You're welcome!

Piotrek

śr., 24 lis 2021 o 17:48 Shazia Kayani  napisał(a):

> Hi Piotrek,
>
> Thanks for your message!
>
> OK, that does sound interesting and is an approach I had not considered
> before; I will look into it and investigate further.
>
>
> Thank you!
>
> Best wishes,
>
> Shazia
>
>
> - Original message -
> From: "Piotr Nowojski" 
> To: "Shazia Kayani" 
> Cc: mart...@ververica.com, "user" 
> Subject: [EXTERNAL] Re: Input Selectable & Checkpointing
> Date: Wed, Nov 24, 2021 11:08 AM
>
> Hi Shazia,
>
> FLIP-182 [1] might be a thing that will let you address issues like this
> in the future. With it, maybe you could do some magic with assigning
> watermarks to make sure that one stream doesn't run too much into the
> future which would effectively prioritise the other stream. But that's
> currently aimed for Flink 1.15 (subject to change), which is still a couple
> of months away.
>
> For the time being, a workaround that I know some people were using is to
> implement some manual throttling of the sources. Either via a throttling
> operator/mapping function chained directly after the sources, or
> implemented inside your custom source. One issue that complicates this
> solution is that most likely you would need to use an external system
> (external database?, maybe some file?) to control how much and when to
> throttle whom. To decide whom to throttle you could use Flink metrics [2],
> especially something around the amount of bytes/records processed by an
> operator/subtask. Also note: be cautious when doing sleeps, as when you
> make blocking calls inside your code, you will for example block
> checkpointing. And let me stress this one more time, throttling should be chained
> directly after the sources. If there is a network exchange between source
> and throttling function, you would capture a lot of in-flight records
> between the two, causing potentially crippling back pressure that would
> especially affect aligned checkpointing [3].
>
> Best,
> Piotrek
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> [2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/
> [3]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpointing_under_backpressure/
>
> wt., 23 lis 2021 o 15:52 Shazia Kayani  napisał(a):
>
> Hi Martijn,
>
> It's a continuous requirement, so always read from one input source over
> another, but it does not require a super strict guarantee, so it doesn't
> matter if on occasion a message is read from the wrong topic. It's mainly
> due to there consistently being significantly more messages on one source
> than the other, which causes issues when there are too many messages on the
> stream.
>
> Thanks
>
> Shazia
>
>
> - Original message -
> From: "Martijn Visser" 
> To: "Shazia Kayani" 
> Cc: "User" 
> Subject: [EXTERNAL] Re: Input Selectable & Checkpointing
> Date: Tue, Nov 23, 2021 2:45 PM
>
> Hi,
>
> Do you have a requirement to continuously prioritise one input source over
> another (like always read topic X from Kafka before topic Y from Kafka) or
> is it a one-time effort, because you might need to bootstrap some state, so
> first read all data from file source A before switching over to topic B
> from Kafka?). If it's the latter, you could look into the HybridSource.
>
> Best regards,
>
> Martijn
>
> On Tue, 23 Nov 2021 at 15:34, Shazia Kayani  wrote:
>
> Hi All,
>
> Hope you are well!
>
> I am working on something which has a requirement from Flink to prioritise
> one input DataStream over another; to do this I have currently implemented an
> operator which extends InputSelectable.
> However, because InputSelectable is used, checkpointing is disabled, as it
> is currently not supported.
>
> I was just wondering if anyone has done something similar to
> this previously? and if so were you able to implement changes which
> resulted in successful checkpointing?
> If anyone has any other tips around the topic that too would also be
> helpful!
>
> Thanks
>
> Shazia
>

Re: Input Selectable & Checkpointing

2021-11-24 Thread Piotr Nowojski
Hi Shazia,

FLIP-182 [1] might be a thing that will let you address issues like this in
the future. With it, maybe you could do some magic with assigning
watermarks to make sure that one stream doesn't run too much into the
future which would effectively prioritise the other stream. But that's
currently aimed for Flink 1.15 (subject to change), which is still a couple
of months away.

For the time being, a workaround that I know some people were using is to
implement some manual throttling of the sources. Either via a throttling
operator/mapping function chained directly after the sources, or
implemented inside your custom source. One issue that complicates this
solution is that most likely you would need to use an external system
(external database?, maybe some file?) to control how much and when to
throttle whom. To decide whom to throttle you could use Flink metrics [2],
especially something around the amount of bytes/records processed by an
operator/subtask. Also note: be cautious when doing sleeps, as when you
make blocking calls inside your code, you will for example block
checkpointing. And let me stress this one more time, throttling should be chained
directly after the sources. If there is a network exchange between source
and throttling function, you would capture a lot of in-flight records
between the two, causing potentially crippling back pressure that would
especially affect aligned checkpointing [3].
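
A minimal sketch of such a throttling function, to be chained directly after
the source. The fixed rate here is hard-coded purely for illustration; as
described above, a real setup would read and refresh the limit from some
external system.

import org.apache.flink.api.common.functions.MapFunction;

public class ThrottlingMap<T> implements MapFunction<T, T> {

    private final long maxRecordsPerSecond;
    private transient long windowStartMillis;
    private transient long recordsInWindow;

    public ThrottlingMap(long maxRecordsPerSecond) {
        this.maxRecordsPerSecond = maxRecordsPerSecond;
    }

    @Override
    public T map(T value) throws Exception {
        long now = System.currentTimeMillis();
        if (now - windowStartMillis >= 1000) {
            // start a new one-second accounting window
            windowStartMillis = now;
            recordsInWindow = 0;
        }
        if (++recordsInWindow > maxRecordsPerSecond) {
            // Caution: this sleep blocks the task thread and therefore also checkpointing,
            // which is why the function must sit directly after (chained to) the source.
            Thread.sleep(1000 - (now - windowStartMillis));
            windowStartMillis = System.currentTimeMillis();
            recordsInWindow = 0;
        }
        return value;
    }
}

It would be used as `source.map(new ThrottlingMap<>(1000))`, with the same
parallelism as the source so that the two stay chained.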

Best,
Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/
[3]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpointing_under_backpressure/

wt., 23 lis 2021 o 15:52 Shazia Kayani  napisał(a):

> Hi Martijn,
>
> It's a continuous requirement, so always read from one input source over
> another, but it does not require a super strict guarantee, so it doesn't
> matter if on occasion a message is read from the wrong topic. It's mainly
> due to there consistently being significantly more messages on one source
> than the other, which causes issues when there are too many messages on the
> stream.
>
> Thanks
>
> Shazia
>
>
> - Original message -
> From: "Martijn Visser" 
> To: "Shazia Kayani" 
> Cc: "User" 
> Subject: [EXTERNAL] Re: Input Selectable & Checkpointing
> Date: Tue, Nov 23, 2021 2:45 PM
>
> Hi,
>
> Do you have a requirement to continuously prioritise one input source over
> another (like always read topic X from Kafka before topic Y from Kafka) or
> is it a one-time effort, because you might need to bootstrap some state, so
> first read all data from file source A before switching over to topic B
> from Kafka?). If it's the latter, you could look into the HybridSource.
>
> Best regards,
>
> Martijn
>
> On Tue, 23 Nov 2021 at 15:34, Shazia Kayani  wrote:
>
> Hi All,
>
> Hope you are well!
>
> I am working on something which has a requirement from Flink to prioritise
> one input DataStream over another; to do this I have currently implemented an
> operator which extends InputSelectable.
> However, because InputSelectable is used, checkpointing is disabled, as it
> is currently not supported.
>
> I was just wondering if anyone has done something similar to
> this previously? and if so were you able to implement changes which
> resulted in successful checkpointing?
> If anyone has any other tips around the topic that too would also be
> helpful!
>
> Thanks
>
> Shazia
>
>
>
>


Re: Providing files while application mode deployment

2021-11-09 Thread Piotr Nowojski
Hi Vasily,

Unfortunately no, I don't think there is such an option in your case. With
per-job mode you could try to use the Distributed Cache (it should work
in streaming as well [1]), but this doesn't work in application
mode, as in that case no code is executed on the JobMaster [2].

Two workarounds that I could propose, which I know are not perfect, are to:
- bundle the configuration file in the jar (see the sketch below)
- pass the entire configuration as a parameter to the job, e.g. through some JSON
or base64-encoded parameter.
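
A minimal sketch of the first workaround: the file is packaged into the job jar
(e.g. under src/main/resources) and read from the classpath in main(), which in
application mode runs on the JobManager. The resource name below is made up.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public final class BundledConfigLoader {

    public static String loadConfig() throws IOException {
        InputStream in = BundledConfigLoader.class.getResourceAsStream("/job-config.properties");
        if (in == null) {
            throw new IOException("job-config.properties not found on the classpath");
        }
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // return the raw file contents; parsing is left to the job's own code
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }
}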

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/dataset/overview/#distributed-cache
[2]
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/overview/#overview-and-reference-architecture

wt., 9 lis 2021 o 14:14 Vasily Melnik 
napisał(a):

> Hi all.
>
> While running Flink jobs in application mode on YARN and Kubernetes, we need to
> provide some configuration files to the main class. Is there any option in the
> Flink CLI to copy local files to the cluster without manually copying them to DFS
> or into the docker image, something like the *--files* option in spark-submit?
>
>
>


Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-09 Thread Piotr Nowojski
Hi All,

to me it looks like something deadlocked, maybe due to this OOM error from
Kafka, preventing a Task from making any progress. To confirm, Dongwan, you
could collect stack traces while the job is in such a blocked state.
A deadlocked Kafka client could easily explain those symptoms and it would be
visible as extreme back pressure. Another thing to look at would be whether
the job is making any progress at all (via, for example, the
numRecordsIn/numRecordsOut metrics [1]).

A couple of clarifications.

> What I suspect is the capacity of the asynchronous operation because
limiting the value can cause back-pressure once the capacity is exhausted
[1].
> Although I could increase the value (...)

If you want to decrease the impact of backpressure, you should decrease
the capacity, not increase it. The more in-flight records in the system,
the more records need to be processed/persisted in aligned/unaligned
checkpoints.
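
For context, the capacity is the last argument passed when wiring up the async
operator. A minimal, made-up example (the lookup function is just a stand-in
for whatever external call the job makes):

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncCapacityExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.fromElements("a", "b", "c");

        // the last argument (capacity) bounds the number of in-flight requests;
        // a lower value means less in-flight data for checkpoints to wait on,
        // at the cost of back-pressuring earlier
        DataStream<String> results = AsyncDataStream.unorderedWait(
                input, new DummyLookup(), 30, TimeUnit.SECONDS, 100);

        results.print();
        env.execute("async-capacity-example");
    }

    private static class DummyLookup extends RichAsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
            // placeholder for a real asynchronous client call
            CompletableFuture
                    .supplyAsync(() -> key + "-looked-up")
                    .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
        }
    }
}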

> As far as I can tell from looking at the code, the async operator is able
to checkpoint even if the work-queue is exhausted.

Yes and no. If the work-queue is full, `AsyncWaitOperator` can be snapshotted,
but it can not be blocked inside the `AsyncWaitOperator#processElement`
method. For a checkpoint to be executed, `AsyncWaitOperator` must finish
processing the current record and return execution to the task thread. If
the work-queue is full, `AsyncWaitOperator` will block inside the
`AsyncWaitOperator#addToWorkQueue` method until the work-queue has
capacity to accept the new element. If what I suspect is happening here is
true, and the job is deadlocked via this Kafka issue, `AsyncWaitOperator`
will be blocked indefinitely in this method.

Best,
Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.14/docs/ops/metrics/#io



wt., 9 lis 2021 o 11:55 Fabian Paul  napisał(a):

> Hi Dongwan,
>
> Can you maybe share more about the setup and how you use the AsyncFunction
> with
> the Kafka client?
>
> As David already pointed out it could be indeed a Kafka bug but it could
> also
> mean that your defined async function leaks direct memory by not freeing
> some
> resources.
>
> We can definitely improve the metrics for the AsyncFunction and expose the
> current queue size as a followup.
>
> Best,
> Fabian


Re: A savepoint was created but the corresponding job didn't terminate successfully.

2021-11-09 Thread Piotr Nowojski
Hi Dongwon,

Thanks for reporting the issue, I've created a ticket for it [1] and we
will analyse and try to fix it soon. In the meantime it should be safe for
you to ignore this problem. If this failure happens only rarely, you can
always retry stop-with-savepoint command and there should be no visible
side effects for you.

Piotrek


[1] https://issues.apache.org/jira/browse/FLINK-24846

wt., 9 lis 2021 o 03:55 Dongwon Kim  napisał(a):

> Hi community,
>
> I failed to stop a job with savepoint with the following message:
>
>> Inconsistent execution state after stopping with savepoint. At least one
>> execution is still in one of the following states: FAILED, CANCELED. A
>> global fail-over is triggered to recover the job
>> 452594f3ec5797f399e07f95c884a44b.
>>
>
> The job manager said
>
>>  A savepoint was created at
>> hdfs://mobdata-flink-hdfs/driving-habits/svpts/savepoint-452594-f60305755d0e
>> but the corresponding job 452594f3ec5797f399e07f95c884a44b didn't terminate
>> successfully.
>
> while complaining about
>
>> Mailbox is in state QUIESCED, but is required to be in state OPEN for put
>> operations.
>>
>
> Is it okay to ignore this kind of error?
>
> Please see the attached files for the detailed context.
>
> FYI,
> - I used the latest 1.14.0
> - I started the job with "$FLINK_HOME"/bin/flink run --target yarn-per-job
> - I couldn't reproduce the exception using the same jar so I might not
> be able to provide DEBUG messages
>
> Best,
>
> Dongwon
>
>


Re: Beginner: guidance on long term event stream persistence and replaying

2021-11-09 Thread Piotr Nowojski
Hi Simon,

Off the top of my head, I do not see a reason why this shouldn't work in
Flink. I'm not sure what your question is here.

For reading both from the FileSource and Kafka at the same time you might
want to take a look at the Hybrid Source [1]. Apart from that there are
FileSource/FileSink and KafkaSource, which I presume you have already found :)

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/

pon., 8 lis 2021 o 22:22 Simon Paradis  napisał(a):

> Hi,
>
> We have an event processing pipeline that populates various reports from
> different Kafka topics and would like to centralize processing in Flink. My
> team is new to Flink but we did some prototyping using Kinesis.
>
> To enable new reporting based on past events, we'd like the ability to
> replay those Kafka events when creating new reports; a capability we don't
> have today.
>
> We ingest the same topics from many Kafka clusters in different
> datacenters and it is not practical to have enough retention on these Kafka
> topics, for technical reasons and also due to practical issues around GDPR
> compliance and Kafka's immutability (it's not an issue today because our
> Kafka retention is short).
>
> So we'd like to archive events into files that we push to AWS S3 along
> with some metadata to help implement GDPR more efficiently. I've looked
> into Avro object container files and it seems like it would work for us.
>
> I was thinking of having a dedicated Flink job reading and archiving to S3
> and somehow plug these S3 files back into a FileSource when a replay is
> needed to backfill new reporting views. S3 would contain Avro container
> files with a pattern like
>
> sourceDC__topicName__MMDDHHMM__NN.data
>
> where files are rolled over every hour or so and "rekeyed" into NN slots
> as per the event key to retain logical order while having reasonable file
> sizes.
>
> I presume someone has already done something similar. Any pointer would be
> great!
>
> --
> Simon Paradis
> paradissi...@gmail.com
>


Re: Troubleshooting checkpoint timeout

2021-10-26 Thread Piotr Nowojski
I'm glad that I could help :)

Piotrek

pon., 25 paź 2021 o 16:04 Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> napisał(a):

> Oh, I got it. I should’ve made the connection earlier after you said “Once
> an operator decides to send/broadcast a checkpoint barrier downstream, it
> just broadcasts it to all output channels”.
>
>
>
> I’ll see what I can do about upgrading the Flink version and do some more
> tests with unaligned checkpoints. Thanks again for all the info.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Piotr Nowojski 
> *Sent:* Montag, 25. Oktober 2021 15:51
> *To:* Alexis Sarda-Espinosa 
> *Cc:* Parag Somani ; Caizhi Weng <
> tsreape...@gmail.com>; Flink ML 
> *Subject:* Re: Troubleshooting checkpoint timeout
>
>
>
> Hi Alexis,
>
>
>
> >  Should I understand these metrics as a property of an operator and not
> of each subtask (at least for aligned checkpoints)? Then “first” and “last”
> would make sense to me: first/last across all subtasks/channels for a given
> operator.
>
>
>
> Those are properties of a subtask. Subtasks are a collection of chained
> parallel instances of operators. If you have a simple job like
> `source.keyBy(...).window(...).process(...)`, with parallelism of 10, you
> will have two tasks. Each task will have 10 subtasks. Each subtask will
> have only a single element operator chain, with a single operator (either
> source operator for the source task/subtasks, or window/process function
> for the second task). If you add a sink to your job
> `source.keyBy(...).window(...).process(...).addSink(...)`, this sink will
> be chained with the window/process operator. You will still end up with two
> tasks:
>
>
>
> 1. Source
> 2. Window -> Sink
>
>
>
> again, each will have 10 subtasks, with parallel instances of the
> respective operators.
>
>
>
> So if you look at the "alignment duration" of a subtask from "2. Window ->
> Sink" task, that will be the difference between receiving a first
> checkpoint barrier from any of the "1. Source" subtasks and the last
> checkpoint barrier from those "1. Source" subtasks.
>
>
>
> > Naturally, for unaligned checkpoints, alignment duration isn’t
> applicable, but what about Start Delay? I imagine that might indeed be a
> property of the subtask and not the operator.
>
> As per the docs that I've already linked [1]
>
>
> Alignment Duration: The time between processing the first and the last
> checkpoint barrier. For aligned checkpoints, during the alignment, the
> channels that have already received checkpoint barriers are blocked from
> processing more data.
>
>
>
> This number is also defined the same way for the unaligned checkpoints.
> Even with unaligned checkpoints a subtask needs to wait for receiving all
> of the checkpoint barriers before completing the checkpoint. However, as a
> subtask can broadcast the checkpoint barrier downstream immediately upon
> receiving the first checkpoint barrier AND those checkpoint barriers are
> able to overtake in-flight data, the propagation happens very very quickly
> for the most part. Hence alignment duration and start delay in this case
> should be very small, unless you have deeper problems like long GC pauses.
>
> > If I’m understanding the aligned checkpoint mechanism correctly, after
> the first failure the job restarts and tries to read, let’s say, the last 5
> minutes of data. Then it fails again because the checkpoint times out and,
> after restarting, would it try to read, for example, 15 minutes of data? If
> there was no backpressure in the source, it could be that the new
> checkpoint barriers created after the first restart are behind more data
> than before it restarted, no?
>
>
>
> I'm not sure if I understand. But yes. It's a valid scenario that:
>
> 1. timestamp t1, checkpoint 42 completes
> 2. failure happens at timestamp t1 + 10 minutes.
> 3. timestamp t2, job is recovered to checkpoint 42.
>
> 4. timestamp t2 + 5 minutes, checkpoint 43 is triggered.
>
>
>
> Between 1. and 2., your job could have processed more records than between
> 3. and 4.
>
>
>
> Piotrek
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/checkpoint_monitoring/
>
>
>
> pon., 25 paź 2021 o 15:02 Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> napisał(a):
>
> Hi again,
>
>
>
> Thanks a lot for taking the time to clarify this. I think that the main
> thing that is confusing me is that the UI shows Alignment Duration and
> other checkpoint metrics for each subtask, and

Re: Troubleshooting checkpoint timeout

2021-10-25 Thread Piotr Nowojski
Hi Alexis,

>  Should I understand these metrics as a property of an operator and not
of each subtask (at least for aligned checkpoints)? Then “first” and “last”
would make sense to me: first/last across all subtasks/channels for a given
operator.

Those are properties of a subtask. Subtasks are a collection of chained
parallel instances of operators. If you have a simple job like
`source.keyBy(...).window(...).process(...)`, with parallelism of 10, you
will have two tasks. Each task will have 10 subtasks. Each subtask will
have only a single element operator chain, with a single operator (either
source operator for the source task/subtasks, or window/process function
for the second task). If you add a sink to your job
`source.keyBy(...).window(...).process(...).addSink(...)`, this sink will
be chained with the window/process operator. You will still end up with two
tasks:

1. Source
2. Window -> Sink

again, each will have 10 subtasks, with parallel instances of the
respective operators.
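
To make that concrete, here is a minimal sketch of such a job (only the Flink
API calls are real; `Event`, `MySource`, `MySink` and `MyWindowFunction` are
made-up placeholders, and the window definition is just an example):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(10);

    // Task 1: "Source" -> 10 source subtasks
    DataStream<Event> events = env.addSource(new MySource());

    // Task 2: "Window -> Sink" -> 10 subtasks, window/process and sink operators chained together
    events
        .keyBy(Event::getKey)
        .window(SlidingEventTimeWindows.of(Time.minutes(10), Time.minutes(1)))
        .process(new MyWindowFunction())
        .addSink(new MySink());

    env.execute("two-task example");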

So if you look at the "alignment duration" of a subtask from "2. Window ->
Sink" task, that will be the difference between receiving a first
checkpoint barrier from any of the "1. Source" subtasks and the last
checkpoint barrier from those "1. Source" subtasks.

> Naturally, for unaligned checkpoints, alignment duration isn’t
applicable, but what about Start Delay? I imagine that might indeed be a
property of the subtask and not the operator.

As per the docs that I've already linked [1]

Alignment Duration: The time between processing the first and the last
checkpoint barrier. For aligned checkpoints, during the alignment, the
channels that have already received checkpoint barriers are blocked from
processing more data.

This number is also defined the same way for the unaligned checkpoints.
Even with unaligned checkpoints a subtask needs to wait for receiving all
of the checkpoint barriers before completing the checkpoint. However, as
a subtask can broadcast the checkpoint barrier downstream immediately upon
receiving the first checkpoint barrier AND those checkpoint barriers are
able to overtake in-flight data, the propagation happens very very quickly
for the most part. Hence alignment duration and start delay in this case
should be very small, unless you have deeper problems like long GC pauses.

> If I’m understanding the aligned checkpoint mechanism correctly, after
the first failure the job restarts and tries to read, let’s say, the last 5
minutes of data. Then it fails again because the checkpoint times out and,
after restarting, would it try to read, for example, 15 minutes of data? If
there was no backpressure in the source, it could be that the new
checkpoint barriers created after the first restart are behind more data
than before it restarted, no?

I'm not sure if I understand. But yes. It's a valid scenario that:

1. timestamp t1, checkpoint 42 completes
2. failure happens at timestamp t1 + 10 minutes.
3. timestamp t2, job is recovered to checkpoint 42.
4. timestamp t2 + 5 minutes, checkpoint 43 is triggered.

Between 1. and 2., your job could have processed more records than between
3. and 4.

Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/checkpoint_monitoring/

pon., 25 paź 2021 o 15:02 Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> napisał(a):

> Hi again,
>
>
>
> Thanks a lot for taking the time to clarify this. I think that the main
> thing that is confusing me is that the UI shows Alignment Duration and
> other checkpoint metrics for each subtask, and the resources you’ve sent
> always discuss a single barrier per subtask channel. Should I understand
> these metrics as a property of an operator and not of each subtask (at
> least for aligned checkpoints)? Then “first” and “last” would make sense to
> me: first/last across all subtasks/channels for a given operator.
>
>
>
> Naturally, for unaligned checkpoints, alignment duration isn’t applicable,
> but what about Start Delay? I imagine that might indeed be a property of
> the subtask and not the operator.
>
>
>
> With respect to my problem, I can also add that my job reads data from
> Pulsar, so some of it is buffered in the message bus. If I’m understanding
> the aligned checkpoint mechanism correctly, after the first failure the job
> restarts and tries to read, let’s say, the last 5 minutes of data. Then it
> fails again because the checkpoint times out and, after restarting, would
> it try to read, for example, 15 minutes of data? If there was no
> backpressure in the source, it could be that the new checkpoint barriers
> created after the first restart are behind more data than before it
> restarted, no?
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Piotr Nowojski 
> *Sent:* Montag, 25. Oktober 2021 13:

Re: Troubleshooting checkpoint timeout

2021-10-25 Thread Piotr Nowojski
r timestamps lagging clock time by up to 1 hour. Since the
> logs don’t indicate the operator’s logic takes a significant amount of time
> and CPU is far below the available limit (the single TM barely uses more
> than 1 CPU out of 4), I’d guess the lag could be related to checkpoint
> alignment, which takes me to my questions:
>
>
>
>1. The documentation states “Operators that receive more than one
>input stream need to align the input streams on the snapshot barriers”. If
>an operator has parallelism > 1, does that count as more than one stream?
>Or is there a single output barrier for all subtask outputs that gets
>“copied” to all downstream subtask inputs?
>2. Similarly, alignment duration is said to be “The time between
>processing the first and the last checkpoint barrier”. What exactly is the
>interpretation of “first” and “last” here? Do they relate to a checkpoint
>“n” where “first” would be the barrier for n-1 and “last” the one for n?
>3. Start delay also refers to the “first checkpoint barrier to reach
>this subtask”. As before, what is “first” in this context?
>4. Maybe this will be answered by the previous questions, but what
>happens to barriers if a downstream operator has lower parallelism?
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Piotr Nowojski 
> *Sent:* Montag, 25. Oktober 2021 09:59
> *To:* Alexis Sarda-Espinosa 
> *Cc:* Parag Somani ; Caizhi Weng <
> tsreape...@gmail.com>; Flink ML 
> *Subject:* Re: Troubleshooting checkpoint timeout
>
>
>
> Hi Alexis,
>
>
>
> You can read about those metrics in the documentation [1]. Long alignment
> duration and start delay almost always come together. High values indicate
> long checkpoint barrier propagation times through the job graph, that's
> always (at least so far I haven't seen a different reason) caused by the
> same thing: backpressure. Which brings me to
>
>
>
> > There is no backpressure in any operator.
>
>
>
> Why do you think so?
>
>
>
> For analysing backpressure I would highly recommend upgrading to Flink 1.13.x
> as it has greatly improved tooling for that [2]. Since Flink 1.10 I
> believe you can use the `isBackPressured` metric. In previous versions you
> would have to rely on buffer usage metrics as described here [3].
>
>
>
> If this is indeed a problem with a backpressure, there are three things
> you could do to improve checkpointing time:
>
> a) Reduce the backpressure, either by optimising your job/code or scaling
> up.
>
> b) Reduce the amount of in-flight data. Since Flink 1.14.x, Flink can do
> it automatically when buffer debloating is enabled, but the same
> principle could be used to manually and statically configure the cluster to
> have less in-flight data. You can read about this here [4].
>
> c) Enable unaligned checkpoints [5].
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/checkpoint_monitoring/
>
> [2] https://flink.apache.org/2021/07/07/backpressure.html
>
> [3] https://flink.apache.org/2019/07/23/flink-network-stack-2.html#network-metrics
>
> [4]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/memory/network_mem_tuning/#the-buffer-debloating-mechanism
>
> [5]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#unaligned-checkpoints
>
>
>
> Best,
>
> Piotrek
>
>
>
> czw., 21 paź 2021 o 19:00 Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> napisał(a):
>
> I would really appreciate more fine-grained information regarding the
> factors that can affect a checkpoint’s:
>
>
>
>- Sync duration
>- Async duration
>- Alignment duration
>- Start delay
>
>
>
> Otherwise those metrics don’t really help me know in which areas to look
> for issues.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Alexis Sarda-Espinosa 
> *Sent:* Mittwoch, 20. Oktober 2021 09:43
> *To:* Parag Somani ; Caizhi Weng <
> tsreape...@gmail.com>
> *Cc:* Flink ML 
> *Subject:* RE: Troubleshooting checkpoint timeout
>
>
>
> Currently the windows are 10 minutes in size with a 1-minute slide time.
> The approximate 500 event/minute throughput is already rather high for my
> use case, so I don’t expect it to be higher, but I would imagine that’s
> still pretty low.
>
>
>
> I did have some issues with storage space, and I wouldn’t be surprised if
> there is an IO bottleneck in my dev environment, but then my main question
> would be: if IO is being throttled, could that result in the high “start
> delay” times I obse

Re: Troubleshooting checkpoint timeout

2021-10-25 Thread Piotr Nowojski
Hi Alexis,

You can read about those metrics in the documentation [1]. Long alignment
duration and start delay almost always come together. High values indicate
long checkpoint barrier propagation times through the job graph, that's
always (at least so far I haven't seen a different reason) caused by the
same thing: backpressure. Which brings me to

> There is no backpressure in any operator.

Why do you think so?

For analysing backpressure I would highly recommend upgrading to Flink 1.13.x
as it has greatly improved tooling for that [2]. Since Flink 1.10 I believe
you can use the `isBackPressured` metric. In previous versions you would
have to rely on buffer usage metrics as described here [3].

If this is indeed a problem with a backpressure, there are three things you
could do to improve checkpointing time:
a) Reduce the backpressure, either by optimising your job/code or scaling
up.
b) Reduce the amount of in-flight data. Since Flink 1.14.x, Flink can do it
automatically when buffer debloating is enabled, but the same
principle could be used to manually and statically configure the cluster to
have less in-flight data. You can read about this here [4].
c) Enable unaligned checkpoints [5].
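
To make c) concrete, here is a minimal sketch of the checkpoint-related knobs,
assuming the job is configured programmatically and runs on Flink 1.13+ (the
values are only examples, not recommendations):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000);                        // checkpoint every 60 s

    CheckpointConfig checkpointConfig = env.getCheckpointConfig();
    checkpointConfig.enableUnalignedCheckpoints();          // c) barriers can overtake in-flight data
    checkpointConfig.setCheckpointTimeout(30 * 60 * 1000L); // extra headroom while investigating
    checkpointConfig.setMaxConcurrentCheckpoints(1);        // unaligned checkpoints do not support concurrent checkpoints
    checkpointConfig.setMinPauseBetweenCheckpoints(30_000);

The amount of in-flight data itself (option b) is controlled on the cluster
side, e.g. via the network buffer options described in [4].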

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/monitoring/checkpoint_monitoring/
[2] https://flink.apache.org/2021/07/07/backpressure.html
[3] https://flink.apache.org/2019/07/23/flink-network-stack-2.html#network-metrics
[4]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/memory/network_mem_tuning/#the-buffer-debloating-mechanism
[5]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#unaligned-checkpoints

Best,
Piotrek

czw., 21 paź 2021 o 19:00 Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> napisał(a):

> I would really appreciate more fine-grained information regarding the
> factors that can affect a checkpoint’s:
>
>
>
>- Sync duration
>- Async duration
>- Alignment duration
>- Start delay
>
>
>
> Otherwise those metrics don’t really help me know in which areas to look
> for issues.
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Alexis Sarda-Espinosa 
> *Sent:* Mittwoch, 20. Oktober 2021 09:43
> *To:* Parag Somani ; Caizhi Weng <
> tsreape...@gmail.com>
> *Cc:* Flink ML 
> *Subject:* RE: Troubleshooting checkpoint timeout
>
>
>
> Currently the windows are 10 minutes in size with a 1-minute slide time.
> The approximate 500 event/minute throughput is already rather high for my
> use case, so I don’t expect it to be higher, but I would imagine that’s
> still pretty low.
>
>
>
> I did have some issues with storage space, and I wouldn’t be surprised if
> there is an IO bottleneck in my dev environment, but then my main question
> would be: if IO is being throttled, could that result in the high “start
> delay” times I observe? That seems to be the main slowdown, so I just want
> to be sure I’m looking in the right direction.
>
>
>
> I’d like to mention another thing about my pipeline’s structure in case
> it’s relevant, although it may be completely unrelated. I said that I
> specify the windowing properties once (windowedStream in my 1st e-mail)
> and use it twice, but it’s actually used 3 times. In addition to the 2
> ProcessWindowFunctions that end in sinks, the stream is also joined with a
> side output:
>
>
>
> openedEventsTimestamped = openedEvents
>
> .getSideOutput(…)
>
> .keyBy(keySelector)
>
> .assignTimestampsAndWatermarks(watermarkStrategy)
>
>
>
> windowedStream
>
> .process(ProcessWindowFunction3())
>
> .keyBy(keySelector)
>
>
> .connect(DataStreamUtils.reinterpretAsKeyedStream(openedEventsTimestamped,
> keySelector))
>
> .process(...)
>
>
>
> Could this lead to delays or alignment issues?
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Parag Somani 
> *Sent:* Mittwoch, 20. Oktober 2021 09:22
> *To:* Caizhi Weng 
> *Cc:* Alexis Sarda-Espinosa ; Flink
> ML 
> *Subject:* Re: Troubleshooting checkpoint timeout
>
>
>
> I had a similar problem, where two concurrent checkpoints were configured.
> Also, I used to save them in S3 (using MinIO) on a k8s 1.18 env.
>
>
>
> The Flink service was getting restarted and timeouts were happening. It got
> resolved:
>
> 1. MinIO had run out of disk space, which caused the checkpoint failures
> (this was the main cause).
>
> 2. Added checkpoint duration/interval parameters to address it:
>
> execution.checkpointing.max-concurrent-checkpoints and
> execution.checkpointing.min-pause
>
> Details of same at:
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#checkpointing
>
>
>
>
>
> On Wed, Oct 20, 2021 at 7:50 AM Caizhi Weng  wrote:
>
> Hi!
>
>
>
> I see you're using sliding event time windows. What's the exact value of
> windowLengthMinutes and windowSlideTimeMinutes? If windowLengthMinutes is
> large and windowSlideTimeMinutes is small then each record may be assigned
> to a large number of 

Re: Empty Kafka topics and watermarks

2021-10-11 Thread Piotr Nowojski
Great, thanks!

pon., 11 paź 2021 o 17:24 James Sandys-Lumsdaine 
napisał(a):

> Ah thanks for the feedback. I can work around for now but will upgrade as
> soon as I can to the latest version.
>
> Thanks very much,
>
> James.
> --
> *From:* Piotr Nowojski 
> *Sent:* 08 October 2021 13:17
> *To:* James Sandys-Lumsdaine 
> *Cc:* user@flink.apache.org 
> *Subject:* Re: Empty Kafka topics and watermarks
>
> Hi James,
>
> I believe you have encountered a bug that we have already fixed [1]. The
> small problem is that in order to fix this bug, we had to change some
> `@PublicEvolving` interfaces and thus we were not able to backport this fix
> to earlier minor releases. As such, this is only fixed starting from 1.14.x.
>
> Best,
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/FLINK-18934
>
> pt., 8 paź 2021 o 11:55 James Sandys-Lumsdaine 
> napisał(a):
>
> Hi everyone,
>
> I'm putting together a Flink workflow that needs to merge historic data
> from a custom JDBC source with a Kafka flow (for the realtime data). I have
> successfully written the custom JDBC source that emits a watermark for the
> last event time after all the DB data has been emitted but now I face a
> problem when joining with data from the Kafka stream.
>
> I register a timer in my KeyedCoProcessFunction joining the DB stream
> with live Kafka stream so I can emit all the "batch" data from the DB in
> one go when completely read up to the watermark but the timer never fires
> as the Kafka stream is empty and therefore doesn't emit a watermark. My
> Kafka stream is allowed to be empty since all the data will have been
> retrieved from the DB call so I only expect new events to appear over
> Kafka. Note that if I replace the Kafka input with a simple
> env.fromCollection(...) empty list then the timer triggers fine as Flink
> seems to detect it doesn't need to wait for any input from stream 2. So it
> seems to be something related to the Kafka stream status that is preventing
> the watermark from advancing in the KeyedCoProcessFunction.
>
> I have tried configuring the Kafka stream timestamp and watermark
> strategies so that the source is marked as idle after 10 seconds but still it
> seems the watermark in the join operator combining these 2 streams is not
> advancing. (See example code below).
>
> Maybe this is my bad understanding but I thought if an input stream into a
> KeyedCoProcessFunction is idle then it wouldn't be considered by the
> operator for forwarding the watermark i.e. it would forward the non-idle
> input stream's watermark and not do a min(stream1WM, stream2WM). With the
> below example I never see the onTimer fire and the only effect the
> withIdleness() strategy has is to stop the print statements in
> onPeriodicEmit() happening after 5 seconds (env periodic emit is set to the
> default 200ms so I see 25 rows before it stops).
>
> The only way I can get my KeyedCoProcessFunction timer to fire is to force
> an emit of the watermark I want in the onPeriodicEmit() after x numbers of
> attempts to advance an initial watermark i.e. if onPeriodicEmit() is called
> 100 times and the "latestWatermark" is still Long.MIN_VALUE then I emit the
> watermark I want so the join can progress. This seems like a nasty hack to
> me but perhaps something like this is actually necessary?
>
> I am currently using Flink 1.12.3, a Confluent Kafka client 6.1.1. Any
> pointers would be appreciated.
>
> Thanks in advance,
>
> James.
>
> FlinkKafkaConsumer positionsFlinkKafkaConsumer = new
> FlinkKafkaConsumer<>("poc.positions",
> ConfluentRegistryAvroDeserializationSchema.forSpecific(Position.class,
> SchemaRegistryURL), kafkaConsumerProperties);
>
> positionsFlinkKafkaConsumer.setStartFromEarliest();
>
> positionsFlinkKafkaConsumer.assignTimestampsAndWatermarks(
>
>new WatermarkStrategy() {
>
>   @Override
>
>   public TimestampAssigner
> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
>
> return (event, recordTimestamp) -> {
>
> return event.getPhysicalFrom();
>
> };
>
> }
>
>
>
>   @Override
>
>   public WatermarkGenerator
> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
>
> return new WatermarkGenerator() {
>
> public long latestWatermark = Long.MIN_VALUE;
>
>
>
> @Override
>
> public void onEvent(Position event, long
> tim

Re: Empty Kafka topics and watermarks

2021-10-08 Thread Piotr Nowojski
Hi James,

I believe you have encountered a bug that we have already fixed [1]. The
small problem is that in order to fix this bug, we had to change some
`@PublicEvolving` interfaces and thus we were not able to backport this fix
to earlier minor releases. As such, this is only fixed starting from 1.14.x.

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-18934

pt., 8 paź 2021 o 11:55 James Sandys-Lumsdaine 
napisał(a):

> Hi everyone,
>
> I'm putting together a Flink workflow that needs to merge historic data
> from a custom JDBC source with a Kafka flow (for the realtime data). I have
> successfully written the custom JDBC source that emits a watermark for the
> last event time after all the DB data has been emitted but now I face a
> problem when joining with data from the Kafka stream.
>
> I register a timer in my KeyedCoProcessFunction joining the DB stream
> with live Kafka stream so I can emit all the "batch" data from the DB in
> one go when completely read up to the watermark but the timer never fires
> as the Kafka stream is empty and therefore doesn't emit a watermark. My
> Kafka stream is allowed to be empty since all the data will have been
> retrieved from the DB call so I only expect new events to appear over
> Kafka. Note that if I replace the Kafka input with a simple
> env.fromCollection(...) empty list then the timer triggers fine as Flink
> seems to detect it doesn't need to wait for any input from stream 2. So it
> seems to be something related to the Kafka stream status that is preventing
> the watermark from advancing in the KeyedCoProcessFunction.
>
> I have tried configuring the Kafka stream timestamp and watermark
> strategies so that the source is marked as idle after 10 seconds but still it
> seems the watermark in the join operator combining these 2 streams is not
> advancing. (See example code below).
>
> Maybe this is my bad understanding but I thought if an input stream into a
> KeyedCoProcessFunction is idle then it wouldn't be considered by the
> operator for forwarding the watermark i.e. it would forward the non-idle
> input stream's watermark and not do a min(stream1WM, stream2WM). With the
> below example I never see the onTimer fire and the only effect the
> withIdleness() strategy has is to stop the print statements in
> onPeriodicEmit() happening after 5 seconds (env periodic emit is set to the
> default 200ms so I see 25 rows before it stops).
>
> The only way I can get my KeyedCoProcessFunction timer to fire is to force
> an emit of the watermark I want in the onPeriodicEmit() after x numbers of
> attempts to advance an initial watermark i.e. if onPeriodicEmit() is called
> 100 times and the "latestWatermark" is still Long.MIN_VALUE then I emit the
> watermark I want so the join can progress. This seems like a nasty hack to
> me but perhaps something like this is actually necessary?
>
> I am currently using Flink 1.12.3, a Confluent Kafka client 6.1.1. Any
> pointers would be appreciated.
>
> Thanks in advance,
>
> James.
>
> FlinkKafkaConsumer positionsFlinkKafkaConsumer = new
> FlinkKafkaConsumer<>("poc.positions",
> ConfluentRegistryAvroDeserializationSchema.forSpecific(Position.class,
> SchemaRegistryURL), kafkaConsumerProperties);
>
> positionsFlinkKafkaConsumer.setStartFromEarliest();
>
> positionsFlinkKafkaConsumer.assignTimestampsAndWatermarks(
>
>new WatermarkStrategy() {
>
>   @Override
>
>   public TimestampAssigner
> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
>
> return (event, recordTimestamp) -> {
>
> return event.getPhysicalFrom();
>
> };
>
> }
>
>
>
>   @Override
>
>   public WatermarkGenerator
> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
>
> return new WatermarkGenerator() {
>
> public long latestWatermark = Long.MIN_VALUE;
>
>
>
> @Override
>
> public void onEvent(Position event, long
> timestamp, WatermarkOutput output) {
>
> long eventWatermark =
> event.getPhysicalFrom();
>
> if (eventWatermark > latestWatermark)
>
> latestWatermark = eventWatermark;
>
> }
>
>
>
> @Override
>
> public void onPeriodicEmit(WatermarkOutput
> output) {
>
> System.out.printf("Emitting watermark
> %d\n", latestWatermark);
>
> output.emitWatermark(new
> Watermark(latestWatermark));
>
> }
>
> };
>
> }
>
> }.withIdleness(Duration.ofSeconds(5)));
>
>
>
> DataStream positionKafkaInputStream =
> 

Re: How to ugrade JobManagerCommunicationUtils from FLink 1.4 to Flink 1.5?

2021-10-08 Thread Piotr Nowojski
Hi,

`JobManagerCommunicationUtils` was never part of Flink's API. It was an
internal class, for our internal unit tests. Note that Flink's public API
is annotated with `@Public`, `@PublicEvolving` or `@Experimental`. Anything
else by default is internal (sometimes to avoid confusion we are annotating
internal classes with `@Internal`).

JobManagerCommunicationUtils seems to be replaced with
`MiniClusterResource` [1] as part of [2]. Note that MiniClusterResource is
also not a public API, so it's subject to change or to be completely
removed without warning. You can read about how to test your applications
here [3], and about `MiniClusterResource` in particular here [4].
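
For illustration, the MiniClusterWithClientResource pattern from the testing
docs looks roughly like this (a sketch assuming a recent Flink version, JUnit 4
and the `flink-test-utils` dependency; class and test names are made up):

    public class MyJobITCase {

        @ClassRule
        public static final MiniClusterWithClientResource FLINK_CLUSTER =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberTaskManagers(1)
                                .setNumberSlotsPerTaskManager(2)
                                .build());

        @Test
        public void runsPipelineAgainstMiniCluster() throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromElements(1, 2, 3)
               .map(x -> x * 2)
               .addSink(new DiscardingSink<>());
            env.execute();
        }
    }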

Piotrek

PS Flink 1.5 has also not been supported for something like 2 years now.
The currently officially supported Flink versions are 1.13 and 1.14, so I would
encourage you to upgrade to one of those.

[1]
https://github.com/apache/flink/commits/release-1.5/flink-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/JobManagerCommunicationUtils.java
[2] https://issues.apache.org/jira/browse/FLINK-8703
[3]
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/testing/
[4]
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/testing/#testing-flink-jobs

pt., 8 paź 2021 o 10:11 Felipe Gutierrez 
napisał(a):

> Hello there,
>
> what is the replacement from Flink 1.4 to Flink 1.5 of the class
> JobManagerCommunicationUtils.java [1] below?
>
> JobManagerCommunicationUtils.cancelCurrentJob
> JobManagerCommunicationUtils.waitUntilNoJobIsRunning
>
> I want to upgrade Flink from 1.4 to 1.5 but I can't find this class in the
> docs of the previous version [2] nor in the next version [3]. My plan
> is also to upgrade Flink to the latest version. But if I cannot find a way
> to get to the next version, 1.4 -> 1.5, I suppose that for a greater jump it
> will be even more difficult.
>
> [1]
> https://github.com/a0x8o/flink/blob/master/flink-streaming-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/JobManagerCommunicationUtils.java#L38-L60
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/connectors/kafka/package-summary.html
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/connectors/kafka/package-summary.html
>
> Thanks in advance,
> Felipe
>
> *--*
> *-- Felipe Gutierrez*
>


Re: Event is taking a lot of time between the operators

2021-09-29 Thread Piotr Nowojski
Hi Sanket,

As I mentioned in the previous email, it's most likely still an issue of
backpressure and you can check it as I described in that message. Either
your records are stuck in the network buffers between the two async
operations (I) (if there is a network exchange), and/or inside the
`AsyncWaitOperator`'s internal queue (II). If it's causing you problems:

I. For the former problem (network buffers) you can:
a) get rid of the network exchange, via removing keyBy/shuffle/rebalance
(might not be feasible, depending on your business logic)
b) reduce the amount of the in-flight data. In Flink 1.14 we are adding an
automatic buffer debloating mechanism, which you cannot use in Flink 1.8, but
you could manually tweak both the amount and the size of the buffers. You can
read about it here [1], just ignore the automatic buffer debloating
mechanism.
II. You can change the size of the internal queue by adjusting the
`capacity` parameter [2]

The more buffered in-flight data you have between operators, the longer the
delay between processing the same record by two different operators.
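
Regarding II), a minimal sketch of where the `capacity` parameter is set (names
other than the Flink API are placeholders):

    // The last argument is the capacity of the AsyncWaitOperator's internal queue,
    // i.e. the maximum number of in-flight async requests per subtask.
    DataStream<Result> results =
            AsyncDataStream.unorderedWait(
                    input,                         // DataStream<Request>
                    new MyAsyncFunction(),         // implements AsyncFunction<Request, Result>
                    1_000, TimeUnit.MILLISECONDS,  // timeout per async request
                    100);                          // capacity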

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/network_mem_tuning/
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api



śr., 29 wrz 2021 o 08:20 Sanket Agrawal 
napisał(a):

> Hi Ragini,
>
>
>
> For measuring time in an async we have put a logger as the first and the
> last statement in asyncInvoke and for measuring time between the asyncs
> we are simply subtracting the message2’s start time and message1’s end
> time. Also, we are using 1 as the parallelism.
>
>
>
> Please let me know if you need any other information or if you have any
> recommendations on improving the approach.
>
>
>
> Thanks,
>
> Sanket Agrawal
>
>
>
> *From:* Ragini Manjaiah 
> *Sent:* Wednesday, September 29, 2021 11:17 AM
> *To:* Sanket Agrawal 
> *Cc:* Piotr Nowojski ; user@flink.apache.org
> *Subject:* Re: Event is taking a lot of time between the operators
>
>
>
> [**EXTERNAL EMAIL**]
>
> Hi Sanket,
>
>  I have a similar use case. How are you measuring the time for the `Async1`
> function to return the result and the external API call?
>
>
>
> On Wed, Sep 29, 2021 at 10:47 AM Sanket Agrawal <
> sanket.agra...@infosys.com> wrote:
>
> Hi @Piotr Nowojski ,
>
>
>
> Thank you for replying back. Yes, first async is taking between 1300-1500
> milliseconds but that is called on a CompletableFuture.*supplyAsync *and
> the Async Capacity is set to 1000.
>
>
>
> *Async Code Structure*: Inside asyncInvoke we are calling
> CompletableFuture.*supplyAsync *and inside* supplyAsync *we are calling
> an external API which is taking around 1005ms to 1040ms. Rest of the code
> for request creation/response validation is also inside the* supplyAsync *and
> is taking around 250ms.
>
>
>
> This way we tried to ensure that the main async thread (as the async operator
> does not use multiple threads directly) is available for the next message as
> soon as it calls CompletableFuture.supplyAsync on the current message.
>
>
>
> Thanks,
>
> Sanket Agrawal
>
>
>
> *From:* Piotr Nowojski 
> *Sent:* Tuesday, September 28, 2021 8:02 PM
> *To:* Sanket Agrawal 
> *Cc:* user@flink.apache.org
> *Subject:* Re: Event is taking a lot of time between the operators
>
>
>
> [**EXTERNAL EMAIL**]
>
> Hi,
>
>
>
> With Flink 1.8.0 I'm not sure how reliable the backpressure status is in
> the WebUI when it comes to the Async operators. If I remember correctly
> until around Flink 1.10 (+/- 2 version) backpressure monitoring was
> checking for thread dumps stuck in requesting Flink's network memory
> buffers. If in your job AsyncFunction is the source of a backpressure, it
> would be skipped and not reported. For analysing backpressure I would
> highly recommend upgrading to Flink 1.13.x as it has greatly improved
> tooling for that [1]. And in that version AsyncFunctions are definitely
> handled correctly. Since Flink 1.10 I believe you can use the
> `isBackPressured` metric. In previous versions you would have to rely on
> buffer usage metrics as described here [2].
>
>
>
>
>
> [1] https://flink.apache.org/2021/07/07/backpressure.html
>
> [2]
> https://flink.apache.org/2019/07/23/flink-n

Re: Event is taking a lot of time between the operators

2021-09-28 Thread Piotr Nowojski
Hi,

With Flink 1.8.0 I'm not sure how reliable the backpressure status is in
the WebUI when it comes to the Async operators. If I remember correctly
until around Flink 1.10 (+/- 2 version) backpressure monitoring was
checking for thread dumps stuck in requesting Flink's network memory
buffers. If in your job AsyncFunction is the source of a backpressure, it
would be skipped and not reported. For analysing backpressure I would
highly recommend upgrading to Flink 1.13.x as it has greatly improved
tooling for that [1]. And in that version AsyncFunctions are definitely
handled correctly. Since Flink 1.10 I believe you can use the
`isBackPressured` metric. In previous versions you would have to rely on
buffer usage metrics as described here [2].


[1] https://flink.apache.org/2021/07/07/backpressure.html
[2]
https://flink.apache.org/2019/07/23/flink-network-stack-2.html#network-metrics

Apart from the backpressure, part of the problem might simply be how long
it takes for the `Async1` function to return the result. Have you checked
that? Isn't it taking a couple of seconds?

Best,
Piotrek

wt., 28 wrz 2021 o 15:55 Sanket Agrawal 
napisał(a):

> Hi All,
>
>
>
> I am new to Flink. While developing a Flink application we observed that
> our message is taking around 10 seconds between the two Async operators.
> Below are the details.
>
>
>
>- *Flink Flow*: Kinesis Source -> Process -> Async1 -> Async2 ->
>Process -> Kinesis Sink
>- *Environment*: Amazon KDA. 1 Kinesis Processing Unit (1vCore & 4GB
>ram), and 1 parallelism.
>- *Flink Version*: 1.8.0
>- *Backpressure*: Flink dashboard shows that backpressure is *OK.*
>- *Input rate: *60 messages per second.
>
>
>
> Any kind of pointers/help will be very useful.
>
>
>
> Thanks,
>
> Sanket Agrawal
>
>
>


Re: State processor API very slow reading a keyed state with RocksDB

2021-09-09 Thread Piotr Nowojski
Hi David,

I can confirm that I'm able to reproduce this behaviour. I've tried
profiling/flame graphs and I was not able to make much sense out of those
results. There are no IO/Memory bottlenecks that I could notice, it looks
indeed like the Job is stuck inside RocksDB itself. This might be an issue
with for example memory configuration. Streaming jobs and State Processor
API are running in very different environments as the latter one is using
DataSet API under the hood, so maybe that can explain this? However I'm no
expert in neither DataSet API nor the RocksDB, so it's hard for me to make
progress here.

Maybe someone else can help here?

Piotrek


śr., 8 wrz 2021 o 14:45 David Causse  napisał(a):

> Hi,
>
> I'm investigating why a job we use to inspect a flink state is a lot
> slower than the bootstrap job used to generate it.
>
> I use RocksdbDB with a simple keyed value state mapping a string key to a
> long value. Generating the bootstrap state from a CSV file with 100M
> entries takes a couple minutes over 12 slots spread over 3 TM (4Gb
> allowed). But another job that does the opposite (converts this state into
> a CSV file) takes several hours. I would have expected these two job
> runtimes to be in the same ballpark.
>
> I wrote a simple test case[1] to reproduce the problem. This program has 3
> jobs:
> - CreateState: generate a keyed state (string->long) using the state
> processor api
> - StreamJob: reads all the keys using a StreamingExecutionEnvironment
> - ReadState: reads all the keys using the state processor api
>
> Running with 30M keys and (12 slots/3TM with 4Gb each) CreateState &
> StreamJob are done in less than a minute.
> ReadState is much slower (> 30 minutes) on my system. The RocksDB state
> appears to be restored relatively quickly but after that the slots are
> performing at very different speeds. Some slots finish quickly but some
> others struggle to advance.
> Looking at the thread dumps I always see threads in
> org.rocksdb.RocksDB.get:
>
> "DataSource (at readKeyedState(ExistingSavepoint.java:314)
> (org.apache.flink.state.api.input.KeyedStateInputFormat)) (12/12)#0" Id=371
> RUNNABLE
> at org.rocksdb.RocksDB.get(Native Method)
> at org.rocksdb.RocksDB.get(RocksDB.java:2084)
> at
> org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:83)
> at org.wikimedia.flink.StateReader.readKey(ReadState.scala:38)
> at org.wikimedia.flink.StateReader.readKey(ReadState.scala:32)
> at
> org.apache.flink.state.api.input.operator.KeyedStateReaderOperator.processElement(KeyedStateReaderOperator.java:76)
> at
> org.apache.flink.state.api.input.operator.KeyedStateReaderOperator.processElement(KeyedStateReaderOperator.java:51)
> at
> org.apache.flink.state.api.input.KeyedStateInputFormat.nextRecord(KeyedStateInputFormat.java:228)
>
> It seems suspiciously slow to me and I'm wondering if I'm missing
> something in the way the state processor api works.
>
> Thanks for your help!
>
> David.
>
> 1: https://github.com/nomoa/rocksdb-state-processor-test
>


Re: Duplicate copies of job in Flink UI/API

2021-09-08 Thread Piotr Nowojski
Hi Peter,

Can you provide relevant JobManager logs? And can you write down what steps
have you taken before the failure happened? Did this failure occur during
upgrading Flink, or after the upgrade etc.

Best,
Piotrek

śr., 8 wrz 2021 o 16:11 Peter Westermann 
napisał(a):

> We recently upgraded from Flink 1.12.4 to 1.12.5 and are seeing some weird
> behavior after a change in jobmanager leadership: We’re seeing two copies
> of the same job, one of those is in SUSPENDED state and has a start time of
> zero. Here’s the output from the /jobs/overview endpoint:
>
> {
>
>   "jobs": [{
>
> "jid": "2db4ee6397151a1109d1ca05188a4cbb",
>
> "name": "analytics-flink-v1",
>
> "state": "RUNNING",
>
> "start-time": 1631106146284,
>
> "end-time": -1,
>
> "duration": 2954642,
>
> "last-modification": 1631106152322,
>
> "tasks": {
>
>   "total": 112,
>
>   "created": 0,
>
>   "scheduled": 0,
>
>   "deploying": 0,
>
>   "running": 112,
>
>   "finished": 0,
>
>   "canceling": 0,
>
>   "canceled": 0,
>
>   "failed": 0,
>
>   "reconciling": 0
>
> }
>
>   }, {
>
> "jid": "2db4ee6397151a1109d1ca05188a4cbb",
>
> "name": "analytics-flink-v1",
>
> "state": "SUSPENDED",
>
> "start-time": 0,
>
> "end-time": -1,
>
> "duration": 1631105900760,
>
> "last-modification": 0,
>
> "tasks": {
>
>   "total": 0,
>
>   "created": 0,
>
>   "scheduled": 0,
>
>   "deploying": 0,
>
>   "running": 0,
>
>   "finished": 0,
>
>   "canceling": 0,
>
>   "canceled": 0,
>
>   "failed": 0,
>
>   "reconciling": 0
>
> }
>
>   }]
>
> }
>
>
>
> Has anyone seen this behavior before?
>
>
>
> Thanks,
>
> Peter
>


Re: Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-14 Thread Piotr Nowojski
Hi,

FYI, the performance regression after upgrading RocksDB was clearly visible
in all of our RocksDB related benchmarks, like for example:

http://codespeed.dak8s.net:8000/timeline/?ben=stateBackends.ROCKS=2
http://codespeed.dak8s.net:8000/timeline/?ben=stateBackends.ROCKS_INC=2
(and many more in the State Backends executable)

It's 6% to 12% across the board.

Best,
Piotrek


śr., 11 sie 2021 o 13:42 张蛟  napisał(a):

> Hi, Nico and Yun:
> Thanks for your great work and detailed description of the RocksDB
> version upgrade and performance. About 800 jobs are using the RocksDB state
> backend in our production environment, and we plan to upgrade more, aiming
> to solve the GC problems caused by large states. Because of the non-strict
> memory control in RocksDB, we have to spend a lot of time solving the
> problem of memory usage beyond the physical memory. With the support of a
> strict block cache, things will become much easier. Also, the delete range
> API is useful for us too, so we prefer to upgrade RocksDB to the new release
> version, and +1 (non-binding). best,
> zlzhang0122
>
> At 2021-08-05 01:50:07, "Yun Tang"  wrote:
> >Hi Yuval,
> >
> >Upgrading RocksDB version is a long story since Flink-1.10.
> >When we first plan to introduce write buffer manager to help control the
> memory usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from
> current RocksDB-5.17. However, we found performance regression in our micro
> benchmark on state operations [1] if bumped to RocksDB-5.18. We did not
> figure the root cause at that time and decide to cherry pick the commits of
> write buffer manager to our own FRocksDB [2]. And we finally released our
> own frocksdbjni-5.17.2-artisans-2.0 at that time.
> >
> >As time goes on, more and more bugs or missed features have been reported
> in the old RocksDB version. Such as:
> >
> >  1.  Cannot support ARM platform [3]
> >  2.  Does not have a stable deleteRange API, which is useful for Flink
> scale out [4]
> >  3.  Cannot support strict block cache [5]
> >  4.  Checkpoint might get stuck if using UNIVERSAL compaction strategy [6]
> >  5.  Uncontrolled log size made us disable the RocksDB internal LOG [7]
> >  6.  RocksDB's optimizeForPointLookup option might cause data lost [8]
> >  7.  Current dummy entry used for memory control in RocksDB-5.17 is too
> large, leading to a performance problem [9]
> >  8.  Cannot support alpine-based images.
> >  9.  ...
> >
> >Some of the bugs are walked around, and some are still open.
> >
> >And we decide to make some changes from Flink-1.12. First of all, we
> reported the performance regression problem compared with RocksDB-5.18 and
> RocksDB-5.17 to RocksDB community [10]. However, as RocksDB-5.x versions
> are a bit older for the community, and RocksJava usage might not be the
> core part for facebook guys, we did not get useful replies. Thus, we decide
> to figure out the root cause of the performance regression by ourselves.
> >Fortunately, we find the cause via binary search the commits among
> RocksDB-5.18 and RocksDB-5.17, and updated in the original thread [10]. To
> be short, the performance regression is due to different implementation of
> `__thread` and `thread_local` in gcc and would have more impact on dynamic
> loading [11], which is also what current RocksJava jar package does. With
> my patch [12], the performance regression would disappear if comparing
> RocksDB-5.18 with RocksDB-5.17.
> >
> >Unfortunately, RocksDB-5.18 still has many bugs and we want to bump to
> RocksDB-6.x. However, another performance regression appeared even with my
> patch [12]. With previous knowledge, we know that we must verify the built
> .so files with our java-based benchmark instead of using RocksDB built-in
> db-bench. I started to search the 1340+ commits from RocksDB-5.18 to
> RocksDB-6.11 to find the performance problem. However, I did not figure out
> the root cause after spending several weeks this time. The performance
> behaves up and down in those commits and I cannot get the commit which led to
> the performance regression. Take this commit of integrating block cache
> tracer in block-based table reader [13] for example, I noticed that this
> commit would cause a bit performance regression and that might be the
> useless usage accounting in operations, however, the problematic code was
> changed in later commits. Thus, after several weeks digging, I have to give
> up for the endless searching in the thousand commits temporarily. As
> RocksDB community seems not make the project management system public,
> unlike Apache's open JIRA systems, we do not know what benchmark they
> actually run before releasing each version to guarantee the performance.
> >
> >With my patch [10] on latest RocksDB-6.20.3, we could get the results on
> nexmark in the original thread sent by Stephan, and we can see the
> performance behaves closely in many real-world cases. And we also hope new
> features, such as direct buffer supporting [14] in RocksJava 

Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread Piotr Nowojski
Thanks for the detailed explanation, Yun Tang, and for all of the effort
you have clearly put into it. Based on what was described here I would also vote
for going forward with the upgrade.

It's a pity that this regression wasn't caught in the RocksDB community. I
would have two questions/ideas:
1. Can we push for better benchmark coverage in the RocksDB project in the
future?
2. Can we try to catch this kind of problem with RocksDB earlier? For
example with more frequent RocksDB upgrades, or by building test Flink builds
with the most recent RocksDB version to run our benchmarks and validate
newer RocksDB versions?

Best,
Piotrek

śr., 4 sie 2021 o 19:59 Yuval Itzchakov  napisał(a):

> Hi Yun,
> Thank you for the elaborate explanation and even more so for the super
> hard work that you're doing digging into RocksDB and chasing after
> hundreds of commits in order to fix them so we can all benefit.
>
> I can say for myself that optimizing towards memory is more important
> ATM for us, and I'm totally +1 for this.
>
> On Wed, Aug 4, 2021 at 8:50 PM Yun Tang  wrote:
>
>> Hi Yuval,
>>
>> Upgrading RocksDB version is a long story since Flink-1.10.
>> When we first plan to introduce write buffer manager to help control the
>> memory usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from
>> current RocksDB-5.17. However, we found performance regression in our micro
>> benchmark on state operations [1] if bumped to RocksDB-5.18. We did not
>> figure out the root cause at that time and decided to cherry-pick the commits of
>> write buffer manager to our own FRocksDB [2]. And we finally released our
>> own frocksdbjni-5.17.2-artisans-2.0 at that time.
>>
>> As time goes on, more and more bugs or missed features have been reported
>> in the old RocksDB version. Such as:
>>
>>1. Cannot support ARM platform [3]
>>2. Does not have a stable deleteRange API, which is useful for Flink
>>scale out [4]
>>3. Cannot support strict block cache [5]
>>4. Checkpoint might get stuck if using UNIVERSAL compaction strategy [6]
>>5. Uncontrolled log size made us disable the RocksDB internal LOG [7]
>>6. RocksDB's optimizeForPointLookup option might cause data lost [8]
>>7. Current dummy entry used for memory control in RocksDB-5.17 is too
>>large, leading to a performance problem [9]
>>8. Cannot support alpine-based images.
>>9. ...
>>
>> Some of the bugs are walked around, and some are still open.
>>
>> And we decide to make some changes from Flink-1.12. First of all, we
>> reported the performance regression problem compared with RocksDB-5.18 and
>> RocksDB-5.17 to RocksDB community [10]. However, as RocksDB-5.x versions
>> are a bit older for the community, and RocksJava usage might not be the
>> core part for facebook guys, we did not get useful replies. Thus, we decide
>> to figure out the root cause of the performance regression by ourselves.
>> Fortunately, we find the cause via binary search the commits among
>> RocksDB-5.18 and RocksDB-5.17, and updated in the original thread [10]. To
>> be short, the performance regression is due to different implementation of
>> `__thread` and `thread_local` in gcc and would have more impact on dynamic
>> loading [11], which is also what current RocksJava jar package does. With
>> my patch [12], the performance regression would disappear if comparing
>> RocksDB-5.18 with RocksDB-5.17.
>>
>> Unfortunately, RocksDB-5.18 still has many bugs and we want to bump to
>> RocksDB-6.x. However, another performance regression appeared even with my
>> patch [12]. With previous knowledge, we know that we must verify the built
>> .so files with our java-based benchmark instead of using RocksDB built-in
>> db-bench. I started to search the 1340+ commits from RocksDB-5.18 to
>> RocksDB-6.11 to find the performance problem. However, I did not figure out
>> the root cause after spending several weeks this time. The performance
>> behaves up and down in those commits and I cannot get *the commit *which
>> led to the performance regression. Take this commit of integrating block
>> cache tracer in block-based table reader [13] for example, I noticed that
>> this commit would cause a bit performance regression and that might be the
>> useless usage accounting in operations, however, the problematic code was
>> changed in later commits. Thus, after several weeks digging, I have to give
>> up for the endless searching in the thousand commits temporarily. As
>> the RocksDB community does not seem to make its project management system public,
>> unlike Apache's open JIRA systems, we do not know what benchmark they
>> actually run before releasing each version to guarantee the performance.
>>
>> With my patch [10] on latest RocksDB-6.20.3, we could get the results on
>> nexmark in the original thread sent by Stephan, and we can see the
>> performance behaves closely in many real-world cases. And we also hope new
>> features, such as direct buffer supporting [14] in RocksJava could help
>> 

Re: TaskManager crash after cancelling a job

2021-07-29 Thread Piotr Nowojski
Hi Ivan,

It sounds to me like a bug in FlinkKinesisConsumer that it's not cancelling
properly. The change in the behaviour could have been introduced as a bug
fix [1], where we had to stop interrupting the source thread. This also
might be related or at least relevant for fixing [2].

Ivan, the stack trace that you posted shows only that the task thread is
waiting for the source thread to finish. It doesn't show why the source
thread hasn't ended. For someone to fix this, it would be helpful if you
could provide a thread dump from a Task Manager that is stuck in cancelling
state. If for some reason you don't want to share a full thread dump, you
can share only the threads that have any package containing "kinesis" on the
stack trace (this would capture the `FlinkKinesisConsumer` and all
internal kinesis threads). This should be enough for someone that is
familiar with FlinkKinesisConsumer to understand why it hasn't been
canceled.

Best, Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-21028
[2] https://issues.apache.org/jira/browse/FLINK-23528

czw., 29 lip 2021 o 04:15 Yangze Guo  napisał(a):

> In your case, the entry point is the `cleanUpInvoke` function called
> by `StreamTask#invoke`.
>
> @ro...@apache.org Could you take another look at this?
>
> Best,
> Yangze Guo
>
> On Thu, Jul 29, 2021 at 2:29 AM Ivan Yang  wrote:
> >
> > Hi Yangze,
> >
> > I deployed 1.13.1, and the same problem exists. It seems like the cancel
> logic has changed since 1.11.0 (which was the one we had been running for
> almost 1 year). In 1.11.0, during the cancellation, we saw some subtasks
> stay in the cancelling state for some time, but eventually the job would be
> cancelled and no task managers were lost, so we could start the job right
> away. In the new version 1.13.x, it kills the task managers that those
> stuck subtasks were running on, and then it takes another 4-5 minutes for the
> task managers to rejoin. Can you point me to the code that manages the job
> cancellation routine? I want to understand the logic there.
> >
> > Thanks,
> > Ivan
> >
> > > On Jul 26, 2021, at 7:22 PM, Yangze Guo  wrote:
> > >
> > > Hi, Ivan
> > >
> > > My gut feeling is that it is related to FLINK-22535. Could @Yun Gao
> > > take another look? If that is the case, you can upgrade to 1.13.1.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Jul 27, 2021 at 9:41 AM Ivan Yang 
> wrote:
> > >>
> > >> Dear Flink experts,
> > >>
> > >> We recently ran into an issue during a job cancellation after
> upgraded to 1.13. After we issue a cancel (from Flink console or flink
> cancel {jobid}), a few subtasks stuck in cancelling state. Once it gets to
> that situation, the behavior is consistent. Those “cancelling tasks will
> never become canceled. After 3 minutes, The job stopped, as a result,
> number of task manages were lost. It will take about another 5 minute for
> the those lost task manager to rejoin the Job manager. Then we can restart
> the job from the previous checkpoint. Found an exception from the hanging
> (cancelling) Task Manager.
> > >> ==
> > >>sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> org.apache.flink.streaming.runtime.tasks.StreamTask.cleanUpInvoke(StreamTask.java:705)
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.cleanUpInvoke(SourceStreamTask.java:186)
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:637)
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:776)
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
> java.lang.Thread.run(Thread.java:748)
> > >> ===
> > >>
> > >> Here is some background information about our job and setup.
> > >> 1) The job is relatively large: we have 500+ parallelism and 2000+
> subtasks. It’s mainly reading from a Kinesis stream, performing some
> transformations and fanning out to multiple output S3 buckets. It’s a
> stateless ETL job.
> > >> 2) The same code and setup running on smaller environments don’t seem
> to have this cancel failure problem.
> > >> 3) We have been using Flink 1.11.0 for the same job and never saw
> this cancel failure and Task Manager killing problem.
> > >> 4) While upgrading to 1.13, we also added Kubernetes HA
> (ZooKeeper-less). Previously we did not use HA.
> > >>
> > >> Cancelling and restarting from the previous checkpoint is our regular
> procedure to support daily operation. This 10-minute TM restart
> cycle basically slows down our throughput. I am trying to understand what
> leads to this situation. Hoping maybe some configuration change 

Re: Kafka Consumer Retries Failing

2021-07-19 Thread Piotr Nowojski
Ok, thanks for the update. Great that you managed to resolve this issue :)

Best,
Piotrek

pon., 19 lip 2021 o 17:13 Rahul Patwari 
napisał(a):

> Hi Piotrek,
>
> I was just about to update.
> You are right. The issue is because of a stalled task manager due to High
> Heap Usage. And the High Heap Usage is because of a Memory Leak in a
> library we are using.
>
> Thanks for your help.
>
> On Mon, Jul 19, 2021 at 8:31 PM Piotr Nowojski 
> wrote:
>
>> Thanks for the update.
>>
>> > Could the backpressure timeout and heartbeat timeout be because of
>> Heap Usage close to Max configured?
>>
>> Could be. This is one of the things I had in mind under overloaded in:
>>
>> > might be related to one another via some different deeper problem
>> (broken network environment, something being overloaded)
>>
>> You can easily diagnose it. Just attach a memory profiler or check gc
>> logs, just as you would normally do when debugging a non-Flink standalone
>> Java application.
>>
>> It can also be a symptom of a failing network environment. I would first
>> check for GC pauses/stops/gaps in the logs that would indicate stalled JVM
>> caused those RPC timeouts. If that doesn't bring you closer to a solution I
>> would then check for the network environment in your cluster/cloud. Both of
>> those might be a reason behind your Kafka issues. Hard to tell. Definitely
>> you shouldn't have heartbeat timeouts in your cluster, so something IS
>> wrong with your setup.
>>
>> Best,
>> Piotrek
>>
>> czw., 15 lip 2021 o 17:17 Rahul Patwari 
>> napisał(a):
>>
>>> Thanks for the feedback Piotrek.
>>>
>>> We have observed the issue again today. As we are using Flink 1.11.1, I
>>> tried to check the backpressure of Kafka source tasks from the
>>> Jobmanager UI.
>>> The backpressure request was canceled due to Timeout and "No Data" was
>>> displayed in UI. Here are the respective logs:
>>>
>>> java.util.concurrent.TimeoutException: Invocation of public abstract
>>> java.util.concurrent.CompletableFuture
>>> org.apache.flink.runtime.taskexecutor.TaskExecutorGateway.requestTaskBackPressure(org.apache.flink.runtime.executiongraph.ExecutionAttemptID,int,org.apache.flink.api.common.time.Time)
>>> timed out.
>>> at
>>> org.apache.flink.runtime.jobmaster.RpcTaskManagerGateway.requestTaskBackPressure(RpcTaskManagerGateway.java:67)
>>> .
>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>> [Actor[akka.tcp://flink@xX.X.X.X:X/user/rpc/taskmanager_0#-1457664622]]
>>> after [15000 ms]. Message of type
>>> [org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation]. A typical
>>> reason for `AskTimeoutException` is that the recipient actor didn't send a
>>> reply.
>>> at
>>> akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635)
>>> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
>>> .
>>>
>>> During this time, the heartbeat of one of the Taskmanager to the
>>> Jobmanager timed out. Here are the respective logs:
>>>
>>> java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id
>>> bead57c15b447eac08531693ec91edc4 timed out. at
>>> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1193)
>>> ..
>>>
>>> Because of heartbeat timeout, there was an internal restart of Flink
>>> and the Kafka consumption rate recovered after the restart.
>>>
>>> Could the backpressure timeout and heartbeat timeout be because of Heap
>>> Usage close to Max configured?
>>>
>>> On Wed, Jul 14, 2021 at 6:29 PM Piotr Nowojski 
>>> wrote:
>>>
>>>> Hi Rahul,
>>>>
>>>> I would highly doubt that you are hitting the network bottleneck case.
>>>> It would require either a broken environment/network or throughputs in
>>>> orders of GB/second. More likely you are seeing an empty input pool and you
>>>> haven't checked the documentation [1]:
>>>>
>>>> > inPoolUsage - An estimate of the input buffers usage. (ignores
>>>> LocalInputChannels)
>>>>
>>>> If local channels are backpressured, inPoolUsage will be 0. You can
>>>> check downstream task's inputQueueLength or isBackPressured metrics.
>>>> Besides that, I would highly recommend upgrading to Flink 1.13

Re: Kafka Consumer Retries Failing

2021-07-19 Thread Piotr Nowojski
Thanks for the update.

> Could the backpressure timeout and heartbeat timeout be because of Heap
Usage close to Max configured?

Could be. This is one of the things I had in mind under overloaded in:

> might be related to one another via some different deeper problem (broken
network environment, something being overloaded)

You can easily diagnose it. Just attach a memory profiler or check gc logs,
just as you would normally do when debugging a non-Flink standalone Java
application.

It can also be a symptom of a failing network environment. I would first
check for GC pauses/stops/gaps in the logs that would indicate stalled JVM
caused those RPC timeouts. If that doesn't bring you closer to a solution I
would then check for the network environment in your cluster/cloud. Both of
those might be a reason behind your Kafka issues. Hard to tell. Definitely
you shouldn't have heartbeat timeouts in your cluster, so something IS
wrong with your setup.

Best,
Piotrek
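
To make the "check gc logs" suggestion actionable: GC logging can be switched on through the JVM options in flink-conf.yaml. A minimal sketch, assuming a JDK 8 JVM and an example log path (both are assumptions, adjust them for your setup):

env.java.opts.taskmanager: "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/tmp/taskmanager-gc.log"

Long "Total time for which application threads were stopped" entries in the resulting log would point at GC-induced stalls as the cause of the RPC/heartbeat timeouts, rather than the network.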

Thu, 15 Jul 2021 at 17:17 Rahul Patwari 
wrote:

> Thanks for the feedback Piotrek.
>
> We have observed the issue again today. As we are using Flink 1.11.1, I
> tried to check the backpressure of Kafka source tasks from the
> Jobmanager UI.
> The backpressure request was canceled due to Timeout and "No Data" was
> displayed in UI. Here are the respective logs:
>
> java.util.concurrent.TimeoutException: Invocation of public abstract
> java.util.concurrent.CompletableFuture
> org.apache.flink.runtime.taskexecutor.TaskExecutorGateway.requestTaskBackPressure(org.apache.flink.runtime.executiongraph.ExecutionAttemptID,int,org.apache.flink.api.common.time.Time)
> timed out.
> at
> org.apache.flink.runtime.jobmaster.RpcTaskManagerGateway.requestTaskBackPressure(RpcTaskManagerGateway.java:67)
> .
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
> [Actor[akka.tcp://flink@xX.X.X.X:X/user/rpc/taskmanager_0#-1457664622]]
> after [15000 ms]. Message of type
> [org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation]. A typical
> reason for `AskTimeoutException` is that the recipient actor didn't send a
> reply.
> at
> akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635)
> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
> .
>
> During this time, the heartbeat of one of the Taskmanager to the
> Jobmanager timed out. Here are the respective logs:
>
> java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id
> bead57c15b447eac08531693ec91edc4 timed out. at
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1193)
> ..
>
> Because of heartbeat timeout, there was an internal restart of Flink and
> the Kafka consumption rate recovered after the restart.
>
> Could the backpressure timeout and heartbeat timeout be because of Heap
> Usage close to Max configured?
>
> On Wed, Jul 14, 2021 at 6:29 PM Piotr Nowojski 
> wrote:
>
>> Hi Rahul,
>>
>> I would highly doubt that you are hitting the network bottleneck case. It
>> would require either a broken environment/network or throughputs in orders
>> of GB/second. More likely you are seeing an empty input pool and you haven't
>> checked the documentation [1]:
>>
>> > inPoolUsage - An estimate of the input buffers usage. (ignores
>> LocalInputChannels)
>>
>> If local channels are backpressured, inPoolUsage will be 0. You can check
>> downstream task's inputQueueLength or isBackPressured metrics. Besides
>> that, I would highly recommend upgrading to Flink 1.13.x if you are
>> investigating backpressure problems as described in the blog post.
>>
>> > 1. Can the backpressure Cause "DisconnectException", "Error Sending
>> Fetch Request to node ..." and other Kafka Consumer logs mentioned above?
>>
>> No, I don't think it's possible. Those two might be related to one
>> another via some different deeper problem (broken network environment,
>> something being overloaded), but I don't see a way how one could cause the
>> other.
>>
>> Piotrek
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/metrics/#default-shuffle-service
>>
>> Wed, 14 Jul 2021 at 14:18 Rahul Patwari 
>> wrote:
>>
>>> Thanks, Piotrek.
>>>
>>> We have two Kafka sources. We are facing this issue for both of them.
>>> The downstream tasks with the sources form two independent directed acyclic
>>> graphs, running within the same Streaming Job.
>>>
>>> For Example:
>>> source1 -> task1 -> sink1
>>> source2 -

Re: Process finite stream and notify upon completion

2021-07-14 Thread Piotr Nowojski
> > series analysis, when doing aggregations based on certain time periods
> > (typically called windows), or when you do event processing where the
> > time when an event occurred is important.
> > ci.apache.org
> >
> >
> > [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/api/java/org/apache/flink/streaming/api/watermark/Watermark.html#MAX_WATERMARK
> > <
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/api/java/org/apache/flink/streaming/api/watermark/Watermark.html#MAX_WATERMARK
> >
> >
> >
> > 
> > *From:* Piotr Nowojski 
> > *Sent:* Wednesday, July 14, 2021 1:36 PM
> > *To:* Tamir Sagi 
> > *Cc:* user@flink.apache.org 
> > *Subject:* Re: Process finite stream and notify upon completion
> >
> > *EXTERNAL EMAIL*
> >
> >
> >
> > Hi Tamir,
> >
> > Sorry I missed that you want to use Kafka. In that case I would suggest
> > trying out the new KafkaSource [1] interface and its built-in boundedness
> > support [2][3]. Maybe it will do the trick? If you want to be notified
> > explicitly about the completion of such a bounded Kafka stream, you
> > still can use this `Watermark#MAX_WATERMARK` trick mentioned above.
> >
> > If not, can you let us know what is not working?
> >
> > Best,
> > Piotrek
> >
> > [1]
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#kafka-source
> > <
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#kafka-source
> >
> > [2]
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#boundedness
> > <
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#boundedness
> >
> > [3]
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/api/java/org/apache/flink/connector/kafka/source/KafkaSourceBuilder.html#setBounded-org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer-
> > <
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/api/java/org/apache/flink/connector/kafka/source/KafkaSourceBuilder.html#setBounded-org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer-
> >
> >
> >
> > Wed, 14 Jul 2021 at 11:59 Tamir Sagi  > <mailto:tamir.s...@niceactimize.com>> wrote:
> >
> > Hey Piotr,
> >
> > Thank you for your response.
> >
> > I saw the exact suggestion answer by David Anderson [1] but did not
> > really understand how it may help.
> >
> > Sources when finishing are emitting
> >
>  {{org.apache.flink.streaming.api.watermark.Watermark#MAX_WATERMARK}}
> >
> > Assuming 10 messages are sent to Kafka topic , processed and saved
> > into DB
> >
> >  1. Kafka is not considered a finite source, after the 10th element
> > it will wait for more input, no?
> >  2. In such case, the 10th element will be marked with MAX_WATERMARK
> > or not? or at some point in the future?
> >
> > Now, Let's say the 10th element will be marked with MAX_WATERMARK,
> > How will I know when all elements have been saved into DB?
> >
> > Here is the execution Graph
> > Source(Kafka) --> Operator --- > Operator 2 --> Sink(PostgresSQL)
> >
> > Would you please elaborate about the time event function? where
> > exactly will it be integrated into the aforementioned execution
> graph ?
> >
> > Another question I have, based on our discussion. If the only thing
> > that changed is the source, apart from that the entire flow is the
> > same(operators and sink);  is there any good practice to achieve a
> > single job for that?
> >
> > Tamir.
> >
> > [1]
> >
> https://stackoverflow.com/questions/54687372/flink-append-an-event-to-the-end-of-finite-datastream#answer-54697302
> > <
> https://stackoverflow.com/questions/54687372/flink-append-an-event-to-the-end-of-finite-datastream#answer-54697302
> >
> >
>  
> > *From:* Piotr Nowojski  > <mailto:pnowoj...@apache.org>>
> > *Sent:* Tuesday, July 13, 2021 4:54 PM
> > *To:* Tamir Sagi  > <mailto:tamir.s...@niceactimize.com>>
> > *Cc:* user@flink.apache.org <mailto:user

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
rce Thread - Source:
> SourceEventSignature (8/12)\" Id=515 WAITING on 
> java.lang.Object@4d5cc800\n\tat
> java.lang.Object.wait(Native Method)\n\t-  waiting on
> java.lang.Object@4d5cc800\n\tat
> java.lang.Object.wait(Object.java:502)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.Handover.pollNext(Handover.java:74)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:133)\n\tat
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:755)\n\tat
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)\n\tat
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)\n\tat
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)\n\n"
> }, {
> "threadName": "Legacy Source Thread - Source: SourceEventTransition
> (4/12)",
> "stringifiedThreadInfo": "\"Legacy Source Thread - Source:
> SourceEventTransition (4/12)\" Id=514 WAITING on 
> java.lang.Object@1fc525f3\n\tat
> java.lang.Object.wait(Native Method)\n\t-  waiting on
> java.lang.Object@1fc525f3\n\tat
> java.lang.Object.wait(Object.java:502)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.Handover.pollNext(Handover.java:74)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:133)\n\tat
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:755)\n\tat
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)\n\tat
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)\n\tat
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)\n\n"
> }
>
> On Wed, Jul 14, 2021 at 2:39 PM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> Waiting for memory from LocalBufferPool is a perfectly normal symptom of
>> a backpressure [1][2].
>>
>> Best,
>> Piotrek
>>
>> [1] https://flink.apache.org/2021/07/07/backpressure.html
>> [2] https://www.ververica.com/blog/how-flink-handles-backpressure
>>
>> Wed, 14 Jul 2021 at 06:05 Rahul Patwari 
>> wrote:
>>
>>> Thanks, David, Piotr for your reply.
>>>
>>> I managed to capture the Thread dump from the Jobmanager UI for a few task
>>> managers.
>>> Here is the thread dump for Kafka Source tasks in one task manager. I
>>> could see the same stack trace in other task managers as well. It seems
>>> like Kafka Source tasks are waiting on Memory. Any Pointers?
>>>
>>>   {
>>> "threadName": "Kafka Fetcher for Source: SourceEventTransition (6/12)",
>>> "stringifiedThreadInfo": "\"Kafka Fetcher for Source:
>>> SourceEventTransition (6/12)\" Id=581 WAITING on 
>>> java.lang.Object@444c0edc\n\tat
>>> java.lang.Object.wait(Native Method)\n\t-  waiting on
>>> java.lang.Object@444c0edc\n\tat
>>> java.lang.Object.wait(Object.java:502)\n\tat
>>> org.apache.flink.streaming.connectors.kafka.internal.Handover.produce(Handover.java:117)\n\tat
>>> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:261)\n\n"
>>> }, {
>>> "threadName": "Kafka Fetcher for Source: SourceEventSignature (7/12)",
>>> "stringifiedThreadInfo": "\"Kafka Fetcher for Source:
>>> SourceEventSignature (7/12)\" Id=580 WAITING on 
>>> java.lang.Object@7d3843a9\n\tat
>>> java.lang.Object.wait(Native Method)\n\t-  waiting on
>>> java.lang.Object@7d3843a9\n\tat
>>> java.lang.Object.wait(Object.java:502)\n\tat
>>> org.apache.flink.streaming.connectors.kafka.internal.Handover.produce(Handover.java:117)\n\tat
>>> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:261)\n\n"
>>> }, {
>>> "threadName": "Legacy Source Thread - Source: SourceEventSignature
>>> (7/12)",
>>> "stringifiedThreadInfo": "\"Legacy Source Thread - Source:
>>> SourceEventSignature (7/12)\" Id=408 WAITING on
>>> java.util.concurrent.CompletableFuture$Signaller@4c613ed7\n\tat
>>> sun.misc.Unsafe.park(Native Method)\n\t-  waiting on
>>> java.util.concurrent.CompletableFuture$Signaller@4c613ed7\n\tat
>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\n\tat
>>> java.util.concurrent.CompletableFuture$Signall

Re: Process finite stream and notify upon completion

2021-07-14 Thread Piotr Nowojski
Hi Tamir,

Sorry I missed that you want to use Kafka. In that case I would suggest
trying out the new KafkaSource [1] interface and its built-in boundedness
support [2][3]. Maybe it will do the trick? If you want to be notified
explicitly about the completion of such a bounded Kafka stream, you still
can use this `Watermark#MAX_WATERMARK` trick mentioned above.

If not, can you let us know what is not working?

Best,
Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#kafka-source
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#boundedness
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/api/java/org/apache/flink/connector/kafka/source/KafkaSourceBuilder.html#setBounded-org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer-
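
To make the boundedness support concrete, here is a minimal sketch of a bounded KafkaSource. The broker address, topic and group id are placeholders, and `env` is assumed to be an existing StreamExecutionEnvironment; KafkaSource and OffsetsInitializer come from the flink-connector-kafka artifact documented in [1]-[3]:

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")                 // placeholder
        .setTopics("input-topic")                           // placeholder
        .setGroupId("bounded-ingest")                       // placeholder
        .setStartingOffsets(OffsetsInitializer.earliest())
        // Stop at the offsets that are the latest ones at job start time,
        // which effectively turns the topic into a bounded input.
        .setBounded(OffsetsInitializer.latest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

DataStream<String> stream =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Bounded Kafka Source");

Once the bounded part has been read completely the source finishes and emits MAX_WATERMARK, so the timer trick mentioned above still works if you need an explicit completion signal.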


Wed, 14 Jul 2021 at 11:59 Tamir Sagi 
wrote:

> Hey Piotr,
>
> Thank you for your response.
>
> I saw the exact suggestion answer by David Anderson [1] but did not really
> understand how it may help.
>
> Sources when finishing are emitting
> {{org.apache.flink.streaming.api.watermark.Watermark#MAX_WATERMARK}}
>
> Assuming 10 messages are sent to Kafka topic , processed and saved into DB
>
>1. Kafka is not considered a finite source, after the 10th element it
>will wait for more input, no?
>2. In such case, the 10th element will be marked with MAX_WATERMARK or
>not? or at some point in the future?
>
> Now, Let's say the 10th element will be marked with MAX_WATERMARK, How
> will I know when all elements have been saved into DB?
>
> Here is the execution Graph
> Source(Kafka) --> Operator --- > Operator 2 --> Sink(PostgresSQL)
>
> Would you please elaborate about the time event function? where exactly
> will it be integrated into the aforementioned execution graph ?
>
> Another question I have, based on our discussion. If the only thing that
> changed is the source, apart from that the entire flow is the
> same(operators and sink);  is there any good practice to achieve a single
> job for that?
>
> Tamir.
>
> [1]
> https://stackoverflow.com/questions/54687372/flink-append-an-event-to-the-end-of-finite-datastream#answer-54697302
> --
> *From:* Piotr Nowojski 
> *Sent:* Tuesday, July 13, 2021 4:54 PM
> *To:* Tamir Sagi 
> *Cc:* user@flink.apache.org 
> *Subject:* Re: Process finite stream and notify upon completion
>
>
> *EXTERNAL EMAIL*
>
>
> Hi,
>
> Sources when finishing are emitting
> {{org.apache.flink.streaming.api.watermark.Watermark#MAX_WATERMARK}}, so I
> think the best approach is to register an event time timer for
> {{Watermark#MAX_WATERMARK}} or maybe {{Watermark#MAX_WATERMARK - 1}}. If
> your function registers such a timer, it would be processed after
> processing all of the records by that function (keep in mind Flink is a
> distributed system so downstream operators/functions might still be busy
> for some time processing last records, while upstream operators/functions
> are already finished).
>
> Alternatively you can also implement a custom operator that implements
> {{BoundedOneInput}} interface [1], it would work in the same way, but
> implementing a custom operator is more difficult, only semi officially
> supported and not well documented.
>
> Best,
> Piotrek
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/operators/BoundedOneInput.html
>
> Mon, 12 Jul 2021 at 12:44 Tamir Sagi 
> wrote:
>
> Hey Community,
>
> I'm working on a stream job that should aggregate a bounded data and
> notify upon completion. (It works in Batch mode; however, I'm trying to
> achieve the same results in Stream mode, if possible).
>
> Source: Kafka
> Sink: PostgresDB
>
> *I'm looking for an elegant way to notify upon completion.*
>
> One solution I have in mind (Not perfect but might work)
>
>1. Send message to topic for every record which successfully saved
>into DB (From sink)
>2. Consume those messages externally to cluster
>3. If message is not consumed for fixed time, we assume the process
>has finished.
>
> I was also wondering if TimeEventWindow with custom trigger and
> AggregationFunction may help me here
> However, I could not find a way to detect when all records have been
> processed within the window.
>
> I'd go with Flink base solution if exists.
>
> Various References
> flink-append-an-event-to-the-end-of-finite-datastream
> <https://stackoverflow.com/questions/54687372/flink-append-an-event-to-the-end-of-finite-datastream#answer-54697302>
> how-can-i-know-that-i-have-consumed-al

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
Hi,

Waiting for memory from LocalBufferPool is a perfectly normal symptom of a
backpressure [1][2].

Best,
Piotrek

[1] https://flink.apache.org/2021/07/07/backpressure.html
[2] https://www.ververica.com/blog/how-flink-handles-backpressure

Wed, 14 Jul 2021 at 06:05 Rahul Patwari 
wrote:

> Thanks, David, Piotr for your reply.
>
> I managed to capture the Thread dump from the Jobmanager UI for a few task
> managers.
> Here is the thread dump for Kafka Source tasks in one task manager. I
> could see the same stack trace in other task managers as well. It seems
> like Kafka Source tasks are waiting on Memory. Any Pointers?
>
>   {
> "threadName": "Kafka Fetcher for Source: SourceEventTransition (6/12)",
> "stringifiedThreadInfo": "\"Kafka Fetcher for Source:
> SourceEventTransition (6/12)\" Id=581 WAITING on 
> java.lang.Object@444c0edc\n\tat
> java.lang.Object.wait(Native Method)\n\t-  waiting on
> java.lang.Object@444c0edc\n\tat
> java.lang.Object.wait(Object.java:502)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.Handover.produce(Handover.java:117)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:261)\n\n"
> }, {
> "threadName": "Kafka Fetcher for Source: SourceEventSignature (7/12)",
> "stringifiedThreadInfo": "\"Kafka Fetcher for Source: SourceEventSignature
> (7/12)\" Id=580 WAITING on java.lang.Object@7d3843a9\n\tat
> java.lang.Object.wait(Native Method)\n\t-  waiting on
> java.lang.Object@7d3843a9\n\tat
> java.lang.Object.wait(Object.java:502)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.Handover.produce(Handover.java:117)\n\tat
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:261)\n\n"
> }, {
> "threadName": "Legacy Source Thread - Source: SourceEventSignature (7/12)",
> "stringifiedThreadInfo": "\"Legacy Source Thread - Source:
> SourceEventSignature (7/12)\" Id=408 WAITING on
> java.util.concurrent.CompletableFuture$Signaller@4c613ed7\n\tat
> sun.misc.Unsafe.park(Native Method)\n\t-  waiting on
> java.util.concurrent.CompletableFuture$Signaller@4c613ed7\n\tat
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\n\tat
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)\n\tat
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)\n\tat
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)\n\tat
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)\n\tat
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:293)\n\tat
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:266)\n\t...\n\n"
> }, {
> "threadName": "Legacy Source Thread - Source: SourceEventTransition
> (6/12)",
> "stringifiedThreadInfo": "\"Legacy Source Thread - Source:
> SourceEventTransition (6/12)\" Id=409 WAITING on
> java.util.concurrent.CompletableFuture$Signaller@5765d0d4\n\tat
> sun.misc.Unsafe.park(Native Method)\n\t-  waiting on
> java.util.concurrent.CompletableFuture$Signaller@5765d0d4\n\tat
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)\n\tat
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)\n\tat
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)\n\tat
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)\n\tat
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)\n\tat
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:293)\n\tat
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:266)\n\t...\n\n"
> }
>
> On Tue, Jul 13, 2021 at 7:07 PM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> I'm not sure, maybe someone will be able to help you, but it sounds like
>> it would be better for you to:
>> - google search something like "Kafka Error sending fetch request
>> TimeoutException" (I see there are quite a lot of results, some of them
>> might be related)
>> - ask this question on the Kafka mailing list
>> - ask this question on stackoverflow as a Kafka question
>>
>> In short, FlinkKafkaConsumer is a very thin wrapper around the
>> KafkaConsumer class, so the thing you are observing has most likely very
>> little to do with the Flink itself. In other words

Re: Process finite stream and notify upon completion

2021-07-13 Thread Piotr Nowojski
Hi,

Sources when finishing are emitting
{{org.apache.flink.streaming.api.watermark.Watermark#MAX_WATERMARK}}, so I
think the best approach is to register an event time timer for
{{Watermark#MAX_WATERMARK}} or maybe {{Watermark#MAX_WATERMARK - 1}}. If
your function registers such a timer, it would be processed after
processing all of the records by that function (keep in mind Flink is a
distributed system so downstream operators/functions might still be busy
for some time processing last records, while upstream operators/functions
are already finished).

Alternatively you can also implement a custom operator that implements
{{BoundedOneInput}} interface [1], it would work in the same way, but
implementing a custom operator is more difficult, only semi officially
supported and not well documented.

Best,
Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/operators/BoundedOneInput.html
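
To illustrate the timer approach, a minimal sketch of such a function. Event, the String key type and the println are placeholders for your own types and completion logic; only the timer registration and the onTimer callback matter here:

import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.util.Collector;

public class EndOfInputDetector extends KeyedProcessFunction<String, Event, Event> {

    @Override
    public void processElement(Event value, Context ctx, Collector<Event> out) throws Exception {
        // Registering the same timestamp again is a no-op, so doing it per record is cheap.
        ctx.timerService().registerEventTimeTimer(Watermark.MAX_WATERMARK.getTimestamp());
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
        // Fires only after the source has emitted MAX_WATERMARK, i.e. this function
        // has seen all records of the bounded input for the current key.
        System.out.println("Finished processing key " + ctx.getCurrentKey());
    }
}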

Mon, 12 Jul 2021 at 12:44 Tamir Sagi 
wrote:

> Hey Community,
>
> I'm working on a stream job that should aggregate a bounded data and
> notify upon completion. (It works in Batch mode; however, I'm trying to
> achieve the same results in Stream mode, if possible).
>
> Source: Kafka
> Sink: PostgresDB
>
> *I'm looking for an elegant way to notify upon completion.*
>
> One solution I have in mind (Not perfect but might work)
>
>1. Send message to topic for every record which successfully saved
>into DB (From sink)
>2. Consume those messages externally to cluster
>3. If message is not consumed for fixed time, we assume the process
>has finished.
>
> I was also wondering if TimeEventWindow with custom trigger and
> AggregationFunction may help me here
> However, I could not find a way to detect when all records have been
> processed within the window.
>
> I'd go with Flink base solution if exists.
>
> Various References
> flink-append-an-event-to-the-end-of-finite-datastream
> 
> how-can-i-know-that-i-have-consumed-all-of-a-kafka-topic
> 
>
> Best,
>
> Tamir.
>
>
> Confidentiality: This communication and any attachments are intended for
> the above-named persons only and may be confidential and/or legally
> privileged. Any opinions expressed in this communication are not
> necessarily those of NICE Actimize. If this communication has come to you
> in error you must take no action based on it, nor must you copy or show it
> to anyone; please delete/destroy and inform the sender by e-mail
> immediately.
> Monitoring: NICE Actimize may monitor incoming and outgoing e-mails.
> Viruses: Although we have taken steps toward ensuring that this e-mail and
> attachments are free from any virus, we advise that in keeping with good
> computing practice the recipient should ensure they are actually virus free.
>


Re: Kafka Consumer Retries Failing

2021-07-13 Thread Piotr Nowojski
Hi,

I'm not sure, maybe someone will be able to help you, but it sounds like it
would be better for you to:
- google search something like "Kafka Error sending fetch request
TimeoutException" (I see there are quite a lot of results, some of them
might be related)
- ask this question on the Kafka mailing list
- ask this question on stackoverflow as a Kafka question

In short, FlinkKafkaConsumer is a very thin wrapper around the
KafkaConsumer class, so the thing you are observing has most likely very
little to do with the Flink itself. In other words, if you are observing
such a problem you most likely would be possible to reproduce it without
Flink.

Best,
Piotrek

Fri, 9 Jul 2021 at 12:30 Rahul Patwari 
wrote:

> Hi,
>
> We have a Flink 1.11.1 Version streaming pipeline in production which
> reads from Kafka.
> Kafka Server version is 2.5.0 - confluent 5.5.0
> Kafka Client Version is 2.4.1 - 
> {"component":"org.apache.kafka.common.utils.AppInfoParser$AppInfo","message":"Kafka
> version: 2.4.1","method":""}
>
> Occasionally(every 6 to 12 hours), we have observed that the Kafka
> consumption rate went down(NOT 0) and the following logs were observed:
> Generally, the consumption rate across all consumers is 4k records/sec.
> When this issue occurred, the consumption rate dropped to < 50 records/sec
>
> org.apache.kafka.common.errors.DisconnectException: null
>
> {"time":"2021-07-07T22:13:37,385","severity":"INFO","component":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer
> clientId=consumer-MFTDataProcessorEventSignatureConsumerGroupV1R1-3,
> groupId=MFTDataProcessorEventSignatureConsumerGroupV1R1] Error sending
> fetch request (sessionId=405798138, epoch=5808) to node 8:
> {}.","method":"handleError"}
>
> org.apache.kafka.common.errors.TimeoutException: Failed
>
> {"time":"2021-07-07T22:26:41,379","severity":"INFO","component":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator","message":"[Consumer
> clientId=consumer-MFTDataProcessorEventSignatureConsumerGroupV1R1-3,
> groupId=MFTDataProcessorEventSignatureConsumerGroupV1R1] Group coordinator
> 100.98.40.16:9092 (id: 2147483623 rack: null) is unavailable or invalid,
> will attempt rediscovery","method":"markCoordinatorUnknown"}
>
> {"time":"2021-07-07T22:27:10,465","severity":"INFO","component":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler","message":"[Consumer
> clientId=consumer-MFTDataProcessorEventSignatureConsumerGroupV1R1-3,
> groupId=MFTDataProcessorEventSignatureConsumerGroupV1R1] Discovered group
> coordinator 100.98.40.16:9092 (id: 2147483623 rack:
> null)","method":"onSuccess"}
>
> The consumers retried for more than an hour but the above logs are
> observed again.
> The consumers started pulling data after a manual restart.
>
> No WARN or ERROR logs were observed in Kafka or Zookeeper during this
> period.
>
> Our observation from this incident is that Kafka Consumer retries could
> not resolve the issue but a manual restart (or) Flink internal
> restart(Failure rate restart policy) does.
>
> Has anyone faced this issue before? Any pointers are appreciated.
>
> Regards,
> Rahul
>


Re: How to register custormize serializer for flink kafka format type

2021-07-13 Thread Piotr Nowojski
Hi,

It's mentioned in the docs [1], but unfortunately this is not very well
documented in 1.10. In short you have to provide a custom implementation of
a `DeserializationSchemaFactory`. Please look at the built-in factories for
examples of how it can be done.

In newer versions it's both easier and better documented. For example in
1.13 please take a look at `DeserializationFormatFactory` and [2]

Best,
Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sourceSinks.html
[2]
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sourcessinks/#factories
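
Whichever factory route you take, the piece you ultimately have to write is a DeserializationSchema that understands the wrapped message. A rough, simplified sketch of that parsing core only: it produces the POJO directly instead of Row, skips the factory/'format.type' registration that the linked docs describe, uses Jackson, and takes its field names from the sample message quoted below; the actual column-to-field mapping is left as a comment because it is business logic only you know:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class HrSalaryPersonDeserializationSchema
        implements DeserializationSchema<HrSalaryPersonVO> {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public HrSalaryPersonVO deserialize(byte[] message) throws IOException {
        JsonNode root = MAPPER.readTree(message);
        // Each entry of "data" carries one column as a columnName/rawData pair.
        Map<String, String> columns = new HashMap<>();
        for (JsonNode column : root.get("data")) {
            columns.put(column.get("columnName").asText(), column.get("rawData").asText());
        }
        HrSalaryPersonVO vo = new HrSalaryPersonVO();
        // Map the extracted columns onto the POJO fields here, e.g. (assumed mapping):
        // vo.setAdjustAmount(Double.valueOf(columns.get("UPDATE_SALARY")));
        return vo;
    }

    @Override
    public boolean isEndOfStream(HrSalaryPersonVO nextElement) {
        return false;
    }

    @Override
    public TypeInformation<HrSalaryPersonVO> getProducedType() {
        return TypeInformation.of(HrSalaryPersonVO.class);
    }
}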

Thu, 8 Jul 2021 at 14:21 Chenzhiyuan(HR) 
wrote:

> I create the table as below, and the data is from Kafka.
>
> I want to deserialize the JSON message to a Pojo object.
>
> But the message format is not Avro or plain JSON.
>
> *So I need to know how to register a customized serializer and use it for
> the 'format.type' property.*
>
> By the way, my flink version is 1.10.0.
>
> CREATE TABLE MyUserTable(
>
> uuid VARCHAR,
>
> orgId VARCHAR
>
> ) with (
>
> 'connector.type' = 'kafka',
>
> 'connector.version' = '0.11',
>
> 'connector.topic' = 'topic_name',
>
> 'connector.properties.zookeeper.connect' = 'localhost:2181',
>
> 'connector.properties.bootstrap.servers' = 'localhost:9092',
>
> 'connector.properties.group.id' = 'testGroup',
>
> 'format.type' = 'cutormizeSerializer'
>
> )
>
> A sample Kafka message body; each columnName is the key for the Pojo object,
> and rawData is the value:
>
> {
>
>"beforeData": [],
>
> "byteSize": 272,
>
> "columnNumber": 32,
>
> "data": [{
>
> "byteSize": 8,
>
> "columnName": "APPLY_PERSON_ID",
>
> "rawData": 10017,
>
> "type": "LONG"
>
> }, {
>
> "byteSize": 12,
>
> "columnName": "UPDATE_SALARY",
>
> "rawData": "11000.00",
>
> "type": "DOUBLE"
>
> }, {
>
> "byteSize": 11,
>
> "columnName": "UP_AMOUNT",
>
> "rawData": "1000.00",
>
> "type": "DOUBLE"
>
> }, {
>
> "byteSize": 3,
>
> "columnName": "CURRENCY",
>
> "rawData": "CNY",
>
> "type": "STRING"
>
> }, {
>
> "byteSize": 32,
>
> "columnName": "EXCHANGE_RATE",
>
> "rawData": "1.00",
>
> "type": "DOUBLE"
>
> },  {
>
> "byteSize": 11,
>
> "columnName": "DEDUCTED_ACCOUNT",
>
> "rawData": "1000.00",
>
> "type": "DOUBLE"
>
> }, {
>
> "byteSize": 1,
>
> "columnName": "ENTER_AT_PROCESS",
>
> "rawData": "Y",
>
> "type": "STRING"
>
> }],
>
> "dataCount": 0,
>
> "dataMetaData": {
>
> "connector": "mysql",
>
> "pos": 1000368076,
>
> "row": 0,
>
> "ts_ms": 1625565737000,
>
> "snapshot": "false",
>
> "db": "testdb",
>
> "table": "flow_person_t"
>
> },
>
> "key": "APPLY_PERSON_ID",
>
> "memorySize": 1120,
>
> "operation": "insert",
>
> "rowIndex": -1,
>
> "timestamp": "1970-01-01 00:00:00"
>
> }
>
> The Pojo object as below:
>
> import lombok.Data;
>
>
>
> @Data
>
> public class HrSalaryPersonVO {
>
> private String uuid;
>
> private String orgId;
>
> private String unitId;
>
> private String effectiveDate;
>
>
>
> private int adjustPersonCount;
>
>
>
> private Double adjustAmount;
>
>
>
> private Double beforeSalaryAmount;
>
> private Double adjustRate;
>
>
>
> private String data0prateType;
>
>
>
> private String status;
>
> }
>
>
>


Re: How to read large amount of data from hive and write to redis, in a batch manner?

2021-07-08 Thread Piotr Nowojski
Great, thanks for coming back and I'm glad that it works for you!

Piotrek

Thu, 8 Jul 2021 at 13:34 Yik San Chan 
wrote:

> Hi Piotr,
>
> Thanks! I end up doing option 1, and that works great.
>
> Best,
> Yik San
>
> On Tue, May 25, 2021 at 11:43 PM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> You could always buffer records in your sink function/operator, until a
>> large enough batch is accumulated and upload the whole batch at once. Note
>> that if you want to have at-least-once or exactly-once semantics, you would
>> need to take care of those buffered records in one way or another. For
>> example you could:
>> 1. Buffer records on some in memory data structure (not Flink's state),
>> and just make sure that those records are flushed to the underlying sink on
>> `CheckpointedFunction#snapshotState()` calls
>> 2. Buffer records on Flink's state (heap state backend or rocksdb - heap
>> state backend would be the fastest with little overhead, but you can risk
>> running out of memory), and that would easily give you exactly-once. That
>> way your batch could span multiple checkpoints.
>> 3. Buffer/write records to temporary files, but in that case keep in mind
>> that those files need to be persisted and recovered in case of failure and
>> restart.
>> 4. Ignore checkpointing and either always restart the job from scratch or
>> accept some occasional data loss.
>>
>> FYI, virtually every connector/sink is internally batching writes to some
>> extent. Usually by doing option 1.
>>
>> Piotrek
>>
>> Tue, 25 May 2021 at 14:50 Yik San Chan 
>> wrote:
>>
>>> Hi community,
>>>
>>> I have a Hive table that stores tens of millions of rows of data. In my
>>> Flink job, I want to process the data in a batch manner:
>>>
>>> - Split the data into batches, each batch has (maybe) 10,000 rows.
>>> - For each batch, call a batchPut() API on my redis client to dump in
>>> Redis.
>>>
>>> Doing so in a streaming manner is not expected, as that will cause too
>>> many round trips between Flink workers and Redis.
>>>
>>> Is there a way to do that? I find little clue in Flink docs, since
>>> almost all APIs feel better suited for streaming processing by default.
>>>
>>> Thank you!
>>>
>>> Best,
>>> Yik San
>>>
>>
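
To make option 1 from the quoted reply concrete, a minimal sketch of a buffering sink that flushes both on batch size and on checkpoint. RedisBatchClient and its connect/batchPut calls are placeholders for whatever Redis client is actually used:

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class BatchingRedisSink<T> extends RichSinkFunction<T> implements CheckpointedFunction {

    private static final int BATCH_SIZE = 10_000;

    private final List<T> buffer = new ArrayList<>();
    private transient RedisBatchClient client; // placeholder client wrapper

    @Override
    public void open(Configuration parameters) {
        client = RedisBatchClient.connect(); // placeholder
    }

    @Override
    public void invoke(T value, Context context) {
        buffer.add(value);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        // Flush whatever is still buffered before the checkpoint completes, so no
        // in-memory records are lost if the job later restores from this checkpoint.
        flush();
    }

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // Nothing to restore: the in-memory buffer is always empty at checkpoint time.
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            client.batchPut(buffer); // placeholder for the real batch API
            buffer.clear();
        }
    }
}

Options 2 and 3 from the reply trade more code for batches that can span checkpoints; this variant keeps things simple at the cost of flushing at least once per checkpoint interval.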


Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-07-05 Thread Piotr Nowojski
Hey Alex,

Sorry, I've missed your previous email. I've spent a bit more time
searching our Jira for relevant bugs and maybe you were hit by this one:
https://issues.apache.org/jira/browse/FLINK-21351
?
> T2: Job1 was savepointed, brought down and replaced with Job2.

This in combination with FLINK-21351 could cause Flink to incorrectly
remove still referenced incremental checkpoints. That's my best explanation
as to what has caused this. Could you try upgrading to the latest 1.12.x or
1.13.x release?

Best,
Piotrek

Sat, 3 Jul 2021 at 23:39 Alexander Filipchik 
wrote:

> Bumping it up, any known way to catch it if it happens again ? Any logs we
> should enable?
>
> Sent via Superhuman iOS <https://sprh.mn/?vip=afilipc...@gmail.com>
>
>
> On Thu, Jun 17 2021 at 7:52 AM, Alexander Filipchik 
> wrote:
>
>> Did some more digging.
>> 1) is not an option as we are not doing any cleanups at the moment. We
>> keep the last 4 checkpoints per job + all the savepoints.
>> 2) I looked at job deployments that happened 1 week before the incident.
>> We have 23 deployments in total and each resulted in a unique job id. I
>> also looked at job specific metrics and I don't see any evidence of
>> overlapping checkpointing. There is exactly 1 checkpoint per application,
>> every time it has a different job id, and every time once a new job
>> checkpoints there are no checkpoints from the previous job id.
>>
>> A bit of a mystery. Is there a way to at least catch it in the future?
>> Any additional telemetry (logs, metrics) we can extract to better
>> understand what is happening.
>>
>> Alex
>>
>> On Tue, Jun 8, 2021 at 12:01 AM Piotr Nowojski 
>> wrote:
>>
>>> Re-adding user mailing list
>>>
>>> Hey Alex,
>>>
>>> In that case I can see two scenarios that could lead to missing files.
>>> Keep in mind that incremental checkpoints are referencing previous
>>> checkpoints in order to minimise the size of the checkpoint (roughly
>>> speaking only changes since the previous checkpoint are being
>>> persisted/uploaded/written). Checkpoint number 42, can reference an
>>> arbitrary number of previous checkpoints. I suspect that somehow, some of
>>> those previously referenced checkpoints got deleted and removed. Also keep
>>> in mind that savepoints (as of now) are never incremental, they are always
>>> full checkpoints. However externalised checkpoints can be incremental. Back
>>> to the scenarios:
>>> 1. You might have accidentally removed some older checkpoints from your
>>> Job2, maybe thinking they are no longer needed. Maybe you have just kept
>>> this single externalised checkpoint directory from steps T3 or T4,
>>> disregarding that this externalised checkpoint might be referencing
>>> previous checkpoints of Job2?
>>> 2. As I mentioned, Flink is automatically maintaining reference counts
>>> of the used files and deletes them when they are no longer used/referenced.
>>> However this works only within a single job/cluster. For example if between
>>> steps T3 and T4, you restarted Job2 and let it run for a bit, it could take
>>> more checkpoints that would subsume files that were still part of the
>>> externalised checkpoint that you previously used to start Job3/Job4. Job2
>>> would have no idea that Job3/Job4 exist, let alone that they are
>>> referencing some files from Job2, and those files could have been deleted
>>> as soon as Job2 was no longer using/referencing them.
>>>
>>> Could one of those happen in your case?
>>>
>>> Best, Piotrek
>>>
>>> Mon, 7 Jun 2021 at 20:01 Alexander Filipchik 
>>> wrote:
>>>
>>>> Yes, we do use incremental checkpoints.
>>>>
>>>> Alex
>>>>
>>>> On Mon, Jun 7, 2021 at 3:12 AM Piotr Nowojski 
>>>> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> A quick question. Are you using incremental checkpoints?
>>>>>
>>>>> Best, Piotrek
>>>>>
>>>>> Sat, 5 Jun 2021 at 21:23  wrote:
>>>>>
>>>>>> Small correction, in T4 and T5 I mean Job2, not Job 1 (as job 1 was
>>>>>> save pointed).
>>>>>>
>>>>>> Thank you,
>>>>>> Alex
>>>>>>
>>>>>> On Jun 4, 2021, at 3:07 PM, Alexander Filipchik 
>>>>>> wrote:
>>>>>>
>>>>>> 
>>>>>> Looked through the

Re: Yarn Application Crashed?

2021-06-30 Thread Piotr Nowojski
You are welcome :)

Piotrek

Wed, 30 Jun 2021 at 08:34 Thomas Wang  wrote:

> Thanks Piotr. This is helpful.
>
> Thomas
>
> On Mon, Jun 28, 2021 at 8:29 AM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> You should still be able to get the Flink logs via:
>>
>> > yarn logs -applicationId application_1623861596410_0010
>>
>> And it should give you more answers about what has happened.
>>
>> About the Flink and YARN behaviour, have you seen the documentation? [1]
>> Especially this part:
>>
>> > Failed containers (including the JobManager) are replaced by YARN. The
>> maximum number of JobManager container restarts is configured via
>> yarn.application-attempts (default 1). The YARN Application will fail once
>> all attempts are exhausted.
>>
>> ?
>>
>> Best,
>> Piotrek
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/yarn/#flink-on-yarn-reference
>>
>> Mon, 28 Jun 2021 at 02:26 Thomas Wang  wrote:
>>
>>> Just found some additional info. It looks like one of the EC2 instances
>>> got terminated at the time the crash happened and this job had 7 Task
>>> Managers running on that EC2 instance. Now I suspect it's possible
>>> that when Yarn tried to migrate the Task Managers, there were no idle
>>> containers as this job was using like 99% of the entire cluster. However in
>>> that case shouldn't Yarn wait for containers to become available? I'm not
>>> quite sure how Flink would behave in this case. Could someone provide some
>>> insights here? Thanks.
>>>
>>> Thomas
>>>
>>> On Sun, Jun 27, 2021 at 4:24 PM Thomas Wang  wrote:
>>>
>>>> Hi,
>>>>
>>>> I recently experienced a job crash due to the underlying Yarn
>>>> application failing for some reason. Here is the only error message I saw.
>>>> It seems I can no longer see any of the Flink job logs.
>>>>
>>>> Application application_1623861596410_0010 failed 1 times (global limit
>>>> =2; local limit is =1) due to ApplicationMaster for attempt
>>>> appattempt_1623861596410_0010_01 timed out. Failing the application.
>>>>
>>>> I was running the Flink job using the Yarn session mode with the
>>>> following command.
>>>>
>>>> export HADOOP_CLASSPATH=`hadoop classpath` &&
>>>> /usr/lib/flink/bin/yarn-session.sh -jm 7g -tm 7g -s 4 --detached
>>>>
>>>> I didn't have HA setup, but I believe the underlying Yarn application
>>>> caused the crash because if, for some reason, the Flink job failed, the
>>>> Yarn application should still survive. Please correct me if this is not the
>>>> right assumption.
>>>>
>>>> My question is how I should find the root cause in this case and what's
>>>> the recommended way to avoid this going forward?
>>>>
>>>> Thanks.
>>>>
>>>> Thomas
>>>>
>>>


Re: Yarn Application Crashed?

2021-06-28 Thread Piotr Nowojski
Hi,

You should still be able to get the Flink logs via:

> yarn logs -applicationId application_1623861596410_0010

And it should give you more answers about what has happened.

About the Flink and YARN behaviour, have you seen the documentation? [1]
Especially this part:

> Failed containers (including the JobManager) are replaced by YARN. The
maximum number of JobManager container restarts is configured via
yarn.application-attempts (default 1). The YARN Application will fail once
all attempts are exhausted.

?

Best,
Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/yarn/#flink-on-yarn-reference
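
For reference, raising the attempt limit is a one-line change in flink-conf.yaml. The value below is only an example, YARN additionally caps it with its own yarn.resourcemanager.am.max-attempts setting, and recovering the running jobs across such a restart also requires high availability to be configured:

yarn.application-attempts: 4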

Mon, 28 Jun 2021 at 02:26 Thomas Wang  wrote:

> Just found some additional info. It looks like one of the EC2 instances
> got terminated at the time the crash happened and this job had 7 Task
> Managers running on that EC2 instance. Now I suspect it's possible
> that when Yarn tried to migrate the Task Managers, there were no idle
> containers as this job was using like 99% of the entire cluster. However in
> that case shouldn't Yarn wait for containers to become available? I'm not
> quite sure how Flink would behave in this case. Could someone provide some
> insights here? Thanks.
>
> Thomas
>
> On Sun, Jun 27, 2021 at 4:24 PM Thomas Wang  wrote:
>
>> Hi,
>>
>> I recently experienced a job crash due to the underlying Yarn application
>> failing for some reason. Here is the only error message I saw. It seems I
>> can no longer see any of the Flink job logs.
>>
>> Application application_1623861596410_0010 failed 1 times (global limit
>> =2; local limit is =1) due to ApplicationMaster for attempt
>> appattempt_1623861596410_0010_01 timed out. Failing the application.
>>
>> I was running the Flink job using the Yarn session mode with the
>> following command.
>>
>> export HADOOP_CLASSPATH=`hadoop classpath` &&
>> /usr/lib/flink/bin/yarn-session.sh -jm 7g -tm 7g -s 4 --detached
>>
>> I didn't have HA setup, but I believe the underlying Yarn application
>> caused the crash because if, for some reason, the Flink job failed, the
>> Yarn application should still survive. Please correct me if this is not the
>> right assumption.
>>
>> My question is how I should find the root cause in this case and what's
>> the recommended way to avoid this going forward?
>>
>> Thanks.
>>
>> Thomas
>>
>


Re: Looking for example code

2021-06-28 Thread Piotr Nowojski
Have you seen the documents that I linked? Isn't it enough?

The first Pulsar link that I posted [4] has some example code. Literally the
first link inside the second Pulsar blog I referenced [5] leads to the
Pulsar connector repository, which also has some examples [6].

Piotrek

[6] https://github.com/streamnative/pulsar-flink/

Mon, 28 Jun 2021 at 17:08 Thomas Raef 
wrote:

> I need it to connect to Pulsar and stream from Pulsar. I could not find
> any code on how to connect to Pulsar. I've done the WordCount, but I need
> sample code for how to connect to Pulsar.
>
> Thomas J. Raef
> Founder, WeWatchYourWebsite.com
> http://wewatchyourwebsite.com
> tr...@wewatchyourwebsite.com
> LinkedIn <https://www.linkedin.com/in/thomas-raef-74b93a14/>
> Facebook <https://www.facebook.com/WeWatchYourWebsite>
>
>
>
> On Mon, Jun 28, 2021 at 8:54 AM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> We are glad that you want to try out Flink, but if you would like to get
>> help you need to be a bit more specific. What are you exactly doing, and
>> what, on which step exactly and how is not working (including logs and/or
>> error messages) is necessary for someone to help you.
>>
>> In terms of how to start, I would suggest starting with running the
>> WordCount example on your cluster [1]. This one assumes starting a small
>> cluster on your local machine. If you are interested in different methods
>> of running Flink cluster check docs -> deployment -> resource providers [2].
>>
>> For developing a simple application and running it from your IDE please
>> take a look at this Fraud Detection with the DataStream API example [3].
>>
>> Regarding the Pulsar connector, I don't know much about it, but there are
>> a couple of resources that I have found via simple search, that might
>> be helpful [4],[5]
>>
>> Piotrek
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/try-flink/local_installation/
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/
>> [3]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/try-flink/datastream/
>> [4]
>> https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.html
>> [5] https://flink.apache.org/2021/01/07/pulsar-flink-connector-270.html
>>
>> Fri, 25 Jun 2021 at 12:05 traef  wrote:
>>
>>> I'm just starting with Flink. I've been trying all the examples online
>>> and none of them work.
>>>
>>> I am not a Java programmer but have been programming since 1982.
>>>
>>> I would like example code to read from a Pulsar topic and output to
>>> another Pulsar topic.
>>>
>>> Pulsar version 2.8.0
>>> Flink version 1.13.1
>>> Scala version 2.11
>>>
>>> Thank you in advance.
>>>
>>


Re: Cancel job error ! Interrupted while waiting for buffer

2021-06-28 Thread Piotr Nowojski
Hi,

It's hard to say from the log fragment, but I presume this task has
correctly switched to "CANCELLED" state and this error should not have been
logged as an ERROR, right? How did you get this stack trace? Maybe it was
logged as a DEBUG message? If not, that would be probably a minor bug in
Flink and can you post a larger fragment of the log including the stack
trace and the log line that has printed it?

In short, this kind of exception is a normal thing to happen and expected
when cancelling a job. If your code is busy and blocked while being
backpressured (as your FlatMap operation was in this particular case),
interrupting the code is a standard thing that Flink is doing. However it
shouldn't bubble up to the end user exactly for this reason - to not
confuse users.

> some TMs cancel successfully, some TMs get stuck in canceling and the TM will
kill itself with task.cancellation.timeout = 18

This part is a bit confusing to me. The above interruption should actually
prevent this timeout from kicking in and TM shouldn't be killed. Again can
you post larger part of the TM/JM logs or even better, full TM/JM logs?

best,
Piotrek
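
For reference, the watchdog that kills the TM in this situation is controlled by task.cancellation.timeout in flink-conf.yaml. 180000 ms is the default; a value of 0 disables the watchdog, although that only hides whatever is blocking cancellation:

task.cancellation.timeout: 180000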



Sat, 26 Jun 2021 at 04:59 SmileSmile  wrote:

> Hi
>
>I use Flink 1.12.4 on yarn,  job topology is.  kafka -> source ->
> flatmap -> window 1 min agg -> sink -> kafka. Checkpointing is enabled, and the
> checkpoint interval is 20s. When I cancel my job, some TMs cancel
> successfully, some TMs get stuck in canceling and the TM will kill itself with
> task.cancellation.timeout = 18. The TM log shows that
>
> org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException:
> Could not forward element to next operator
> at
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:114)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:93)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> com.operation.ParseLineOperationForAgg.flatMap(ParseLineOperationForAgg.java:74)
> [testFlink-1.0.jar:?]
> at
> com.operation.ParseLineOperationForAgg.flatMap(ParseLineOperationForAgg.java:29)
> [testFlink-1.0.jar:?]
> at
> org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:47)
> [flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:112)
> [flink-dist_2.11-1.12.4.jar:1.12.4]
>
> Caused by: java.io.IOException: Interrupted while waiting for buffer
> at
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.requestNewBufferBuilderFromPool(BufferWritingResultPartition.java:341)
> ~[testFlink-1.0.jar:?]
> at
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.requestNewUnicastBufferBuilder(BufferWritingResultPartition.java:313)
> ~[testFlink-1.0.jar:?]
> at
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.appendUnicastDataForRecordContinuation(BufferWritingResultPartition.java:257)
> ~[testFlink-1.0.jar:?]
> at
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.emitRecord(BufferWritingResultPartition.java:149)
> ~[testFlink-1.0.jar:?]
> at
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:104)
> ~[testFlink-1.0.jar:?]
> at
> org.apache.flink.runtime.io.network.api.writer.ChannelSelectorRecordWriter.emit(ChannelSelectorRecordWriter.java:54)
> ~[testFlink-1.0.jar:?]
> at
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:101)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:87)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:43)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50)
> ~[flink-dist_2.11-1.12.4.jar:1.12.4]
> at
> com.operation.ExtractLineOperationAgg.flatMap(ExtractLineOperationAgg.java:72)
> 

Re: Looking for example code

2021-06-28 Thread Piotr Nowojski
Hi,

We are glad that you want to try out Flink, but if you would like to get
help you need to be a bit more specific. Telling us what exactly you are
doing, what is not working, at which step and how it fails (including logs
and/or error messages) is necessary for someone to help you.

In terms of how to start, I would suggest starting with running the
WordCount example on your cluster [1]. This one assumes starting a small
cluster on your local machine. If you are interested in different methods
of running Flink cluster check docs -> deployment -> resource providers [2].

For developing a simple application and running it from your IDE please
take a look at this Fraud Detection with the DataStream API example [3].

Regarding the Pulsar connector, I don't know much about it, but there are a
couple of resources that I have found via simple search, that might
be helpful [4],[5]

Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/try-flink/local_installation/
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/try-flink/datastream/
[4]
https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.html
[5] https://flink.apache.org/2021/01/07/pulsar-flink-connector-270.html

Fri, 25 Jun 2021 at 12:05 traef  wrote:

> I'm just starting with Flink. I've been trying all the examples online and
> none of them work.
>
> I am not a Java programmer but have been programming since 1982.
>
> I would like example code to read from a Pulsar topic and output to
> another Pulsar topic.
>
> Pulsar version 2.8.0
> Flink version 1.13.1
> Scala version 2.11
>
> Thank you in advance.
>


Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-25 Thread Piotr Nowojski
rkList recovered: 1
>> watermarkList recovered: 1
>> watermarkList recovered: 2
>> watermarkList recovered: 2
>> watermarkList recovered: 2
>> maxWatermark: 2
>> maxWatermark: 2
>> processing watermark: 2
>> processing watermark: 2
>> maxWatermark: 2
>> processing watermark: 2
>> processing watermark: 0 // IT IS ALSO PROCESSING THE OTHER WATERMARKS.
>> WHY?
>> processing watermark: 0
>> processing watermark: 0
>> processing watermark: 0
>> Attempts restart: 1
>> processing watermark: 1
>> processing watermark: 1
>> processing watermark: 1
>> processing watermark: 1
>> Attempts restart: 1
>> processing watermark: 2
>> processing watermark: 2
>> processing watermark: 2
>> processing watermark: 2
>> Attempts restart: 1
>> processing watermark: 3
>> processing watermark: 3
>> processing watermark: 3
>> processing watermark: 3
>> Attempts restart: 1
>> processing watermark: 9223372036854775807
>> processing watermark: 9223372036854775807
>> processing watermark: 9223372036854775807
>> processing watermark: 9223372036854775807
>> This is a poison but we do NOT throw an exception because the reference
>> time passed :) [2021-06-21 16:57:22.849] >= [2021-06-21 16:57:21.672]
>> Attempts restart: 1
>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.836 sec
>>
>>
>> On Fri, Jun 18, 2021 at 2:46 PM Piotr Nowojski 
>> wrote:
>>
>>> I'm glad I could help, I hope it will solve your problem :)
>>>
>>> Best,
>>> Piotrek
>>>
>>> Fri, 18 Jun 2021 at 14:38 Felipe Gutierrez 
>>> wrote:
>>>
>>>> On Fri, Jun 18, 2021 at 1:41 PM Piotr Nowojski 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Keep in mind that this is a quite low level approach to this problem.
>>>>> It would be much better to make sure that after recovery watermarks are
>>>>> still being emitted.
>>>>>
>>>>
>>>> yes. Indeed it looks like a very low-level approach. I did a small test to emit
>>>> one watermark for the stream that was recovered and then it can process
>>>> the join. It has the same behavior when using a CoGroupFunction and a
>>>> CoProcessFunction. So in the end I don't need to implement
>>>> MyCoProcessFunction with checkpointing. I just need to emit a new watermark
>>>> after the job recovers.
>>>>
>>>> In my case, I am using a Kafka source, so if I make Kafka
>>>> keep emitting watermarks I solve the problem. Otherwise, I have to
>>>> implement this custom operator.
>>>>
>>>> Thanks for your answer!
>>>> Felipe
>>>>
>>>>
>>>>>
>>>>> If you are using a built-in source, it's probably easier to do it in a
>>>>> custom operator. I would try to implement a custom one based on
>>>>> AbstractStreamOperator. Your class would also need to implement the
>>>>> OneInputStreamOperator interface. `processElement` you could implement as
>>>>> an identity function (just pass down the stream element unchanged). In
>>>>> `processWatermark` you would need to store the latest watermark on the
>>>>> `ListState` field (you can declare it inside
>>>>> `AbstractStreamOperator#initializeState` via `context.getListState(new
>>>>> ListStateDescriptor<>("your-field-name", Long.class));`). During normal
>>>>> processing (`processWatermark`) make sure it's a singleton list. During
>>>>> recovery (`AbstractStreamOperator#initializeState()`) without rescaling,
>>>>> you would just access this state field and re-emit the only element on 
>>>>> that
>>>>> list. However during recovery, depending if you are scaling up (a) or down
>>>>> (b), you could have a case where you sometimes have either (a) empty list
>>>>> (in that case you can not emit anything), or (b) many elements on the list
>>>>> (in that case you would need to calculate a minimum of all elements).
>>>>>
>>>>> As the operator API is not a very official one, it's not well documented.
>>>>> For an example you would need to take a look in the Flink code itself by
>>>>> finding existing implementations of the `AbstractStreamOperator` or
>>>>> `OneInputStreamOperator`.
>>>>>
>>>>> Best,
>>>>> 

Re: Task is always in created state after submitting an example job

2021-06-21 Thread Piotr Nowojski
I'm glad that you managed to work it out.

As far as I understand, without specifying the `taskmanager.host`, Task
Manager would try to automatically detect what host/ip address should be
advertised to the Job Manager, which the JM then uses to connect to the TM. I don't
know your network setup, local network cards, firewall or routing settings,
but any of those can lead to connection issues, for example your local machine
not being accessible via 127.0.0.1 while "localhost" works fine. The
automatically detected address is logged in the TM logs as "TaskManager will
use hostname/address '{}' ({}) for communication", so if it matters to you, you
can check what address was detected without specifying `taskmanager.host`, and
try to work out what's wrong with it. But it's most likely not a Flink issue.
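
For reference, the workaround Lei mentioned is a one-line entry in flink-conf.yaml
(shown together with its JobManager counterpart; values are of course deployment
specific):

  # flink-conf.yaml
  jobmanager.rpc.address: localhost
  # Address the TaskManager advertises to the JobManager instead of the
  # auto-detected one:
  taskmanager.host: localhost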

Best,
Piotrek

Sun, 20 Jun 2021 at 16:17 Lei Wang  wrote:

> There are enough slots on the jobmanager UI, but the slots are not available.
>
> After I add taskmanager.host: localhost to flink-conf.yaml, it works.
>
> But I don't know why.
>
> Thanks,
> Lei
>
>
> On Fri, Jun 18, 2021 at 6:07 PM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> I would start by looking at the Job Manager and Task Manager logs. Check
>> whether the Task Managers have connected/registered with the Job Manager and,
>> if so, whether there were any problems when submitting the job. It seems like
>> either there are not enough slots, or the slots are actually not available.
>>
>> Best,
>> Piotrek
>>
>> Fri, 18 Jun 2021 at 05:53 Lei Wang  wrote:
>>
>>> flink 1.11.2 on a single host.
>>>
>>> ./bin/start-cluster.sh and then
>>>
>>> ./bin/flink run examples/streaming/SocketWindowWordCount.jar  --hostname
>>> localhost --port 
>>>
>>> But on the jobmanager UI, the task is always in the created state. There are
>>> available slots.
>>>
>>> Any insights on this?
>>>
>>> Thanks,
>>> Lei
>>>
>>


Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
I'm glad I could help, I hope it will solve your problem :)

Best,
Piotrek

Fri, 18 Jun 2021 at 14:38, Felipe Gutierrez 
wrote:

> On Fri, Jun 18, 2021 at 1:41 PM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> Keep in mind that this is a quite low level approach to this problem. It
>> would be much better to make sure that after recovery watermarks are still
>> being emitted.
>>
>
> yes. Indeed it looks like a very low level. I did a small test to emit one
> watermark for the stream that was recovered and then it can process
> the join. It has the same behavior when using a CoGroupFunction and a
> CoProcessFunction. So in the end I don't need to implement
> MyCoProcessFunction with checkpoint. I just need to emit a new watermark
> after the job recovers.
>
> In my case, I am using a Kafka source, so if I make Kafka keep emitting
> watermarks I solve the problem. Otherwise, I have to implement this custom
> operator.
>
> Thanks for your answer!
> Felipe
>
>
>>
>> If you are using a built-in source, it's probably easier to do it in a
>> custom operator. I would try to implement a custom one based on
>> AbstractStreamOperator. Your class would also need to implement the
>> OneInputStreamOperator interface. `processElement` you could implement as
>> an identity function (just pass down the stream element unchanged). In
>> `processWatermark` you would need to store the latest watermark on the
>> `ListState` field (you can declare it inside
>> `AbstractStreamOperator#initializeState` via `context.getListState(new
>> ListStateDescriptor<>("your-field-name", Long.class));`). During normal
>> processing (`processWatermark`) make sure it's a singleton list. During
>> recovery (`AbstractStreamOperator#initializeState()`) without rescaling,
>> you would just access this state field and re-emit the only element on that
>> list. However during recovery, depending if you are scaling up (a) or down
>> (b), you could have a case where you sometimes have either (a) empty list
>> (in that case you can not emit anything), or (b) many elements on the list
>> (in that case you would need to calculate a minimum of all elements).
>>
>> As the operator API is not a very official one, it's not well documented. For
>> an example you would need to take a look in the Flink code itself by
>> finding existing implementations of the `AbstractStreamOperator` or
>> `OneInputStreamOperator`.
>>
>> Best,
>> Piotrek
>>
>> Fri, 18 Jun 2021 at 12:49, Felipe Gutierrez 
>> wrote:
>>
>>> Hello Piotrek,
>>>
>>> On Fri, Jun 18, 2021 at 11:48 AM Piotr Nowojski 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> As far as I can tell timers should be checkpointed and recovered. What
>>>> may be happening is that the state of the last seen watermarks by operators
>>>> on different inputs and different channels inside an input is not
>>>> persisted. Flink is assuming that after the restart, watermark assigners
>>>> will emit newer watermarks after the recovery. However if one of your
>>>> inputs is dormant and it has already emitted some very high watermark long
>>>> time before the failure, after recovery if no new watermark is emitted,
>>>> this input/input channel might be preventing timers from firing. Can you
>>>> check if that's what's happening in your case?
>>>>
>>>
>>> I think you are correct. At least when I reproduce the bug, it is as
>>> you said.
>>>
>>>
>>>> If so you would have to make sure one way or another that some
>>>> watermarks will be emitted after recovery. As a last resort, you could
>>>> manually store the watermarks in the operators/sources state and re-emit
>>>> last seen watermark during recovery.
>>>>
>>>
>>> Could you please point out how I can checkpoint the watermarks on a source
>>> operator? Is it done by this code below from here (
>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-strategies-and-the-kafka-connector
>>> )?
>>>
>>> FlinkKafkaConsumer kafkaSource = new
>>> FlinkKafkaConsumer<>("myTopic", schema, props);
>>> kafkaSource.assignTimestampsAndWatermarks(
>>> WatermarkStrategy.
>>> .forBoundedOutOfOrderness(Duration.ofSeconds(20)));
>>>
>>> Thanks,
>>> Felipe
>>>
>>>
>>>>
>>>> Best,

Re: Handling Large Broadcast States

2021-06-18 Thread Piotr Nowojski
Hi,

As far as I know there are no plans to support other state backends with
BroadcastState. I don't know about any particular technical limitation, it
probably just hasn't been done. Also I don't know how much effort that
would be. Probably it wouldn't be easy.

 Timo, can you chip in on how, for example, Table API/SQL is solving this
problem? I'm pretty sure the Table API is using broadcast joins after all?

Best,
Piotrek

Thu, 17 Jun 2021 at 02:53 Rion Williams  wrote:

> Hey Flink folks,
>
> I was discussing the use of the Broadcast Pattern with some colleagues
> today for a potential enrichment use-case and noticed that it wasn’t
> currently backed by RocksDB. This seems to indicate that it would be solely
> limited to the memory allocated, which might not support a large enrichment
> data set that our use case might run into (thousands of tenants with users
> and various other entities to enrich by).
>
> Are there any plans to eventually add support for BroadcastState to be
> backed by a non-memory source? Or perhaps some technical limitations that
> might not make that possible? If the latter is true, is there a preferred
> pattern for handling enrichment/lookups for a very large set of data that
> may not be memory-bound?
>
> Any advice or thoughts would be welcome!
>
> Rion


Re: The memory usage of the job is very different between Flink1.9 and Flink1.12

2021-06-18 Thread Piotr Nowojski
Hi,

When upgrading, I would always suggest checking the release notes first [1]
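
For orientation: since the 1.10/1.11 memory model changes, RocksDB's native
memory is budgeted via Flink's managed memory, so the flink-conf.yaml knobs
below are usually the first things to look at (key names as in the
configuration docs; the values here are purely illustrative):

  # flink-conf.yaml
  taskmanager.memory.process.size: 4g
  # fraction of Flink memory used as managed memory (RocksDB lives here)
  taskmanager.memory.managed.fraction: 0.4
  # keep RocksDB within the managed memory budget (default: true)
  state.backend.rocksdb.memory.managed: true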

Best,
Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html#memory-management

Fri, 18 Jun 2021 at 12:24 Haihang Jing  wrote:

> A question: with the same business logic and the same resource configuration,
> the memory usage of the job is very different between Flink 1.9 and Flink 1.12.
> Using jemalloc analysis, we found that RocksDB's
> UncompressBlockContentsForCompressionType method takes up more memory for the
> same running time. This method occupies about 200MB of memory in Flink 1.9 and
> about 4G in Flink 1.12. Have you ever encountered this phenomenon?
> <
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t3050/1.png>
>
> <
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t3050/12.png>
>
>
>
>
>


Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
Hi,

Keep in mind that this is a quite low level approach to this problem. It
would be much better to make sure that after recovery watermarks are still
being emitted.

If you are using a built-in source, it's probably easier to do it in a
custom operator. I would try to implement a custom one based on
AbstractStreamOperator. Your class would also need to implement the
OneInputStreamOperator interface. `processElement` you could implement as
an identity function (just pass down the stream element unchanged). In
`processWatermark` you would need to store the latest watermark on the
`ListState` field (you can declare it inside
`AbstractStreamOperator#initializeState` via `context.getListState(new
ListStateDescriptor<>("your-field-name", Long.class));`). During normal
processing (`processWatermark`) make sure it's a singleton list. During
recovery (`AbstractStreamOperator#initializeState()`) without rescaling,
you would just access this state field and re-emit the only element on that
list. However during recovery, depending if you are scaling up (a) or down
(b), you could have a case where you sometimes have either (a) empty list
(in that case you can not emit anything), or (b) many elements on the list
(in that case you would need to calculate a minimum of all elements).

As the operator API is not a very official one, it's not well documented. For an
example you would need to take a look in the Flink code itself by finding
existing implementations of the `AbstractStreamOperator` or
`OneInputStreamOperator`.
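
For what it's worth, a rough and untested sketch of such an operator following
the description above (class, state and stream names are made up; the exact
initializeState hook and state-store accessors vary a bit between Flink
versions). It would be attached to a DataStream via something like
`stream.transform("restore-watermarks", stream.getType(), new WatermarkRestoringOperator<>())`:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.StateInitializationContext;
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

import java.util.Collections;

/** Forwards elements unchanged and re-emits the last checkpointed watermark on restore. */
public class WatermarkRestoringOperator<T> extends AbstractStreamOperator<T>
        implements OneInputStreamOperator<T, T> {

    private ListState<Long> lastWatermark;

    @Override
    public void initializeState(StateInitializationContext context) throws Exception {
        super.initializeState(context);
        lastWatermark = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("last-watermark", Long.class));

        if (context.isRestored()) {
            // After scaling down there can be several entries; re-emit their minimum.
            long min = Long.MAX_VALUE;
            boolean any = false;
            for (Long wm : lastWatermark.get()) {
                any = true;
                min = Math.min(min, wm);
            }
            if (any) {
                output.emitWatermark(new Watermark(min));
            }
        }
    }

    @Override
    public void processElement(StreamRecord<T> element) {
        output.collect(element); // identity: pass the element through unchanged
    }

    @Override
    public void processWatermark(Watermark mark) throws Exception {
        // Keep a singleton list holding the latest seen watermark.
        lastWatermark.update(Collections.singletonList(mark.getTimestamp()));
        super.processWatermark(mark); // forwards the watermark downstream
    }
}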

Best,
Piotrek

Fri, 18 Jun 2021 at 12:49, Felipe Gutierrez 
wrote:

> Hello Piotrek,
>
> On Fri, Jun 18, 2021 at 11:48 AM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> As far as I can tell timers should be checkpointed and recovered. What
>> may be happening is that the state of the last seen watermarks by operators
>> on different inputs and different channels inside an input is not
>> persisted. Flink is assuming that after the restart, watermark assigners
>> will emit newer watermarks after the recovery. However if one of your
>> inputs is dormant and it has already emitted some very high watermark long
>> time before the failure, after recovery if no new watermark is emitted,
>> this input/input channel might be preventing timers from firing. Can you
>> check if that's what's happening in your case?
>>
>
> I think you are correct. At least when I reproduce the bug, it is as you
> said.
>
>
>> If so you would have to make sure one way or another that some watermarks
>> will be emitted after recovery. As a last resort, you could manually store
>> the watermarks in the operators/sources state and re-emit last seen
>> watermark during recovery.
>>
>
> Could you please point out how I can checkpoint the watermarks on a source
> operator? Is it done by this code below from here (
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-strategies-and-the-kafka-connector
> )?
>
> FlinkKafkaConsumer kafkaSource = new
> FlinkKafkaConsumer<>("myTopic", schema, props);
> kafkaSource.assignTimestampsAndWatermarks(
> WatermarkStrategy.
> .forBoundedOutOfOrderness(Duration.ofSeconds(20)));
>
> Thanks,
> Felipe
>
>
>>
>> Best,
>> Piotrek
>>
>> Thu, 17 Jun 2021 at 13:46, Felipe Gutierrez 
>> wrote:
>>
>>> Hi community,
>>>
>>> I have implemented a join function using CoProcessFunction with
>>> CheckpointedFunction to recover from failures. I added some debug lines to
>>> check if it is restoring and it does. Before the crash, I process events
>>> that fall at processElement2. I create snapshots at snapshotState(), the
>>> application comes back and restores the events. That is fine.
>>>
>>> After the restore, I process events that fall on processElement1. I
>>> register event timers for them as I did on processElement2 before the
>>> crash. But the onTimer() is never called. The point is that I don't have
>>> any events to send to processElement2() to make the CoProcessFunction
>>> register a timer for them. They were sent before the crash.
>>>
>>> I suppose that the onTimer() is called only when there are
>>> "timerService.registerEventTimeTimer(endOfWindow);" for processElement1 and
>>> processElement2. Because when I test the same application without crashing
>>> and the CoProcessFunction triggers the onTimer() method.
>>>
>>> But if I have a crash in the middle the CoProcessFunction does not call
>>> onTimer(). Why is that? Is that normal? What do I have to do to make the
>>> CoProcessFunction trigger the onTimer() method even if only one stream is
>>> processed let's say at the processElement2() method and the other stream is
>>> restored from a snapshot? I imagine that I have to register a timer during
>>> the recovery (initializeState()). But how?
>>>
>>> thanks,
>>> Felipe
>>>
>>


Re: Task is always in created state after submitting an example job

2021-06-18 Thread Piotr Nowojski
Hi,

I would start by looking at the Job Manager and Task Manager logs. Check
whether the Task Managers have connected/registered with the Job Manager and,
if so, whether there were any problems when submitting the job. It seems like
either there are not enough slots, or the slots are actually not available.

Best,
Piotrek

Fri, 18 Jun 2021 at 05:53 Lei Wang  wrote:

> flink 1.11.2 on a single host.
>
> ./bin/start-cluster.sh and then
>
> ./bin/flink run examples/streaming/SocketWindowWordCount.jar  --hostname
> localhost --port 
>
> But on the jobmanager UI, the task is always in the created state. There are
> available slots.
>
> Any insights on this?
>
> Thanks,
> Lei
>


Re: Flow of events when Flink Iterations are used in DataStream API

2021-06-18 Thread Piotr Nowojski
Hi,

In old Flink versions (prior to 1.9) that would be the case. If Operator D
emitted a record to Operator B, but Operator B hadn't yet processed it when
the checkpoint happened, this record would be lost during recovery.
Operator D would be recovered with its state as it was after emitting this
record, but the record would never be delivered to Operator B.

However, since Flink 1.9, iterations are not working with checkpointing on
an even deeper level [1], and currently there are no plans to address this
issue. We are working on providing a better API for iterations in the future.
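
For readers less familiar with the API being discussed, here is a minimal,
purely illustrative sketch of a DataStream iteration (made-up numbers and
operations) showing where the feedback edge that can lose records sits:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IterationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> input = env.generateSequence(0, 1000);
        IterativeStream<Long> loop = input.iterate();          // head of the loop
        DataStream<Long> body = loop.map(v -> v - 1);          // work done inside the loop
        // Records sent back over this feedback edge are the ones that are not
        // covered by checkpoints and can be lost on recovery.
        loop.closeWith(body.filter(v -> v > 0));
        DataStream<Long> done = body.filter(v -> v <= 0);      // leaves the loop

        done.print();
        env.execute("iteration sketch");
    }
}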

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-22326

Fri, 18 Jun 2021 at 08:21, Varun Chakravarthy Senthilnathan <
varun_senthinat...@infosys.com> wrote:

> Hi All,
>
>
>
> We have a sample flow like below :
>
>
>
> Operator A
>
>
>
> Operator B
>
>
>
> Operator C
>
>
>
> Operator D
>
>
>
> Operator E
>
>
>
> We have implemented iterations where the result of code done in Operator C
> is checked in Operator D and conditionally pushed back into Operator B. Now
> according to this stackoverflow answer (
> https://stackoverflow.com/questions/54681200/how-does-flink-treat-checkpoints-and-state-within-iterativestream/54707931#54707931),
> events in the loop could be lost in case of failure.
>
>
>
> I would like to understand what it means to lose events. Let’s say an
> event reaches D and was pushed back to Operator B, and at this instant the
> application went down. When the restart happens, will the event be lost, in
> the sense that the progress it had made up to Operator D is the last that was
> seen of it and the event cannot be reprocessed again, or will it be
> retriggered from Operator A?
>
>
>
> Regards,
>
> Varun.
>


Re: How to make onTimer() trigger on a CoProcessFunction after a failure?

2021-06-18 Thread Piotr Nowojski
Hi,

As far as I can tell timers should be checkpointed and recovered. What may
be happening is that the state of the last seen watermarks by operators on
different inputs and different channels inside an input is not persisted.
Flink is assuming that after the restart, watermark assigners will emit
newer watermarks after the recovery. However if one of your inputs is
dormant and it has already emitted some very high watermark long time
before the failure, after recovery if no new watermark is emitted, this
input/input channel might be preventing timers from firing. Can you check
if that's what's happening in your case?

If so you would have to make sure one way or another that some watermarks
will be emitted after recovery. As a last resort, you could manually store
the watermarks in the operators/sources state and re-emit last seen
watermark during recovery.

Best,
Piotrek

Thu, 17 Jun 2021 at 13:46, Felipe Gutierrez 
wrote:

> Hi community,
>
> I have implemented a join function using CoProcessFunction with
> CheckpointedFunction to recover from failures. I added some debug lines to
> check if it is restoring and it does. Before the crash, I process events
> that fall at processElement2. I create snapshots at snapshotState(), the
> application comes back and restores the events. That is fine.
>
> After the restore, I process events that fall on processElement1. I
> register event timers for them as I did on processElement2 before the
> crash. But the onTimer() is never called. The point is that I don't have
> any events to send to processElement2() to make the CoProcessFunction
> register a timer for them. They were sent before the crash.
>
> I suppose that the onTimer() is called only when there are
> "timerService.registerEventTimeTimer(endOfWindow);" for processElement1 and
> processElement2. Because when I test the same application without crashing
> and the CoProcessFunction triggers the onTimer() method.
>
> But if I have a crash in the middle the CoProcessFunction does not call
> onTimer(). Why is that? Is that normal? What do I have to do to make the
> CoProcessFunction trigger the onTimer() method even if only one stream is
> processed let's say at the processElement2() method and the other stream is
> restored from a snapshot? I imagine that I have to register a timer during
> the recovery (initializeState()). But how?
>
> thanks,
> Felipe
>


Re: Discard checkpoint files through a single recursive call

2021-06-18 Thread Piotr Nowojski
Hi,

Unfortunately at the moment I think there are no plans to push for this. I
would suggest you bump/cast a vote on
https://issues.apache.org/jira/browse/FLINK-13856 in order to allow us to
prioritise efforts more accurately.

Best,
Piotrek

Wed, 16 Jun 2021 at 05:46 Jiahui Jiang  wrote:

> Hello Yun and Guowei,
>
> Thanks for the context! Looks like the plan is to have a Flink config flag
> to enable recursive deletion? Is there any plan to push through this PR in
> the next release? https://github.com/apache/flink/pull/9602
>
>
> Thank you so much!
> Jiahui
> --
> *From:* Yun Tang 
> *Sent:* Tuesday, June 15, 2021 10:27 PM
> *To:* Guowei Ma ; Jiahui Jiang <
> qzhzm173...@hotmail.com>
> *Cc:* user@flink.apache.org 
> *Subject:* Re: Discard checkpoint files through a single recursive call
>
> Hi Jiang,
>
> Please take a look at FLINK-17860 and FLINK-13856 for previous discussion
> of this problem.
>
> [1] https://issues.apache.org/jira/browse/FLINK-17860
> [2] https://issues.apache.org/jira/browse/FLINK-13856
>
> Best
> Yun Tang
>
> --
> *From:* Guowei Ma 
> *Sent:* Wednesday, June 16, 2021 8:40
> *To:* Jiahui Jiang 
> *Cc:* user@flink.apache.org 
> *Subject:* Re: Discard checkpoint files through a single recursive call
>
> hi, Jiang
>
> I am afraid of misunderstanding what you mean, so can you elaborate on how
> you want to change it? For example, which interface or class do you want to
> add a method to?
> Although I am not a state expert, as far as I know, due to incremental
> checkpoints, when a CompletedCheckpoint is being discarded, it is necessary to call
> the discardState method of each State.
>
> Best,
> Guowei
>
>
> On Tue, Jun 15, 2021 at 7:37 AM Jiahui Jiang 
> wrote:
>
> Hello Flink!
>
> We are building an infrastructure where we implement our own
> CompletedCheckpointStore. The read and write to the external storage
> location of these checkpoints are through HTTP calls to an external service.
>
> Recently we noticed some checkpoint file cleanup performance issue when
> the job writes out a very high number of checkpoint files per checkpoint.
> (In our case we had a few hundreds of operators and ran with 16
> parallelism)
> During checkpoint state discard phase, since the implementation in
> CompletedCheckpoint discards the state files one by one, we are seeing a
> very high number of remote calls. Sometimes the deletion fails to catch up
> with the checkpoint progress.
>
> Given that the interface we are given to configure the external storage
> location for checkpoints is always a `target directory`, would it be
> reasonable to expose an implementation of discard() that directly calls
> disposeStorageLocation with recursive set to true, without iterating over
> each individual file first? Are there any blockers for that?
>
> Thank you!
>
>
> links
>
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java#L240
>
> https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsCompletedCheckpointStorageLocation.java#L70
>
>


Re: Web UI shows my AssignTImestamp is in high back pressure but in/outPoolUsage are both 0.

2021-06-18 Thread Piotr Nowojski
Hi Haocheng,

Regarding the first part, yes. For a very long time there was a trivial bug
that was displaying the maximum "backpressure status" ("HIGH" in your case)
from all of the subtasks, for every subtask, instead of showing the
subtask's individual status. [1]  It is/will be fixed in Flink 1.11.4,
1.12.4, 1.13.1, 1.14.0.

Also please note, that starting from 1.13.0, Flink has a much better, more
user friendly tools for analysing the source of the backpressure [2]. I
would highly recommend upgrading to it.

About the empty `inPoolUsage`. Keep in mind that this metric is ignoring
local channels [3], which might be hiding the problem. But yes. In
principle, if the upstream subtask has full output buffers, while the
downstream subtasks have empty input buffers, that most likely means there
is a problem in the network exchange. It can be network IO related, maybe
network threads are overloaded (CPU) might be causing that, or maybe some
other issue (GC, encryption/SSL, compression). But that should only happen
in very high throughput jobs, with hundreds of MB/s of network traffic. I
would first rule out if for sure your `Window` is not causing the
backpressure. You could do it by upgrading to Flink 1.13.x and checking the
newly added `busyTimeMsPerSecond` metric. Alternatively you can attach a
CPU profiler to a TaskManager. This is the most reliable way.

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-22489
[2] https://issues.apache.org/jira/browse/FLINK-14814
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/metrics/#default-shuffle-service

Sat, 12 Jun 2021 at 12:53 Haocheng Wang  wrote:

> Hi, I have a job like 'Source -> assignmentTimestamp -> flatmap ->  Window
> -> Sink' and I get back pressure from 'Source' to the 'FlatMap' operators
> form the 'BackPressure' tab in the Web UI.
> When trying to find which operator is the source of back pressure, I use
> metrics provided by the Web UI, specifically, 'inPoolUsage' and
> 'outPoolUsage'.
> Firstly, as far as I know, when both of these metrics are 0 the operator
> should not be considered 'back pressured', but when I check the
> 'AssignmentTimestamp' operator, where 8 subtasks are running, I find 1 or 2 of
> them have a back pressure index of 0 while the others have an index higher
> than 0.80, and all of them are marked with 'HIGH' status.
> However, the two metrics 'in/outPoolUsage' are always 0. So maybe the
> operator is not actually back pressured? Or is there any problem with my
> Flink WebUI?
> My second question is: from my experience, I think the source of the back
> pressure should be the Window operator, because the outPoolUsage of the
> 'FlatMap' is 1 and the 'Window' is the first downstream operator from the
> 'FlatMap', but the Window's inPoolUsage and outPoolUsage are also 0. So the
> cause of the back pressure should be the network bottleneck between window
> and flatmap? Am I right?
> Thanks for reading, and I'm looking forward to your ideas.
>
> Haocheng
>


Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-17 Thread Piotr Nowojski
Hi Thomas. The bug https://issues.apache.org/jira/browse/FLINK-21028 is
still present in 1.12.1. You would need to upgrade to at least 1.13.0,
1.12.2 or 1.11.4. However as I mentioned before, 1.11.4 hasn't yet been
released. On the other hand both 1.12.2 and 1.13.0 have already been
superseded by more recent minor releases (1.13.1 and 1.12.4 respectively).

Piotrek

Wed, 16 Jun 2021 at 06:01 Thomas Wang  wrote:

> Thanks everyone. I'm using Flink on EMR. I just updated to EMR 6.3 which
> uses Flink 1.12.1. I will report back whether this resolves the issue.
>
> Thomas
>
> On Wed, Jun 9, 2021 at 11:15 PM Yun Gao  wrote:
>
>> Many thanks Kezhu for the catch, it also looks to me like the same issue as
>> FLINK-21028.
>>
>> --
>> From:Piotr Nowojski 
>> Send Time:2021 Jun. 9 (Wed.) 22:12
>> To:Kezhu Wang 
>> Cc:Thomas Wang ; Yun Gao ; user <
>> user@flink.apache.org>
>> Subject:Re: Re: Re: Re: Failed to cancel a job using the STOP rest API
>>
>> Yes good catch Kezhu, IllegalStateException sounds very much like
>> FLINK-21028.
>>
>> Thomas, could you try upgrading to Flink 1.13.1 or 1.12.4? (1.11.4 hasn't
>> been released yet)?
>>
>> Piotrek
>>
>> Tue, 8 Jun 2021 at 17:18 Kezhu Wang  wrote:
>> Could it be same as FLINK-21028[1] (titled as “Streaming application
>> didn’t stop properly”, fixed in 1.11.4, 1.12.2, 1.13.0) ?
>>
>>
>> [1]: https://issues.apache.org/jira/browse/FLINK-21028
>>
>>
>> Best,
>> Kezhu Wang
>>
>> On June 8, 2021 at 22:54:10, Yun Gao (yungao...@aliyun.com) wrote:
>> Hi Thomas,
>>
>> I tried but do not re-produce the exception yet. I have filed
>> an issue for the exception first [1].
>>
>>
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-22928
>>
>>
>> --Original Mail --
>> *Sender:*Thomas Wang 
>> *Send Date:*Tue Jun 8 07:45:52 2021
>> *Recipients:*Yun Gao 
>> *CC:*user 
>> *Subject:*Re: Re: Re: Failed to cancel a job using the STOP rest API
>> This is actually a very simple job that reads from Kafka and writes to S3
>> using the StreamingFileSink w/ Parquet format. I'm all using Flink's API
>> and nothing custom.
>>
>> Thomas
>>
>> On Sun, Jun 6, 2021 at 6:43 PM Yun Gao  wrote:
>> Hi Thomas,
>>
>> Many thanks for reporting the exceptions; it seems to not work as
>> expected to me...
>> Could you also show us the dag of the job ? And does some operators in
>> the source task
>> use multiple-threads to emit records?
>>
>> Best,
>> Yun
>>
>>
>> --Original Mail --
>> *Sender:*Thomas Wang 
>> *Send Date:*Sun Jun 6 04:02:20 2021
>> *Recipients:*Yun Gao 
>> *CC:*user 
>> *Subject:*Re: Re: Failed to cancel a job using the STOP rest API
>> One thing I noticed is that if I set drain = true, the job could be
>> stopped correctly. Maybe that's because I'm using a Parquet file sink which
>> is a bulk-encoded format and only writes to disk during checkpoints?
>>
>> Thomas
>>
>> On Sat, Jun 5, 2021 at 10:06 AM Thomas Wang  wrote:
>> Hi Yun,
>>
>> Thanks for the tips. Yes, I do see some exceptions as copied below. I'm
>> not quite sure what they mean though. Any hints?
>>
>> Thanks.
>>
>> Thomas
>>
>> ```
>> 2021-06-05 10:02:51
>> java.util.concurrent.ExecutionException:
>> org.apache.flink.streaming.runtime.tasks.
>> ExceptionInChainedOperatorException: Could not forward element to next
>> operator
>> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture
>> .java:357)
>> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:
>> 1928)
>> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
>> .quiesceTimeServiceAndCloseOperator(StreamOperatorWrapper.java:161)
>> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
>> .close(StreamOperatorWrapper.java:130)
>> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
>> .close(StreamOperatorWrapper.java:134)
>> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
>> .close(StreamOperatorWrapper.java:80)
>> at org.apache.flink.streaming.runtime.tasks.OperatorChain
>> .closeOperators(OperatorChain.java:302)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.afterInvoke(
>> StreamTask.java:576)
>> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
>> StreamTask.java:544)
>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.flink.streaming.runtime.tasks.
>> ExceptionInChainedOperatorException: Could not forward element to next
>> operator
>> at org.apache.flink.streaming.runtime.tasks.
>> OperatorChain$ChainingOutput.emitWatermark(OperatorChain.java:642)
>> at org.apache.flink.streaming.api.operators.CountingOutput
>> .emitWatermark(CountingOutput.java:41)
>> at org.apache.flink.streaming.runtime.operators.
>> 

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-09 Thread Piotr Nowojski
Yes good catch Kezhu, IllegalStateException sounds very much like
FLINK-21028.

Thomas, could you try upgrading to Flink 1.13.1 or 1.12.4? (1.11.4 hasn't
been released yet)?

Piotrek

Tue, 8 Jun 2021 at 17:18 Kezhu Wang  wrote:

> Could it be same as FLINK-21028[1] (titled as “Streaming application
> didn’t stop properly”, fixed in 1.11.4, 1.12.2, 1.13.0) ?
>
>
> [1]: https://issues.apache.org/jira/browse/FLINK-21028
>
>
> Best,
> Kezhu Wang
>
> On June 8, 2021 at 22:54:10, Yun Gao (yungao...@aliyun.com) wrote:
>
> Hi Thomas,
>
> I tried but do not re-produce the exception yet. I have filed
> an issue for the exception first [1].
>
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-22928
>
>
> --Original Mail --
> *Sender:*Thomas Wang 
> *Send Date:*Tue Jun 8 07:45:52 2021
> *Recipients:*Yun Gao 
> *CC:*user 
> *Subject:*Re: Re: Re: Failed to cancel a job using the STOP rest API
>
>> This is actually a very simple job that reads from Kafka and writes to S3
>> using the StreamingFileSink w/ Parquet format. I'm all using Flink's API
>> and nothing custom.
>>
>> Thomas
>>
>> On Sun, Jun 6, 2021 at 6:43 PM Yun Gao  wrote:
>>
>>> Hi Thomas,
>>>
>>> Many thanks for reporting the exceptions; it seems to not work as
>>> expected to me...
>>> Could you also show us the dag of the job ? And does some operators in
>>> the source task
>>> use multiple-threads to emit records?
>>>
>>> Best,
>>> Yun
>>>
>>>
>>> --Original Mail --
>>> *Sender:*Thomas Wang 
>>> *Send Date:*Sun Jun 6 04:02:20 2021
>>> *Recipients:*Yun Gao 
>>> *CC:*user 
>>> *Subject:*Re: Re: Failed to cancel a job using the STOP rest API
>>>
 One thing I noticed is that if I set drain = true, the job could be
 stopped correctly. Maybe that's because I'm using a Parquet file sink which
 is a bulk-encoded format and only writes to disk during checkpoints?

 Thomas

 On Sat, Jun 5, 2021 at 10:06 AM Thomas Wang  wrote:

> Hi Yun,
>
> Thanks for the tips. Yes, I do see some exceptions as copied below.
> I'm not quite sure what they mean though. Any hints?
>
> Thanks.
>
> Thomas
>
> ```
> 2021-06-05 10:02:51
> java.util.concurrent.ExecutionException:
> org.apache.flink.streaming.runtime.tasks.
> ExceptionInChainedOperatorException: Could not forward element to
> next operator
> at java.util.concurrent.CompletableFuture.reportGet(
> CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture
> .java:1928)
> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
> .quiesceTimeServiceAndCloseOperator(StreamOperatorWrapper.java:161)
> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
> .close(StreamOperatorWrapper.java:130)
> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
> .close(StreamOperatorWrapper.java:134)
> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
> .close(StreamOperatorWrapper.java:80)
> at org.apache.flink.streaming.runtime.tasks.OperatorChain
> .closeOperators(OperatorChain.java:302)
> at org.apache.flink.streaming.runtime.tasks.StreamTask
> .afterInvoke(StreamTask.java:576)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(
> StreamTask.java:544)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.streaming.runtime.tasks.
> ExceptionInChainedOperatorException: Could not forward element to
> next operator
> at org.apache.flink.streaming.runtime.tasks.
> OperatorChain$ChainingOutput.emitWatermark(OperatorChain.java:642)
> at org.apache.flink.streaming.api.operators.CountingOutput
> .emitWatermark(CountingOutput.java:41)
> at org.apache.flink.streaming.runtime.operators.
> TimestampsAndWatermarksOperator$WatermarkEmitter.emitWatermark(
> TimestampsAndWatermarksOperator.java:165)
> at org.apache.flink.api.common.eventtime.
> BoundedOutOfOrdernessWatermarks.onPeriodicEmit(
> BoundedOutOfOrdernessWatermarks.java:69)
> at org.apache.flink.streaming.runtime.operators.
> TimestampsAndWatermarksOperator.close(TimestampsAndWatermarksOperator
> .java:125)
> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
> .lambda$closeOperator$5(StreamOperatorWrapper.java:205)
> at org.apache.flink.streaming.runtime.tasks.
> StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor
> .runThrowing(StreamTaskActionExecutor.java:92)
> at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper
> .closeOperator(StreamOperatorWrapper.java:203)
> at 

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-08 Thread Piotr Nowojski
Re-adding user mailing list

Hey Alex,

In that case I can see two scenarios that could lead to missing files. Keep
in mind that incremental checkpoints are referencing previous checkpoints
in order to minimise the size of the checkpoint (roughly speaking only
changes since the previous checkpoint are being
persisted/uploaded/written). Checkpoint number 42, can reference an
arbitrary number of previous checkpoints. I suspect that somehow, some of
those previously referenced checkpoints got deleted and removed. Also keep
in mind that savepoints (as of now) are never incremental, they are always
full checkpoints. However externalised checkpoints can be incremental. Back
to the scenarios:
1. You might have accidentally removed some older checkpoints from your
Job2, maybe thinking they are no longer needed. Maybe you have just kept
this single externalised checkpoint directory from steps T3 or T4,
disregarding that this externalised checkpoint might be referencing
previous checkpoints of Job2?
2. As I mentioned, Flink is automatically maintaining reference counts of
the used files and deletes them when they are no longer used/referenced.
However this works only within a single job/cluster. For example if between
steps T3 and T4, you restarted Job2 and let it run for a bit, it could take
more checkpoints that would subsume files that were still part of the
externalised checkpoint that you previously used to start Job3/Job4. Job2
would have no idea that Job3/Job4 exist, let alone that they are
referencing some files from Job2, and those files could have been deleted
as soon as Job2 was no longer using/referencing them.

Could one of those happen in your case?

Best, Piotrek

Mon, 7 Jun 2021 at 20:01 Alexander Filipchik 
wrote:

> Yes, we do use incremental checkpoints.
>
> Alex
>
> On Mon, Jun 7, 2021 at 3:12 AM Piotr Nowojski 
> wrote:
>
>> Hi Alex,
>>
>> A quick question. Are you using incremental checkpoints?
>>
>> Best, Piotrek
>>
>> Sat, 5 Jun 2021 at 21:23  wrote:
>>
>>> Small correction, in T4 and T5 I mean Job2, not Job 1 (as job 1 was save
>>> pointed).
>>>
>>> Thank you,
>>> Alex
>>>
>>> On Jun 4, 2021, at 3:07 PM, Alexander Filipchik 
>>> wrote:
>>>
>>> 
>>> Looked through the logs and didn't see anything fishy that indicated an
>>> exception during checkpointing.
>>> To make it clearer, here is the timeline (we use unaligned checkpoints,
>>> and state size around 300Gb):
>>>
>>> T1: Job1 was running
>>> T2: Job1 was savepointed, brought down and replaced with Job2.
>>> T3: Attempts to savepoint Job2 failed (timed out). Job2 was cancelled,
>>> brought down and replaced by Job3 that was restored from extarnilized
>>> checkpoint of Job2
>>> T3: Attempts to savepoint Job3 failed (timed out). Job3 was cancelled,
>>> brought down and replaced by Job4 that was restored from extarnilized
>>> checkpoint of Job3
>>> T4: We realized that jobs were timing out to savepoint due to local disk
>>> throttling. We provisioned disk with more throughput and IO. Job4 was
>>> cancelled, Job4 was deployed and restored from externilized checkpoint of
>>> Job3, but failed as it couldn't find some files in the folder that belongs
>>> to the checkpoint of *Job1*
>>> T5: We tried to redeploy and restore from checkpoints of Job3 and Job2,
>>> but all the attempts failed on reading files from the *folder that
>>> belongs to the checkpoint of Job1*
>>>
>>> We checked the content of the folder containing checkpoints of Job1, and
>>> it has files. Not sure what is pointing to the missing files and what could've
>>> removed them.
>>>
>>> Any way we can figure out what could've happened? Is there a tool that
>>> can read the checkpoint and check whether it is valid?
>>>
>>> Alex
>>>
>>> On Thu, Jun 3, 2021 at 2:12 PM Alexander Filipchik 
>>> wrote:
>>>
>>>> On the checkpoints -> what kind of issues should I check for? I was
>>>> looking for metrics and it looks like they were reporting successful
>>>> checkpoints. It looks like some files were removed in the shared folder,
>>>> but I'm not sure how to check for what caused it.
>>>>
>>>> Savepoints were failing due to savepoint timeout timeout. Based on
>>>> metrics, our attached disks were not fast enough (GCS regional disks are
>>>> network disks and were throttled). The team cancelled the savepoint and
>>>> just killed the kubernetes cluster. I assume some checkpoints were
>>>> interrupted as th

Re: recover from savepoint

2021-06-07 Thread Piotr Nowojski
.internals.TransactionManager$State.READY"));
>>>>
>>>> invoke(
>>>> transactionManager,
>>>> "transitionTo",
>>>> getEnum(
>>>>
>>>> "org.apache.kafka.clients.producer.internals.TransactionManager$State.IN_TRANSACTION"));
>>>> setField(transactionManager, "transactionStarted", true);
>>>> }
>>>> }
>>>> }
>>>>
>>>>
>>>> public TransactionManager(LogContext logContext,
>>>>   String transactionalId,
>>>>   int transactionTimeoutMs,
>>>>   long retryBackoffMs,
>>>>   ApiVersions apiVersions) {
>>>> this.producerIdAndEpoch = ProducerIdAndEpoch.NONE;
>>>> this.transactionalId = transactionalId;
>>>> this.log = logContext.logger(TransactionManager.class);
>>>> this.transactionTimeoutMs = transactionTimeoutMs;
>>>> this.transactionCoordinator = null;
>>>> this.consumerGroupCoordinator = null;
>>>> this.newPartitionsInTransaction = new HashSet<>();
>>>> this.pendingPartitionsInTransaction = new HashSet<>();
>>>> this.partitionsInTransaction = new HashSet<>();
>>>> this.pendingRequests = new PriorityQueue<>(10,
>>>> Comparator.comparingInt(o -> o.priority().priority));
>>>> this.pendingTxnOffsetCommits = new HashMap<>();
>>>> this.partitionsWithUnresolvedSequences = new HashMap<>();
>>>> this.partitionsToRewriteSequences = new HashSet<>();
>>>> this.retryBackoffMs = retryBackoffMs;
>>>> this.topicPartitionBookkeeper = new TopicPartitionBookkeeper();
>>>> this.apiVersions = apiVersions;
>>>> }
>>>>
>>>>
>>>>
>>>> public class ProducerIdAndEpoch {
>>>> public static final ProducerIdAndEpoch NONE = new
>>>> ProducerIdAndEpoch(RecordBatch.NO_PRODUCER_ID,
>>>> RecordBatch.NO_PRODUCER_EPOCH);
>>>>
>>>> public final long producerId;
>>>> public final short epoch;
>>>>
>>>> public ProducerIdAndEpoch(long producerId, short epoch) {
>>>> this.producerId = producerId;
>>>> this.epoch = epoch;
>>>> }
>>>>
>>>> public boolean isValid() {
>>>> return RecordBatch.NO_PRODUCER_ID < producerId;
>>>> }
>>>>
>>>> @Override
>>>> public String toString() {
>>>> return "(producerId=" + producerId + ", epoch=" + epoch + ")";
>>>> }
>>>>
>>>> @Override
>>>> public boolean equals(Object o) {
>>>> if (this == o) return true;
>>>> if (o == null || getClass() != o.getClass()) return false;
>>>>
>>>> ProducerIdAndEpoch that = (ProducerIdAndEpoch) o;
>>>>
>>>> if (producerId != that.producerId) return false;
>>>> return epoch == that.epoch;
>>>> }
>>>>
>>>> @Override
>>>> public int hashCode() {
>>>> int result = (int) (producerId ^ (producerId >>> 32));
>>>> result = 31 * result + (int) epoch;
>>>> return result;
>>>> }
>>>>
>>>> }
>>>>
>>>> (2)In the second step,
>>>> recoverAndAbort(FlinkKafkaProducer.KafkaTransactionState transaction), when
>>>> initializing the transaction, producerId and epoch in the first step
>>>> pollute ProducerIdAndEpoch.NONE. Therefore, when an initialization request
>>>> is sent to Kafka, the values of the producerId and epoch  variables in the
>>>> request parameter ProducerIdAndEpoch.NONE are equal to the values of the
>>>> producerId and epoch  variables in the first transaction commit, not equal
>>>> to - 1, - 1. So Kafka throws an exception:
>>>> Unexpected error in InitProducerIdResponse; Producer attempted an
>>>> operation with an old epoch. Either there is a newer producer with the same
>>>> transactionalId, or the producer's transaction has been expired by the
>>>> broker.
>>>

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-07 Thread Piotr Nowojski
Hi Alex,

A quick question. Are you using incremental checkpoints?

Best, Piotrek

Sat, 5 Jun 2021 at 21:23  wrote:

> Small correction, in T4 and T5 I mean Job2, not Job 1 (as job 1 was save
> pointed).
>
> Thank you,
> Alex
>
> On Jun 4, 2021, at 3:07 PM, Alexander Filipchik 
> wrote:
>
> 
> Looked through the logs and didn't see anything fishy that indicated an
> exception during checkpointing.
> To make it clearer, here is the timeline (we use unaligned checkpoints,
> and state size around 300Gb):
>
> T1: Job1 was running
> T2: Job1 was savepointed, brought down and replaced with Job2.
> T3: Attempts to savepoint Job2 failed (timed out). Job2 was cancelled,
> brought down and replaced by Job3 that was restored from extarnilized
> checkpoint of Job2
> T3: Attempts to savepoint Job3 failed (timed out). Job3 was cancelled,
> brought down and replaced by Job4 that was restored from extarnilized
> checkpoint of Job3
> T4: We realized that jobs were timing out to savepoint due to local disk
> throttling. We provisioned disk with more throughput and IO. Job4 was
> cancelled, Job4 was deployed and restored from externilized checkpoint of
> Job3, but failed as it couldn't find some files in the folder that belongs
> to the checkpoint of *Job1*
> T5: We tried to redeploy and restore from checkpoints of Job3 and Job2,
> but all the attempts failed on reading files from the *folder that
> belongs to the checkpoint of Job1*
>
> We checked the content of the folder containing checkpoints of Job1, and
> it has files. Not sure what is pointing to the missing files and what could've
> removed them.
>
> Any way we can figure out what could've happened? Is there a tool that can
> read the checkpoint and check whether it is valid?
>
> Alex
>
> On Thu, Jun 3, 2021 at 2:12 PM Alexander Filipchik 
> wrote:
>
>> On the checkpoints -> what kind of issues should I check for? I was
>> looking for metrics and it looks like they were reporting successful
>> checkpoints. It looks like some files were removed in the shared folder,
>> but I'm not sure how to check for what caused it.
>>
>> Savepoints were failing due to savepoint timeout timeout. Based on
>> metrics, our attached disks were not fast enough (GCS regional disks are
>> network disks and were throttled). The team cancelled the savepoint and
>> just killed the kubernetes cluster. I assume some checkpoints were
>> interrupted as the job triggers them one after another.
>>
>> Is there a known issue with termination during running checkpoint?
>>
>> Btw, we use the Flink Kube operator from Lyft.
>>
>> Alex
>>
>> On Thu, Jun 3, 2021 at 1:24 AM Chesnay Schepler 
>> wrote:
>>
>>> Is there anything in the Flink logs indicating issues with writing the
>>> checkpoint data?
>>> When the savepoint could not be created, was anything logged from Flink?
>>> How did you shut down the cluster?
>>>
>>> On 6/3/2021 5:56 AM, Alexander Filipchik wrote:
>>>
>>> Hi,
>>>
>>> Trying to figure out what happened with our Flink job. We use flink
>>> 1.11.1 and run a job with unaligned checkpoints and Rocks Db backend. The
>>> whole state is around 300Gb judging by the size of savepoints.
>>>
>>> The job ran ok. At some point we tried to deploy new code, but we
>>> couldn't take a save point as they were timing out. It looks like the
>>> reason it was timing out was due to disk throttle (we use google regional
>>> disks).
>>> The new code was deployed using an externalized checkpoint, but it
>>> didn't start as job was failing with:
>>>
>>> Caused by: java.io.FileNotFoundException: Item not found:
>>> 'gs://../app/checkpoints/2834fa1c81dcf7c9578a8be9a371b0d1/shared/3477b236-fb4b-4a0d-be73-cb6fac62c007'.
>>> Note, it is possible that the live version is still available but the
>>> requested generation is deleted.
>>> at com.google.cloud.hadoop.gcsio.GoogleCloudStorageExceptions
>>> .createFileNotFoundException(GoogleCloudStorageExceptions.java:45)
>>> at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.open(
>>> GoogleCloudStorageImpl.java:653)
>>> at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.open(
>>> GoogleCloudStorageFileSystem.java:277)
>>> at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.(
>>> GoogleHadoopFSInputStream.java:78)
>>> at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.open(
>>> GoogleHadoopFileSystemBase.java:620)
>>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>>> at com.css.flink.fs.gcs.moved.HadoopFileSystem.open(HadoopFileSystem
>>> .java:120)
>>> at com.css.flink.fs.gcs.moved.HadoopFileSystem.open(HadoopFileSystem
>>> .java:37)
>>> at org.apache.flink.core.fs.
>>> PluginFileSystemFactory$ClassLoaderFixingFileSystem.open(
>>> PluginFileSystemFactory.java:127)
>>> at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(
>>> SafetyNetWrapperFileSystem.java:85)
>>> at org.apache.flink.runtime.state.filesystem.FileStateHandle
>>> .openInputStream(FileStateHandle.java:69)

Re: recover from savepoint

2021-06-02 Thread Piotr Nowojski
Hi,

I think there is no generic way. If this error has indeed happened after
starting a second job from the same savepoint, or something like that, the user
can change the Sink's operator UID.

If this is an issue of intentional recovery from an earlier
checkpoint/savepoint, maybe `FlinkKafkaProducer#setLogFailuresOnly` will be
helpful.
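
For illustration only, the two suggestions above expressed in DataStream API
terms (topic, broker and UID values are placeholders; the exactly-once
constructor variant additionally takes a FlinkKafkaProducer.Semantic argument):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaSinkRecoverySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.fromElements("a", "b", "c");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092");

        FlinkKafkaProducer<String> producer =
                new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), props);

        // Second suggestion: only log (instead of failing on) errors of replayed commits.
        producer.setLogFailuresOnly(true);

        stream.addSink(producer)
                // First suggestion: a changed operator UID gives the sink a fresh
                // transactional-id space, avoiding clashes with the other job.
                .uid("kafka-sink-v2");

        env.execute("kafka sink recovery sketch");
    }
}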

Best, Piotrek

Tue, 1 Jun 2021 at 15:16 Till Rohrmann  wrote:

> The error message says that we are trying to reuse a transaction id that is
> currently being used or has expired.
>
> I am not 100% sure how this can happen. My suspicion is that you have
> resumed a job multiple times from the same savepoint. Have you checked that
> there is no other job which has been resumed from the same savepoint and
> which is currently running or has run and completed checkpoints?
>
> @pnowojski  @Becket Qin  how
> does the transaction id generation ensure that we don't have a clash of
> transaction ids if we resume the same job multiple times from the same
> savepoint? From the code, I do see that we have a TransactionalIdsGenerator
> which is initialized with the taskName and the operator UID.
>
> fyi: @Arvid Heise 
>
> Cheers,
> Till
>
>
> On Mon, May 31, 2021 at 11:10 AM 周瑞  wrote:
>
> > Hi:
> >   When "sink.semantic = exactly-once", the following exception is
> > thrown when recovering from a savepoint
> >
> >public static final String KAFKA_TABLE_FORMAT =
> > "CREATE TABLE "+TABLE_NAME+" (\n" +
> > "  "+COLUMN_NAME+" STRING\n" +
> > ") WITH (\n" +
> > "   'connector' = 'kafka',\n" +
> > "   'topic' = '%s',\n" +
> > "   'properties.bootstrap.servers' = '%s',\n" +
> > "   'sink.semantic' = 'exactly-once',\n" +
> > "   'properties.transaction.timeout.ms' =
> > '90',\n" +
> > "   'sink.partitioner' =
> > 'com.woqutench.qmatrix.cdc.extractor.PkPartitioner',\n" +
> > "   'format' = 'dbz-json'\n" +
> > ")\n";
> >   [] - Source: TableSourceScan(table=[[default_catalog, default_database,
> > debezium_source]], fields=[data]) -> Sink: Sink
> > (table=[default_catalog.default_database.KafkaTable], fields=[data]) (1/1
> > )#859 (075273be72ab01bf1afd3c066876aaa6) switched from INITIALIZING to
> > FAILED with failure cause: org.apache.kafka.common.KafkaException:
> > Unexpected error in InitProducerIdResponse; Producer attempted an
> > operation with an old epoch. Either there is a newer producer with the
> > same transactionalId, or the producer's transaction has been expired by
> > the broker.
> > at org.apache.kafka.clients.producer.internals.
> >
> TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager
> > .java:1352)
> > at org.apache.kafka.clients.producer.internals.
> > TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:
> > 1260)
> > at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse
> > .java:109)
> > at org.apache.kafka.clients.NetworkClient.completeResponses(
> > NetworkClient.java:572)
> > at
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:564)
> > at org.apache.kafka.clients.producer.internals.Sender
> > .maybeSendAndPollTransactionalRequest(Sender.java:414)
> > at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender
> > .java:312)
> > at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:
> > 239)
> > at java.lang.Thread.run(Thread.java:748)
> >
>


Re: Custom operator in BATCH execution mode

2021-05-27 Thread Piotr Nowojski
>>
>> Yes, it should be possible to register a timer for Long.MAX_WATERMARK if
>> you want to apply a transformation at the end of each key. You could
>> also use the reduce operation (DataStream#keyBy#reduce) in BATCH mode.
>
> According to [0], timer time is irrelevant since timer will be triggered
> at the end of time right? If that is the case, we can use the same code
> for both streaming and batch mode.

Yes, timers will fire regardless of their value. However, what I
believe Dawid meant is that if you pick a value not very far in the
future, you risk that the timer will fire while your job is still
running. Picking MAX_WATERMARK would prevent that from happening.
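
As a small illustration (key/value types and names are made up), registering
the timer at Long.MAX_VALUE in a KeyedProcessFunction looks like this and works
unchanged in both STREAMING and BATCH execution:

import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Emits one record per key once the (bounded) input has been fully processed. */
public class EndOfKeyFunction extends KeyedProcessFunction<String, String, String> {

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        // Duplicate registrations for the same timestamp are deduplicated, so this
        // ends up as a single timer per key that only fires on the final watermark.
        ctx.timerService().registerEventTimeTimer(Long.MAX_VALUE);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        out.collect("end of key: " + ctx.getCurrentKey());
    }
}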

> Currently, we want to use batch execution mode [0] and historical data
> to build state for our streaming application
(...)
> We hope that in this way, we can rebuild our states with almost the same
code in streaming.

If that's your main purpose, you can also consider using the State Processor
API [1] to bootstrap the state of your job. That is, after all, the main purpose
of the State Processor API.

Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/libs/state_processor_api/

Wed, 26 May 2021 at 14:04 ChangZhuo Chen (陳昌倬) 
wrote:

> On Wed, May 26, 2021 at 01:03:53PM +0200, Dawid Wysakowicz wrote:
> > Hi,
> >
> > No there is no API in the operator to know which mode it works in. We
> > aim to have separate operators for both modes if required. You can check
> > e.g. how we do it in KeyedBroadcastStateTransformationTranslator[1].
>
> Thanks for the information. We implement this according to Piotrek's
> suggestion.
>
> >
> > Yes, it should be possible to register a timer for Long.MAX_WATERMARK if
> > you want to apply a transformation at the end of each key. You could
> > also use the reduce operation (DataStream#keyBy#reduce) in BATCH mode.
>
> According to [0], timer time is irrelevant since timer will be triggered
> at the end of time right? If that is the case, we can use the same code
> for both streaming and batch mode.
>
> [0]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/execution_mode/
>
>
> >
> > A side note, I don't fully get what you mean by "build state for our
> > streaming application". Bear in mind though you cannot take a savepoint
> > from a job running in the BATCH execution mode. Moreover it uses a
> > different kind of StateBackend. Actually a dummy one, which just
> > imitates a real state backend.
>
> What we plan to do here is:
>
> 1. Load configuration from broadcast event (custom source backed by REST
>API).
> 2. Load historical events as batch mode input (From GCS).
> 3. Use timer to trigger output so that the following will happen:
>a. Serialize keyed states into JSON.
>b. Output to Kafka.
>c. Streaming application consumes data from Kafka, and update its
>   keyed states according to it.
>
> We hope that in this way, we can rebuild our states with almost the same
> code in streaming.
>
>
> --
> ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
> http://czchen.info/
> Key fingerprint = BA04 346D C2E1 FE63 C790  8793 CC65 B0CD EC27 5D5B
>


Re: Managing Jobs entirely with Flink Monitoring API

2021-05-26 Thread Piotr Nowojski
Glad to hear it!  Thanks for confirming that it works.

Piotrek

Wed, 26 May 2021 at 12:59 Barak Ben Nathan 
wrote:

>
>
> Hi Piotrek,
>
>
>
> This is exactly what I was searching for. Thanks!
>
>
>
> Barak
>
>
>
> *From:* Piotr Nowojski 
> *Sent:* Wednesday, May 26, 2021 9:59 AM
> *To:* Barak Ben Nathan 
> *Cc:* user@flink.apache.org
> *Subject:* Re: Managing Jobs entirely with Flink Monitoring API
>
>
>
>
> Hi Barak,
>
>
>
> Before starting the JobManager I don't think there is any API running at
> all. If you want to be able to submit/stop multiple jobs to the same
> cluster, session mode is indeed the way to go. But first you need to
> start the cluster ( start-cluster.sh ) [1]
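>
> For reference, driving such a session cluster purely over the REST API looks
> roughly like this (host/port, jar and job ids are placeholders):
>
>   # upload the job jar; the response contains the jar id
>   curl -X POST -H "Expect:" -F "jarfile=@/path/to/job.jar" http://localhost:8081/jars/upload
>
>   # run a job from the uploaded jar
>   curl -X POST http://localhost:8081/jars/<jarid>/run
>
>   # stop a running job (stop-with-savepoint)
>   curl -X POST http://localhost:8081/jobs/<jobid>/stop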
>
>
>
> Piotrek
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/overview/
>
>
>
> wt., 25 maj 2021 o 14:10 Barak Ben Nathan 
> napisał(a):
>
>
>
> I want to manage the execution of Flink Jobs programmatically through
> Flink Monitoring API.
>
>
>
> I.e. I want to run/delete jobs ONLY with the
>  POST /jars/:jarid/run
>  POST /jobs/:jobid/stop
> API commands.
>
>
>
> Now, it seems that the Session Mode may fit my needs: “Session Mode: one
> JobManager instance manages multiple jobs sharing the same cluster of
> TaskManagers” (
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/
> )
>
> However, I couldn’t find a way to start the API server (i.e. a JobManager)
> that didn’t already include submitting a JAR file for job execution.
>
> Any suggestions?
>
>


Re: Flink 1.11.3 NoClassDefFoundError: Could not initialize class

2021-05-26 Thread Piotr Nowojski
Hi,

Maybe before deleting the pods, you could look inside them and inspect your
job's jar? What classes does it have inside it? The job's jar should be in
a local directory. Or maybe even first inspect the jar before submitting it?

Best, Piotrek

wt., 25 maj 2021 o 17:40 Georgi Stoyanov  napisał(a):

> Hi Piotr, thank you for the fast reply.
>
>
>
> The job is restarting in the same Flink session and fails with that
> exception. When I delete the pods (we are using the Google CRD, so I just
> kubectl delete FlinkCluster …) and the yaml is applied again, it’s working
> as expected. It looks to me like it’s a jar problem, since I just noticed it
> started to fail with a class from an internal common library, not only the
> job classes
>
> java.lang.NoClassDefFoundError: Could not initialize
> com.my.organization.core.cfg.PropertiesConfigurationClass
> at
> com.my.organization.core.CassandraSink$1.buildCluster(CassandraSink.java:162)
> at
> org.apache.flink.streaming.connectors.cassandra.ClusterBuilder.getCluster(ClusterBuilder.java:32)
> at
> org.apache.flink.streaming.connectors.cassandra.CassandraSinkBase.open(CassandraSinkBase.java:86)
> at
> org.apache.flink.streaming.connectors.cassandra.CassandraPojoSink.open(CassandraPojoSink.java:106)
> at
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> at
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
> at
> org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:291)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$1(StreamTask.java:506)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:526)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> at java.lang.Thread.run(Thread.java:748)
>
>
>
> *From:* Piotr Nowojski 
> *Sent:* Tuesday, May 25, 2021 6:18 PM
> *To:* Georgi Stoyanov 
> *Cc:* user@flink.apache.org
> *Subject:* Re: Flink 1.11.3 NoClassDefFoundError: Could not initialize
> class
>
>
>
> Hi Georgi,
>
>
>
> I don't think it's a bug in Flink. It sounds like some problem with
> dependencies or jars in your job. Can you explain a bit more what you
> mean by:
>
>
>
> > that some of them are constantly restarting with the following
> exception. After restart, everything is working as expected
>
>
>
> constantly restarting, but after a restart everything is working?
>
>
>
> Best,
>
> Piotrek
>
>
>
> wt., 25 maj 2021 o 16:12 Georgi Stoyanov  napisał(a):
>
> Hi all,
>
>
> We have several Flink jobs running on k8s with Flink 1.11.3, and recently
> we noticed that some of them are constantly restarting with the following
> exception. After a restart, everything works as expected.
> Could this be a bug?
> 2021-05-25 17:04:42
> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot
> instantiate user function.
> at
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:275)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.init(OperatorChain.java:126)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:459)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:526)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: unexpected exception type
> at
> java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1750)
> at
> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1280)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:

Re: Managing Jobs entirely with Flink Monitoring API

2021-05-26 Thread Piotr Nowojski
Hi Barak,

Before starting the JobManager I don't think there is any API running at
all. If you want to be able to submit/stop multiple jobs to the same
cluster, session mode is indeed the way to go. But first you need to
start the cluster (start-cluster.sh) [1].
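
Once the cluster is up, a small (untested) sketch of driving it purely
through the REST API could look like this; the host/port, jar id and job id
are placeholders, and it assumes the jar was already uploaded via POST
/jars/upload:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestJobControl {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8081";           // JobManager REST endpoint (placeholder)
        String jarId = "<jar-id-from-/jars/upload>";      // placeholder
        String jobId = "<job-id-from-the-run-response>";  // placeholder

        // POST /jars/:jarid/run - start a job from the uploaded jar
        HttpRequest run = HttpRequest.newBuilder()
                .uri(URI.create(base + "/jars/" + jarId + "/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
        System.out.println(client.send(run, HttpResponse.BodyHandlers.ofString()).body());

        // POST /jobs/:jobid/stop - stop the job
        HttpRequest stop = HttpRequest.newBuilder()
                .uri(URI.create(base + "/jobs/" + jobId + "/stop"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
        System.out.println(client.send(stop, HttpResponse.BodyHandlers.ofString()).body());
    }
}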

Piotrek

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/overview/

wt., 25 maj 2021 o 14:10 Barak Ben Nathan 
napisał(a):

>
>
> I want to manage the execution of Flink Jobs programmatically through
> Flink Monitoring API.
>
>
>
> I.e. I want to run/delete jobs ONLY with the
>  POST /jars/:jarid/run
>  POST /jobs/:jobid/stop
> API commands.
>
>
>
> Now, it seems that the Session Mode may fit my needs: “Session Mode: one
> JobManager instance manages multiple jobs sharing the same cluster of
> TaskManagers” (
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/)
>
> However, I couldn’t find a way to start the API server (i.e. a JobManager)
> that didn’t already include submitting a JAR file for job execution.
>
> Any suggestions?
>


Re: How to read large amount of data from hive and write to redis, in a batch manner?

2021-05-25 Thread Piotr Nowojski
Hi,

You could always buffer records in your sink function/operator until a
large enough batch is accumulated, and then upload the whole batch at once.
Note that if you want at-least-once or exactly-once semantics, you would
need to take care of those buffered records in one way or another. For
example you could:
1. Buffer records in some in-memory data structure (not Flink's state), and
just make sure that those records are flushed to the underlying sink on
`CheckpointedFunction#snapshotState()` calls (see the sketch below).
2. Buffer records in Flink's state (heap state backend or RocksDB; the heap
state backend would be the fastest with little overhead, but you risk
running out of memory), and that would easily give you exactly-once. That
way your batch could span multiple checkpoints.
3. Buffer/write records to temporary files, but in that case keep in mind
that those files need to be persisted and recovered in case of failure and
restart.
4. Ignore checkpointing and either always restart the job from scratch or
accept some occasional data loss.

FYI, virtually every connector/sink is internally batching writes to some
extent. Usually by doing option 1.
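
For illustration, a minimal (untested) sketch of option 1 could look like
the following; redisClient.batchPut(...) is a placeholder for whatever
client call you actually use:

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class BatchingRedisSink<T> extends RichSinkFunction<T> implements CheckpointedFunction {

    private static final int BATCH_SIZE = 10_000;

    // Plain in-memory buffer, NOT Flink state (this is option 1).
    private transient List<T> buffer;

    @Override
    public void open(Configuration parameters) {
        buffer = new ArrayList<>();
    }

    @Override
    public void invoke(T value, Context context) {
        buffer.add(value);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        // Flushing here guarantees nothing buffered is lost on recovery
        // (at-least-once: a failure between checkpoints replays records).
        flush();
    }

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // Nothing to restore: the buffer is always empty at checkpoint time.
    }

    private void flush() {
        if (buffer != null && !buffer.isEmpty()) {
            // redisClient.batchPut(buffer);  // placeholder for your client call
            buffer.clear();
        }
    }
}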

Piotrek

wt., 25 maj 2021 o 14:50 Yik San Chan 
napisał(a):

> Hi community,
>
> I have a Hive table that stores tens of millions of rows of data. In my Flink
> job, I want to process the data in a batch manner:
>
> - Split the data into batches, each batch having (maybe) 10,000 rows.
> - For each batch, call a batchPut() API on my redis client to dump them into
> Redis.
>
> Doing so in a record-at-a-time streaming manner is not what I want, as that
> would cause too many round trips between Flink workers and Redis.
>
> Is there a way to do that? I found little in the Flink docs, since almost
> all APIs feel better suited to streaming processing by default.
>
> Thank you!
>
> Best,
> Yik San
>


Re: Customer operator in BATCH execution mode

2021-05-25 Thread Piotr Nowojski
Hi,

1. I don't know if there is a built-in way of doing it. You can always pass
this information yourself when you are starting the job (via the
operator's/function's constructors).
2. Yes, I think this should work.
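
For example (a rough, untested sketch; MyEvent and MyResult are placeholder
types, and the flag is simply whatever you decide when wiring up the job):

import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class EndOfKeyFunction extends KeyedProcessFunction<String, MyEvent, MyResult> {

    // 1. the execution mode is passed in from the outside via the constructor
    private final boolean isBatchMode;

    public EndOfKeyFunction(boolean isBatchMode) {
        this.isBatchMode = isBatchMode;
    }

    @Override
    public void processElement(MyEvent value, Context ctx, Collector<MyResult> out) throws Exception {
        // ... update keyed state here ...

        // 2. a timer at Long.MAX_VALUE fires once per key at the end of input
        // in BATCH mode (re-registering the same timestamp keeps a single timer).
        ctx.timerService().registerEventTimeTimer(Long.MAX_VALUE);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<MyResult> out) throws Exception {
        // emit the final result for ctx.getCurrentKey() here
    }
}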

Best,
Piotrek

wt., 25 maj 2021 o 17:05 ChangZhuo Chen (陳昌倬) 
napisał(a):

> Hi,
>
> Currently, we want to use the batch execution mode [0] and historical data
> to build state for our streaming application. Due to differences between
> batch & streaming mode, we want to check the current execution mode in our
> custom operator. So our questions are:
>
>
> * Is there any API for a custom operator to know the current execution mode
>   (batch or streaming)?
>
> * If we want to output after all elements of one specific key are
>   processed, can we just use a timer, since timers are triggered at the end
>   of input [0]?
>
>
> [0]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/execution_mode/
>
> --
> ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
> http://czchen.info/
> Key fingerprint = BA04 346D C2E1 FE63 C790  8793 CC65 B0CD EC27 5D5B
>


Re: Flink 1.11.3 NoClassDefFoundError: Could not initialize class

2021-05-25 Thread Piotr Nowojski
Hi Georgi,

I don't think it's a bug in Flink. It sounds like some problem with
dependencies or jars in your job. Can you explain a bit more what you
mean by:

> that some of them are constantly restarting with the following exception.
After restart, everything is working as expected

constantly restarting, but after a restart everything is working?

Best,
Piotrek

wt., 25 maj 2021 o 16:12 Georgi Stoyanov  napisał(a):

> Hi all,
>
>
> We have several Flink jobs running on k8s with Flink 1.11.3, and recently
> we noticed that some of them are constantly restarting with the following
> exception. After a restart, everything works as expected.
> Could this be a bug?
> 2021-05-25 17:04:42
> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot
> instantiate user function.
> at
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:275)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.init(OperatorChain.java:126)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:459)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:526)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: unexpected exception type
> at
> java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1750)
> at
> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1280)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
> at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
> at
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:576)
> at
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:562)
> at
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:550)
> at
> org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:511)
> at
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:260)
> ... 6 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
> at sun.reflect.GeneratedMethodAccessor281.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
> ... 23 more
> Caused by: java.lang.NoClassDefFoundError: Could not initialize
> com.my.organization.MyPerfectlyWorkingJob
> ... 31 more
>
>


Re: Time needed to read from Kafka source

2021-05-25 Thread Piotr Nowojski
Hi,

That's a throughput of roughly 700 records/second, which should be well below
the theoretical limits of any deserializer (from hundreds of thousands up to
tens of millions of records per second per single operator), unless your
records are huge or very complex.

Long story short, I don't know of a magic bullet to help you solve your
problem. As always you have two options, either optimise/speed up your
code/job, or scale up.

If you choose the former, think about Flink as just another Java
application. Check metrics and resource usage, and understand which resource
is the problem (CPU? memory? is the machine swapping? IO?). You might be able
to guess what your bottleneck is (reading from Kafka? deserialisation?
something else? Flink itself?) by looking at some of the metrics
(busyTimeMsPerSecond [1] or idleTimeMsPerSecond could help with that), or
you can also simplify your job to a bare minimum and test the performance of
independent components. Also, you can always attach a code profiler and
simply look at what's happening. First identify the source of the
bottleneck and then try to understand what's causing it.

Best,
Piotrek

[1] busyTimeMsPerSecond is available since Flink 1.13. Flink 1.13 also
comes with nice tools to analyse bottlenecks in the WebUI (coloring nodes
in the job graph based on busy/back pressured status and Flamegraph
support)

wt., 25 maj 2021 o 15:44 B.B.  napisał(a):

> Hi,
>
> I am in the process of optimizing my job, which at the moment we think is
> too slow.
>
> We are deploying the job in Kubernetes with 1 job manager (1 GB RAM, 1 CPU)
> and 1 task manager (4 GB RAM, 2 CPUs, i.e. 2 task slots and a
> parallelism of two).
>
> The main problem is one Kafka source that has 3.8 million events that we
> have to process.
> As a test we made a simple job that connects to Kafka using a custom
> implementation of KafkaDeserializationSchema. There we are using an
> ObjectMapper that maps input values, e.g.
>
> *var event = objectMapper.readValue(consumerRecord.value(),
> MyClass.class);*
>
> This is then validated with Hibernate Validator, and the output of this
> source is printed on the console.
>
> The time needed for the job to consume all the events was one and a half
> hours, which seems a bit long.
> Is there a way we can speed up this process?
>
> Are more CPU cores or more memory the solution?
> Should we switch to an Avro deserialization schema?
>
>
>
>


Re: yarn ship from s3

2021-05-25 Thread Piotr Nowojski
Hi Vijay,

I'm not sure if I understand your question correctly. You have a jar and
configs (1, 2, 3 and 4) on S3 and you want to start a Flink job using
those? Can you simply download those things (the whole directory containing
them) to the machine that will be starting the Flink job?

Best, Piotrek

wt., 25 maj 2021 o 07:50 Vijayendra Yadav 
napisał(a):

> Hi Team,
>
> I am trying to find a way to ship files from AWS S3 for a Flink streaming
> job; I am running on AWS EMR. What I need to ship is the following:
> 1) application jar
> 2) application property file
> 3) custom flink-conf.yaml
> 4) application-specific log4j configuration
>
> Please let me know the options.
>
> Thanks,
> Vijay
>


Re: DataStream API in Batch Mode job is timing out, please advise on how to adjust the parameters.

2021-05-25 Thread Piotr Nowojski
Hi Marco,

How are you starting the job? For example, are you using YARN as the
resource manager? It looks like there are just not enough resources in the
cluster to run this job. Assuming the cluster is correctly configured and
the Task Managers are able to connect to the Job Manager (can you share full
JM/TM logs?), I would say your job is simply too large (parallelism of 32?)
for the given configuration.

Best,
Piotrek

wt., 25 maj 2021 o 06:10 Marco Villalobos 
napisał(a):

> I am running with one job manager and three task managers.
>
> Each task manager is receiving at most 8 GB of data, but the job is timing
> out.
>
> What parameters must I adjust?
>
> Sink: back fill db sink) (15/32) (50626268d1f0d4c0833c5fa548863abd)
> switched from SCHEDULED to FAILED on [unassigned resource].
> java.util.concurrent.CompletionException:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Slot request bulk is not fulfillable! Could not allocate the required slot
> within slot request timeout
> at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> ~[?:1.8.0_282]
> at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> ~[?:1.8.0_282]
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
> ~[?:1.8.0_282]
> at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> ~[?:1.8.0_282]
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> ~[?:1.8.0_282]
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> ~[?:1.8.0_282]
> at
> org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:223)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:168)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[?:1.8.0_282]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_282]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:442)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:209)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:159)
> ~[feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.actor.Actor.aroundReceive(Actor.scala:517)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.actor.Actor.aroundReceive$(Actor.scala:515)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> [feature-LUM-3882-toledo--850a6747.jar:?]
> at 

Re: [ANNOUNCE] Apache Flink 1.12.3 released

2021-05-04 Thread Piotr Nowojski
Yes, thanks a lot for driving this release Arvid :)

Piotrek

czw., 29 kwi 2021 o 19:04 Till Rohrmann  napisał(a):

> Great to hear. Thanks a lot for being our release manager Arvid and to
> everyone who has contributed to this release!
>
> Cheers,
> Till
>
> On Thu, Apr 29, 2021 at 4:11 PM Arvid Heise  wrote:
>
>> Dear all,
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.12.3, which is the third bugfix release for the Apache Flink
>> 1.12 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2021/04/29/release-1.12.3.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12349691
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Regards,
>>
>> Your friendly release manager Arvid
>>
>


Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Piotr Nowojski
Hi Dylan,

But if you are running your query in Streaming mode, aren't you counting
retractions from the FULL JOIN? AFAIK in Streaming mode, in a FULL JOIN, when
the first record comes in it will be immediately emitted with NULLs (not
matched, as the other table is empty). Later, if a matching record is
received from the second table, the previous result will be retracted and
the new, updated one will be re-emitted. Maybe this is what you are
observing in the varying output?

Maybe you could try to analyse how the results differ between different
runs?

Best,
Piotrek

śr., 14 kwi 2021 o 16:22 Dylan Forciea  napisał(a):

> I replaced the FIRST_VALUE with MAX to ensure that the results should be
> identical even in their content, and my problem still remains: I end up
> with a nondeterministic count of records being emitted into the sink when
> the parallelism is over 1, and that count is about 20-25% lower (and not
> consistent) than what comes out consistently when parallelism is set to 1.
>
>
>
> Dylan
>
>
>
> *From: *Dylan Forciea 
> *Date: *Wednesday, April 14, 2021 at 9:08 AM
> *To: *Piotr Nowojski 
> *Cc: *"user@flink.apache.org" 
> *Subject: *Re: Nondeterministic results with SQL job when parallelism is
> > 1
>
>
>
> Piotrek,
>
>
>
> I was actually originally using a group function that WAS deterministic
> (but was a custom UDF I made), but chose something built-in here. By
> non-deterministic, I mean that the number of records coming out is not
> consistent. Since the FIRST_VALUE here is on an attribute that is not part
> of the key, that shouldn’t affect the number of records coming out, I
> wouldn’t think.
>
>
>
> Dylan
>
>
>
> *From: *Piotr Nowojski 
> *Date: *Wednesday, April 14, 2021 at 9:06 AM
> *To: *Dylan Forciea 
> *Cc: *"user@flink.apache.org" 
> *Subject: *Re: Nondeterministic results with SQL job when parallelism is
> > 1
>
>
>
> Hi,
>
>
>
> Yes, it looks like your query is non-deterministic because of
> `FIRST_VALUE` used inside `GROUP BY`. If you have many different parallel
> sources, each time you run your query your first value might be different.
> If that's the case, you could try to confirm it with an even smaller query:
>
>
>
>SELECT
>   id2,
>   FIRST_VALUE(attr) AS attr
> FROM table2
> GROUP BY id2
>
>
>
> Best,
>
> Piotrek
>
>
>
> śr., 14 kwi 2021 o 14:45 Dylan Forciea  napisał(a):
>
> I am running Flink 1.12.2, and I was trying to up the parallelism of my
> Flink SQL job to see what happened. However, once I did that, my results
> became nondeterministic. This happens whether I set the
> table.exec.resource.default-parallelism config option or I set the default
> local parallelism to something higher than 1. I would end up with fewer
> records in the end, and each time I ran it the output record count would come
> out differently.
>
>
>
> I managed to distill an example, as pasted below (with attribute names
> changed to protect company proprietary info), that causes the issue. I feel
> like I managed to get it to happen with a LEFT JOIN rather than a FULL
> JOIN, but the distilled version wasn’t giving me wrong results with that.
> Maybe it has to do with joining to a table that was formed using a GROUP
> BY? Can somebody tell me if I’m doing something that is known not to work, or
> if I have run across a bug?
>
>
>
> Regards,
>
> Dylan Forciea
>
>
>
>
>
> object Job {
>
>   def main(args: Array[String]): Unit = {
>
> StreamExecutionEnvironment.setDefaultLocalParallelism(1)
>
>
>
> val settings = EnvironmentSettings
> .newInstance().useBlinkPlanner().inStreamingMode().build()
>
> val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
>
> val streamTableEnv = StreamTableEnvironment.create(streamEnv,
> settings)
>
>
>
> val configuration = streamTableEnv.getConfig().getConfiguration()
>
> configuration.setInteger("table.exec.resource.default-parallelism", 16
> )
>
>
>
> streamEnv.setRuntimeMode(RuntimeExecutionMode.BATCH);
>
>
>
> streamTableEnv.executeSql(
>
>   """
>
>   CREATE TABLE table1 (
>
> id1 STRING PRIMARY KEY NOT ENFORCED,
>
> attr STRING
>
>   ) WITH (
>
> 'connector' = 'jdbc',
>
> 'url' = 'jdbc:postgresql://…',
>
> 'table-name' = 'table1’,
>
> 'username' = 'username',
>
> 'password' = 'password',
>
> 'scan.fetch-size' = '500',
>
> 'scan.auto-commit' = 'false'
>
>   )""")
>

Re: flink1.12.2 "Failed to execute job"

2021-04-14 Thread Piotr Nowojski
Hi,

I haven't found anything strange in the logs (I've received logs in a
separate message). It looks like the problem is that split enumeration is
taking a long time, and currently this is being done in the Job Manager's
main thread, blocking other things from executing. For the time being I
think the only thing you can do is to either speed up the split enumeration
(probably difficult) or increase the timeouts that are failing. I don't
know if there is some other workaround at the moment (Becket?).

Piotrek

śr., 14 kwi 2021 o 15:57 Piotr Nowojski  napisał(a):

> Hey,
>
> could you provide full logs from both task managers and job managers?
>
> Piotrek
>
> śr., 14 kwi 2021 o 15:43 太平洋 <495635...@qq.com> napisał(a):
>
>> After submitting the job, I received a 'Failed to execute job' error, and
>> the time between initialization and scheduling lasted 214s. What happened
>> during this period?
>>
>> version: flink: 1.12.2
>> deployment: k8s standalone
>> logs:
>>
>> 2021-04-14 12:47:58,547 WARN org.apache.flink.streaming.connectors.kafka.
>> FlinkKafkaProducer [] - Property [transaction.timeout.ms] not specified.
>> Setting it to 360 ms
>> 2021-04-14 12:48:04,175 INFO
>> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor
>> [] - Job 1276000e99efdb77bdae0df88ab91da3 is submitted.
>> 2021-04-14 12:48:04,175 INFO
>> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor
>> [] - Submitting Job with JobId=1276000e99efdb77bdae0df88ab91da3.
>> 2021-04-14 12:48:04,249 INFO org.apache.flink.runtime.dispatcher.
>> StandaloneDispatcher [] - Received JobGraph submission 
>> 1276000e99efdb77bdae0df88ab91da3
>> (Prediction Program).
>> 2021-04-14 12:48:04,249 INFO org.apache.flink.runtime.dispatcher.
>> StandaloneDispatcher [] - Submitting job 1276000e99efdb77bdae0df88ab91da3
>> (Prediction Program).
>> 2021-04-14 12:48:04,250 INFO org.apache.flink.runtime.rpc.akka.
>> AkkaRpcService [] - Starting RPC endpoint for
>> org.apache.flink.runtime.jobmaster.JobMaster at
>> akka://flink/user/rpc/jobmanager_8 .
>> 2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster
>> [] - Initializing job Prediction Program (1276000e99
>> efdb77bdae0df88ab91da3).
>> 2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster
>> [] - Using restart back off time strategy NoRestartBackoffTimeStrategy
>> for Prediction Program (1276000e99efdb77bdae0df88ab91da3).
>> 2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster
>> [] - Running initialization on master for job Prediction Program (
>> 1276000e99efdb77bdae0df88ab91da3).
>> 2021-04-14 12:48:04,252 INFO org.apache.flink.runtime.jobmaster.JobMaster
>> [] - Successfully ran initialization on master in 0 ms.
>> 2021-04-14 12:48:04,254 INFO org.apache.flink.runtime.scheduler.adapter.
>> DefaultExecutionTopology [] - Built 10 pipelined regions in 0 ms
>> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster
>> [] - Using application-defined state backend:
>> org.apache.flink.streaming.api.operators.sorted.state.
>> BatchExecutionStateBackend@3ea8cd5a
>> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.checkpoint.
>> CheckpointCoordinator [] - No checkpoint found during restore.
>> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster
>> [] - Using failover strategy
>> org.apache.flink.runtime.executiongraph.failover.flip1.
>> RestartPipelinedRegionFailoverStrategy@26845997 for Prediction Program (
>> 1276000e99efdb77bdae0df88ab91da3).
>> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.
>> JobManagerRunnerImpl [] - JobManager runner for job Prediction Program (
>> 1276000e99efdb77bdae0df88ab91da3) was granted leadership with session id
>> ---- at akka.tcp://flink@flink
>> -jobmanager:6123/user/rpc/jobmanager_8.
>> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster
>> [] - Starting execution of job Prediction Program 
>> (1276000e99efdb77bdae0df88ab91da3)
>> under job master id .
>> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.source.coordinator.
>> SourceCoordinator [] - Starting split enumerator for source Source:
>> TableSourceScan(table=[[default_catalog, default_database, cpu_util,
>> filter=[], project=[instance_id, value, timestamp]]], fields=[instance_id,
>> value, timestamp]) -> Calc(select=[instance_id, value, timestamp], 
>> where=[(timestamp
>> > 1618145278)]) -> SinkConversionToDataPo

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Piotr Nowojski
Hi,

Yes, it looks like your query is non-deterministic because of `FIRST_VALUE`
used inside `GROUP BY`. If you have many different parallel sources, each
time you run your query your first value might be different. If that's the
case, you could try to confirm it with an even smaller query:

   SELECT
  id2,
  FIRST_VALUE(attr) AS attr
FROM table2
GROUP BY id2

Best,
Piotrek

śr., 14 kwi 2021 o 14:45 Dylan Forciea  napisał(a):

> I am running Flink 1.12.2, and I was trying to up the parallelism of my
> Flink SQL job to see what happened. However, once I did that, my results
> became nondeterministic. This happens whether I set the
> table.exec.resource.default-parallelism config option or I set the default
> local parallelism to something higher than 1. I would end up with fewer
> records in the end, and each time I ran it the output record count would come
> out differently.
>
>
>
> I managed to distill an example, as pasted below (with attribute names
> changed to protect company proprietary info), that causes the issue. I feel
> like I managed to get it to happen with a LEFT JOIN rather than a FULL
> JOIN, but the distilled version wasn’t giving me wrong results with that.
> Maybe it has to do with joining to a table that was formed using a GROUP
> BY? Can somebody tell me if I’m doing something that is known not to work, or
> if I have run across a bug?
>
>
>
> Regards,
>
> Dylan Forciea
>
>
>
>
>
> object Job {
>
>   def main(args: Array[String]): Unit = {
>
> StreamExecutionEnvironment.setDefaultLocalParallelism(1)
>
>
>
> val settings = EnvironmentSettings
> .newInstance().useBlinkPlanner().inStreamingMode().build()
>
> val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
>
> val streamTableEnv = StreamTableEnvironment.create(streamEnv,
> settings)
>
>
>
> val configuration = streamTableEnv.getConfig().getConfiguration()
>
> configuration.setInteger("table.exec.resource.default-parallelism", 16
> )
>
>
>
> streamEnv.setRuntimeMode(RuntimeExecutionMode.BATCH);
>
>
>
> streamTableEnv.executeSql(
>
>   """
>
>   CREATE TABLE table1 (
>
> id1 STRING PRIMARY KEY NOT ENFORCED,
>
> attr STRING
>
>   ) WITH (
>
> 'connector' = 'jdbc',
>
> 'url' = 'jdbc:postgresql://…',
>
> 'table-name' = 'table1’,
>
> 'username' = 'username',
>
> 'password' = 'password',
>
> 'scan.fetch-size' = '500',
>
> 'scan.auto-commit' = 'false'
>
>   )""")
>
>
>
> streamTableEnv.executeSql(
>
>   """
>
>   CREATE TABLE table2 (
>
> attr STRING,
>
> id2 STRING
>
>   ) WITH (
>
> 'connector' = 'jdbc',
>
> 'url' = 'jdbc:postgresql://…',
>
> 'table-name' = 'table2',
>
> 'username' = 'username',
>
> 'password' = 'password',
>
> 'scan.fetch-size' = '500',
>
> 'scan.auto-commit' = 'false'
>
>   )""")
>
>
>
> streamTableEnv.executeSql(
>
>   """
>
>   CREATE TABLE table3 (
>
> attr STRING PRIMARY KEY NOT ENFORCED,
>
> attr_mapped STRING
>
>   ) WITH (
>
> 'connector' = 'jdbc',
>
> 'url' = 'jdbc:postgresql://…',
>
> 'table-name' = ‘table3',
>
> 'username' = ‘username',
>
> 'password' = 'password',
>
> 'scan.fetch-size' = '500',
>
> 'scan.auto-commit' = 'false'
>
>   )""")
>
>
>
> streamTableEnv.executeSql("""
>
>   CREATE TABLE sink (
>
> id STRING PRIMARY KEY NOT ENFORCED,
>
> attr STRING,
>
> attr_mapped STRING
>
>   ) WITH (
>
> 'connector' = 'jdbc',
>
> 'url' = 'jdbc:postgresql://…,
>
> 'table-name' = 'sink',
>
> 'username' = 'username',
>
> 'password' = 'password',
>
> 'scan.fetch-size' = '500',
>
> 'scan.auto-commit' = 'false'
>
>   )""")
>
>
>
> val view =
>
>   streamTableEnv.sqlQuery("""
>
>   SELECT
>
> COALESCE(t1.id1, t2.id2) AS id,
>
> COALESCE(t2.attr, t1.attr) AS operator,
>
> COALESCE(t3.attr_mapped, t2.attr, t1.attr) AS attr_mapped
>
>   FROM table1 t1
>
>   FULL JOIN (
>
> SELECT
>
>   id2,
>
>   FIRST_VALUE(attr) AS attr
>
> FROM table2
>
> GROUP BY id2
>
>   ) t2
>
>ON (t1.id1 = t2.id2)
>
>   LEFT JOIN table3 t3
>
> ON (COALESCE(t2.attr, t1.attr) = t3.attr)""")
>
> streamTableEnv.createTemporaryView("view", view)
>
>
>
> val statementSet = streamTableEnv.createStatementSet()
>
> statementSet.addInsertSql("""
>
>   INSERT INTO sink SELECT * FROM view
>
> """)
>
>
>
> statementSet.execute().await()
>
>   }
>
> }
>
>
>
>
>


Re: flink1.12.2 "Failed to execute job"

2021-04-14 Thread Piotr Nowojski
Hey,

could you provide full logs from both task managers and job managers?

Piotrek

śr., 14 kwi 2021 o 15:43 太平洋 <495635...@qq.com> napisał(a):

> After submitting the job, I received a 'Failed to execute job' error, and
> the time between initialization and scheduling lasted 214s. What happened
> during this period?
>
> version: flink: 1.12.2
> deployment: k8s standalone
> logs:
>
> 2021-04-14 12:47:58,547 WARN org.apache.flink.streaming.connectors.kafka.
> FlinkKafkaProducer [] - Property [transaction.timeout.ms] not specified.
> Setting it to 360 ms
> 2021-04-14 12:48:04,175 INFO
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor
> [] - Job 1276000e99efdb77bdae0df88ab91da3 is submitted.
> 2021-04-14 12:48:04,175 INFO
> org.apache.flink.client.deployment.application.executors.EmbeddedExecutor
> [] - Submitting Job with JobId=1276000e99efdb77bdae0df88ab91da3.
> 2021-04-14 12:48:04,249 INFO org.apache.flink.runtime.dispatcher.
> StandaloneDispatcher [] - Received JobGraph submission 
> 1276000e99efdb77bdae0df88ab91da3
> (Prediction Program).
> 2021-04-14 12:48:04,249 INFO org.apache.flink.runtime.dispatcher.
> StandaloneDispatcher [] - Submitting job 1276000e99efdb77bdae0df88ab91da3
> (Prediction Program).
> 2021-04-14 12:48:04,250 INFO org.apache.flink.runtime.rpc.akka.
> AkkaRpcService [] - Starting RPC endpoint for
> org.apache.flink.runtime.jobmaster.JobMaster at
> akka://flink/user/rpc/jobmanager_8 .
> 2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Initializing job Prediction Program (1276000e99
> efdb77bdae0df88ab91da3).
> 2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for
> Prediction Program (1276000e99efdb77bdae0df88ab91da3).
> 2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Running initialization on master for job Prediction Program (
> 1276000e99efdb77bdae0df88ab91da3).
> 2021-04-14 12:48:04,252 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Successfully ran initialization on master in 0 ms.
> 2021-04-14 12:48:04,254 INFO org.apache.flink.runtime.scheduler.adapter.
> DefaultExecutionTopology [] - Built 10 pipelined regions in 0 ms
> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Using application-defined state backend:
> org.apache.flink.streaming.api.operators.sorted.state.
> BatchExecutionStateBackend@3ea8cd5a
> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.checkpoint.
> CheckpointCoordinator [] - No checkpoint found during restore.
> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Using failover strategy
> org.apache.flink.runtime.executiongraph.failover.flip1.
> RestartPipelinedRegionFailoverStrategy@26845997 for Prediction Program (
> 1276000e99efdb77bdae0df88ab91da3).
> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.
> JobManagerRunnerImpl [] - JobManager runner for job Prediction Program (
> 1276000e99efdb77bdae0df88ab91da3) was granted leadership with session id
> ---- at akka.tcp://flink@flink-jobmanager:
> 6123/user/rpc/jobmanager_8.
> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster
> [] - Starting execution of job Prediction Program 
> (1276000e99efdb77bdae0df88ab91da3)
> under job master id .
> 2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.source.coordinator.
> SourceCoordinator [] - Starting split enumerator for source Source:
> TableSourceScan(table=[[default_catalog, default_database, cpu_util,
> filter=[], project=[instance_id, value, timestamp]]], fields=[instance_id,
> value, timestamp]) -> Calc(select=[instance_id, value, timestamp], 
> where=[(timestamp
> > 1618145278)]) -> SinkConversionToDataPoint -> Map.
> org.apache.flink.util.FlinkException: Failed to execute job 'Prediction
> Program'. at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1918)
> at
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)
> at
> org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1782)
> at com.jd.app.StreamingJob.main(StreamingJob.java:265) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498) at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
> at
> 
