)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
... 13 more
From: vino yang
Date: Thursday, December 12, 2019 at 8:46 PM
To: Peter Westermann
Cc: user
Subject: Re: State Processor API: StateMigrationException for keyed state
Hi pwestermann,
Can you share the relevant
I recently ran into an issue with our Flink cluster: a ZooKeeper service
deployment caused a temporary connection loss and triggered a new JobManager
leader election. The leader election was successful and our Flink job
restarted from the last checkpoint.
This checkpoint appears to have been taken
We remove stateful operators via the allowNonRestoredState option relatively
often, and it works great. However, there doesn’t seem to be anything
comparable for removing state from an existing operator (one that we want to
keep).
Say my operator defines a MapState and a ValueState.
I just started testing Flink 1.11.1 and noticed that the Task Managers section
in the UI doesn’t load.
The exception in the log is:
j.i.NotSerializableException:
org.apache.flink.runtime.rest.messages.ResourceProfileInfo
    at j.i.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
    at
> listStates();
// Completely remove a state
void dropState(StateDescriptor stateDescriptor);
Thanks,
Peter
From: Congxian Qiu
Date: Thursday, October 29, 2020 at 10:38 AM
To: Robert Metzger
Cc: Peter Westermann, "user@flink.apache.org"
Subject: Re: Feature request: Removing state f
Renaming operators and migrating the state we still need manually is what we
have done in the past. I was just hoping for a more convenient solution.
Peter
From: David Anderson
Date: Friday, October 30, 2020 at 5:55 PM
To: Peter Westermann, "user@flink.apache.org"
Subject: R
/mnt/data is a local disk, so there shouldn’t be any additional latency. I’ll
provide more information when/if this happens again.
Peter
From: Roman Khachatryan
Date: Tuesday, May 25, 2021 at 6:54 PM
To: Peter Westermann
Cc: user@flink.apache.org
Subject: Re: Job recovery issues with state
for task local recovery, those would be in a
different directory (we have configured io.tmp.dirs as /mnt/data/tmp).
Thanks,
Peter
From: Roman Khachatryan
Date: Thursday, May 20, 2021 at 4:54 PM
To: Peter Westermann
Cc: user@flink.apache.org
Subject: Re: Job recovery issues with state restoration
Hello,
I’ve previously reported issues with checkpoint recovery after a job failure
caused by ZooKeeper connection loss, and I am still seeing them occasionally.
This is for Flink 1.12.3 with ZooKeeper for HA, S3 as the state backend,
incremental checkpoints, and task-local recovery enabled.
election is expected.
Thanks,
Peter
From: Piotr Nowojski
Date: Thursday, September 9, 2021 at 12:39 AM
To: Peter Westermann
Cc: user@flink.apache.org
Subject: Re: Duplicate copies of job in Flink UI/API
Hi Peter,
Can you provide relevant JobManager logs? And can you write down what steps
have you
From: Chesnay Schepler
Date: Thursday, September 9, 2021 at 9:11 AM
To: Peter Westermann, Piotr Nowojski, user@flink.apache.org
Subject: Re: Duplicate copies of job in Flink UI/API
Just to double-check that I'm understanding things correctly:
You have a job with HA, then Zookeeper breaks down, the job
We recently upgraded from Flink 1.12.4 to 1.12.5 and are seeing some weird
behavior after a change in JobManager leadership: there are two copies of
the same job, one of which is in SUSPENDED state and has a start time of zero.
Here’s the output from the /jobs/overview endpoint:
{
"jobs":
to the REST interface. If I
request job data from /v1/jobs/{jobId}, I get the expected response on the
leader, but on the other job manager I only get an exception stack trace:
{"errors":["Internal server error.",""]}
Peter Westermann
Team Lead – Realtime Analytics
-parent/flink-shaded-zookeeper-35/pom.xml#L47).
Looks like this is not correct if you want to use SSL.
Adding jars for netty-handler and netty-transport-native-epoll to the lib
folder addressed this issue.
Perhaps this could be addressed in the next release for flink-shaded?
Thanks,
Peter
Thanks!
From: Chesnay Schepler
Date: Monday, October 4, 2021 at 9:27 AM
To: Peter Westermann, user
Subject: Re: Missing dependency in flink-shaded-zookeeper-35
Indeed, it looks like the client-server SSL support added in 3.5 is implemented
with netty. I will create a ticket.
On 04/10/2021 15
get the
following error:
{"errors":["Internal server error.",""]}
Peter Westermann
Analytics Software Architect
peter.westerm...@genesys.com
Just tried this again with Flink 1.14.3 since
https://issues.apache.org/jira/browse/FLINK-24550 is listed as fixed. I am
running into similar errors when calling the /v1/jobs/overview endpoint
(without any running jobs):
{"errors":["Internal server error.",""]}
Peter Westermann
Analytics Software Architect
peter.westerm...@genesys.com
If it happens, it happens immediately. Once we receive the triggerId from
/jobs/:jobid/stop or /jobs/:jobid/savepoints we poll
/jobs/:jobid/savepoints/:triggerid every second until the status is no longer
IN_PROGRESS.
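
The polling loop described above could be sketched roughly as follows. This is only an illustration under assumptions, not the code we actually run: the class name SavepointPoller, the method pollUntilDone, and the maxAttempts cap are hypothetical, and the Supplier stands in for whatever HTTP call fetches the status from /jobs/:jobid/savepoints/:triggerid.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper: polls a savepoint status until it leaves IN_PROGRESS. */
public final class SavepointPoller {

    private SavepointPoller() {}

    /**
     * Calls statusSupplier, then keeps calling it once per interval while it
     * returns "IN_PROGRESS", giving up after maxAttempts calls in total.
     * Returns the last status seen (still "IN_PROGRESS" if attempts ran out).
     */
    public static String pollUntilDone(
            Supplier<String> statusSupplier, Duration interval, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String status = statusSupplier.get();
        for (int attempt = 1;
                attempt < maxAttempts && "IN_PROGRESS".equals(status);
                attempt++) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
            status = statusSupplier.get();
        }
        return status;
    }
}
```

With a one-second interval this matches the behavior described above; the attempt cap is an addition so that a status stuck in IN_PROGRESS does not poll forever.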
Peter Westermann
Analytics Software Architect
retry such operations without triggering multiple savepoints.
Could this have anything to do with the error I am seeing?
Peter Westermann
Analytics Software Architect
peter.westerm...@genesys.com