Hi Parag,

When you restore from a savepoint, do you see a line like "Restoring job
{} from {}" in the JobManager logs? Is the entire state lost or just part
of it? Could you explain a bit what your job looks like and how you
check that the state is lost?
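
A minimal sketch of one way to look for that line, assuming the
JobManager log has first been dumped to a file. The file name and the
pod placeholder are hypothetical, and the exact wording of the message
may differ between Flink versions, so treat the pattern as a starting
point only:

```shell
#!/bin/sh
# Sketch: scan a saved JobManager log for the restore message.
# The pattern mirrors the "Restoring job {} from {}" template quoted above;
# the real message may be worded differently in your Flink version (assumption).

# find_restore_line LOGFILE
# Prints every matching line; returns non-zero if no line matches.
find_restore_line() {
    grep -E 'Restoring job .+ from .+' "$1"
}

# Hypothetical usage: dump the log first, then search it.
#   kubectl logs <jobmanager-pod> > jobmanager.log
#   find_restore_line jobmanager.log || echo "no restore message found"
```

If nothing matches, the job most likely started without picking up the
savepoint at all.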

Sorry if this is too obvious, but what are the "accumulators" you refer
to? Are they *State primitives[1] or actual constructs called
Accumulator[2]? The latter are not checkpointed.

Best,

Dawid

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state

[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/user_defined_functions/#accumulators--counters

On 06/10/2021 08:42, Parag Somani wrote:
> Yes Nico. I have evaluated this.
>
> I have tried below:
>
>  1. Take the savepoint
>  2. Stop the job
>  3. Shutdown the instances
>  4. Started new pod using below command:
>
> /docker-entrypoint.sh "standalone-job" \
>   "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" \
>   "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" \
>   "-Ds3.endpoint=${AWS_S3_ENDPOINT}" \
>   "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" \
>   "--job-classname" "com.test.MySpringBootApp" \
>   "--fromSavepoint" "s3://s3-health-service-discovery/savepoints" ${args}
>
> I haven't observed any errors in the logs during start-up. But the
> state got reset, i.e. the values stored inside the accumulator were lost.
>
> On Tue, Oct 5, 2021 at 9:40 PM Nicolaus Weidner
> <nicolaus.weid...@ververica.com> wrote:
>
>     Hi Parag,
>
>     I am not so familiar with the setup you are using, but did you
>     check out [1]? Maybe the parameter 
>     [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]] 
>     is what you are looking for?
>
>     Best regards,
>     Nico
>
>     [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#application-mode-on-docker
>
>     On Tue, Oct 5, 2021 at 12:37 PM Parag Somani
>     <somanipa...@gmail.com> wrote:
>
>         Hello,
>
>         We are currently using Apache Flink 1.12.0 deployed on a k8s
>         1.18 cluster with ZooKeeper for HA. Due to vulnerabilities in a
>         few jars in the container (like netty-*, meso), we are forced
>         to upgrade.
>
>         While upgrading Flink to 1.14.0, we hit an NPE:
>         https://issues.apache.org/jira/browse/FLINK-23901?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17402570#comment-17402570
>
>         To address it, I followed these steps:
>
>          1. Create a savepoint
>          2. Stop the job
>          3. Restore from the savepoint, which is where I am facing a
>             challenge.
>
>         For step #3 above, I was not able to restore from the
>         savepoint, mainly because the documented command,
>         "bin/flink run -s :savepointPath [:runArgs]",
>         is mostly about restarting an uploaded jar file. As our
>         application runs on k8s using Docker, I could not use it.
>         Because of that, the state of the variables in the accumulator
>         got corrupted and I lost the data in one of our environments.
>
>         My question is: what is the preferred way to restore from a
>         savepoint if the application is running on k8s using Docker?
>
>         We are using the following command to run the job manager:
>         /docker-entrypoint.sh "standalone-job" \
>           "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" \
>           "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" \
>           "-Ds3.endpoint=${AWS_S3_ENDPOINT}" \
>           "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" \
>           "--job-classname" "<class-name>" ${args}
>
>         Thank you in advance...!
>
>         -- 
>         Regards,
>         Parag Surajmal Somani.
>
>
>
> -- 
> Regards,
> Parag Surajmal Somani.
