OK, I will increase the value and see whether the jobs recover. Thank you for
your help!


___
Julian Cardarelli
CEO
T (800) 961-1549
E jul...@thentia.com
LinkedIn
DISCLAIMER
Neither Thentia Corporation, nor its directors, officers, shareholders, 
representatives, employees, non-arms length companies, subsidiaries, parent, 
affiliated brands and/or agencies are licensed to provide legal advice. This 
e-mail may contain among other things legal information. We disclaim any and 
all responsibility for the content of this e-mail. YOU MUST NOT rely on any of 
our communications as legal advice. Only a licensed legal professional may give 
you advice. Our communications are never provided as legal advice, because we 
are not licensed to provide legal advice nor do we possess the knowledge, 
skills or capacity to provide legal advice. We disclaim any and all 
responsibility related to any action you might take based upon our 
communications and emphasize the need for you to never rely on our 
communications as the basis of any claim or proceeding.    
CONFIDENTIALITY
This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you have received this email in error please notify the system manager. This 
message contains confidential information and is intended only for the 
individual(s) named. If you are not the named addressee(s) you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and delete 
this e-mail from your system. If you are not the intended recipient you are 
notified that disclosing, copying, distributing or taking any action in 
reliance on the contents of this information is strictly prohibited.    
From: Guowei Ma <guowei....@gmail.com>
Sent: Thursday, September 2, 2021 11:32 PM
To: Julian Cardarelli <jul...@thentia.com>
Cc: user <user@flink.apache.org>
Subject: [External] Re: Flink on Kubernetes

Hi, Julian

I notice that your configuration includes
"restart-strategy.fixed-delay.attempts: 10". This means the job will fail
permanently once it has used up its 10 restart attempts, which may be why it is
not restarted again, so you could try increasing this value.
However, I am not sure this is the root cause. If it does not help, please share
the logs from around that time and the Flink version you are using.
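The change suggested above could look like the following flink-conf.yaml fragment. This is a minimal sketch, assuming the restart strategy stays configured in the ConfigMap rather than in job code. The attempt count 2147483647 (Integer.MAX_VALUE, i.e. effectively unlimited) is an illustrative choice; as far as I know it is also what Flink uses when checkpointing is enabled and no restart strategy is configured at all. A lower bound may be preferable if you want a persistently failing job to end up as FAILED.

```yaml
# flink-conf.yaml fragment (restart-strategy keys only)
restart-strategy: fixed-delay
# Illustrative value: Integer.MAX_VALUE, effectively unlimited restart attempts
restart-strategy.fixed-delay.attempts: 2147483647
# Keep the colon: as far as I know, a "key value" line without ": " is skipped
# by Flink's flink-conf.yaml parser and the default delay is used instead
restart-strategy.fixed-delay.delay: 10s
```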

Best,
Guowei


On Fri, Sep 3, 2021 at 2:00 AM Julian Cardarelli <jul...@thentia.com> wrote:
Hello,

We have deployed Flink on Kubernetes in a high-availability configuration backed
by Google Cloud Storage, as per the ConfigMap below. Everything appears to be
working normally, and state is being saved to GCS.

However, every so often (perhaps weekly or every other week), all of the
submitted jobs are lost and the cluster appears to be completely reset. GKE may
be doing maintenance or something of this nature, but the point is that the
cluster does not recover from this event into an operational state with all
jobs running again.

Is there something we are missing? Thanks!
-jc


apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    jobmanager.heap.size: 1024m
    taskmanager.memory.process.size: 1024m
    kubernetes.cluster-id: cluster1
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: gs://storage-uswest.yyy.com/kubernetes-flink
    state.backend: filesystem
    state.checkpoints.dir: gs://storage-uswest.yyy.com/kubernetes-checkpoint
    state.savepoints.dir: gs://storage-uswest.yyy.com/kubernetes-savepoint
    execution.checkpointing.interval: 3min
    execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
    execution.checkpointing.max-concurrent-checkpoints: 1
    execution.checkpointing.min-pause: 0
    execution.checkpointing.mode: EXACTLY_ONCE
    execution.checkpointing.timeout: 10min
    execution.checkpointing.tolerable-failed-checkpoints: 0
    execution.checkpointing.unaligned: false
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 10
    restart-strategy.fixed-delay.delay: 10s

  log4j.properties: |+
    log4j.rootLogger=INFO, file
    log4j.logger.akka=INFO
    log4j.logger.org.apache.kafka=INFO
    log4j.logger.org.apache.hadoop=INFO
    log4j.logger.org.apache.zookeeper=INFO
    log4j.appender.file=org.apache.log4j.FileAppender
    log4j.appender.file.file=${log.file}
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file
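For comparison with the restart-strategy section of this ConfigMap, a failure-rate restart strategy bounds failures per time window instead of in total. My understanding (an assumption worth confirming against the logs) is that the fixed-delay attempt counter is not reset while the job keeps running, so disruptions spread over weeks could gradually use up a budget of 10 attempts. The thresholds below are hypothetical values chosen only for illustration.

```yaml
# flink-conf.yaml fragment: failure-rate restart strategy (illustrative values)
restart-strategy: failure-rate
# Max restarts within the interval below before the job is failed
restart-strategy.failure-rate.max-failures-per-interval: 10
# Window over which failures are counted
restart-strategy.failure-rate.failure-rate-interval: 10min
# Wait between consecutive restart attempts
restart-strategy.failure-rate.delay: 10s
```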


Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast, a leader in email security and cyber 
resilience. Mimecast integrates email defenses with brand protection, security 
awareness training, web security, compliance and other essential capabilities. 
Mimecast helps protect large and small organizations from malicious activity, 
human error and technology failure; and to lead the movement toward building a 
more resilient world. To find out more, visit our website.
