Hi, Julian

I noticed that your configuration includes
"restart-strategy.fixed-delay.attempts: 10". This means the job fails
permanently once it has failed 10 times, so it may be that the job is simply
no longer restarted after that. You could try increasing this value.
I am not sure this is the root cause, though. If it does not help, could
you share the logs from the time of the incident and the Flink version you
are using?
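
For example, a minimal sketch of the change in the flink-conf.yaml section
of your ConfigMap, assuming you keep the fixed-delay strategy. The value 100
is only an illustration, not a recommendation, so pick whatever fits your
setup:

```yaml
# Illustrative values only; 100 attempts is an assumption, not a tuned setting.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 100
restart-strategy.fixed-delay.delay: 10s
```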

Best,
Guowei


On Fri, Sep 3, 2021 at 2:00 AM Julian Cardarelli wrote:

> Hello –
>
>
>
> We have implemented Flink on Kubernetes with Google Cloud Storage in a high
> availability configuration, as per the ConfigMap below. Everything appears
> to be working normally, and state is being saved to GCS.
>
>
>
> However, every now and then, perhaps weekly or every other week, all of
> the submitted jobs are lost and the cluster appears completely reset.
> Perhaps GKE is doing maintenance or something of that nature, but the point
> is that the cluster does not come back from this in an operational
> state with all jobs restored to running status.
>
>
>
> Is there something we are missing? Thanks!
>
> -jc
>
>
>
>
>
> apiVersion: v1
>
> kind: ConfigMap
>
> metadata:
>
>   name: flink-config
>
>   labels:
>
>     app: flink
>
> data:
>
>   flink-conf.yaml: |+
>
>     jobmanager.rpc.address: flink-jobmanager
>
>     taskmanager.numberOfTaskSlots: 1
>
>     blob.server.port: 6124
>
>     jobmanager.rpc.port: 6123
>
>     taskmanager.rpc.port: 6122
>
>     jobmanager.heap.size: 1024m
>
>     taskmanager.memory.process.size: 1024m
>
>     kubernetes.cluster-id: cluster1
>
>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>
>     high-availability.storageDir: gs://storage-uswest.yyy.com/kubernetes-flink
>
>     state.backend: filesystem
>
>     state.checkpoints.dir: gs://storage-uswest.yyy.com/kubernetes-checkpoint
>
>     state.savepoints.dir: gs://storage-uswest.yyy.com/kubernetes-savepoint
>
>     execution.checkpointing.interval: 3min
>
>     execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
>
>     execution.checkpointing.max-concurrent-checkpoints: 1
>
>     execution.checkpointing.min-pause: 0
>
>     execution.checkpointing.mode: EXACTLY_ONCE
>
>     execution.checkpointing.timeout: 10min
>
>     execution.checkpointing.tolerable-failed-checkpoints: 0
>
>     execution.checkpointing.unaligned: false
>
>     restart-strategy: fixed-delay
>
>     restart-strategy.fixed-delay.attempts: 10
>
>     restart-strategy.fixed-delay.delay: 10s
>
>
>
>   log4j.properties: |+
>
>     log4j.rootLogger=INFO, file
>
>     log4j.logger.akka=INFO
>
>     log4j.logger.org.apache.kafka=INFO
>
>     log4j.logger.org.apache.hadoop=INFO
>
>     log4j.logger.org.apache.zookeeper=INFO
>
>     log4j.appender.file=org.apache.log4j.FileAppender
>
>     log4j.appender.file.file=${log.file}
>
>     log4j.appender.file.layout=org.apache.log4j.PatternLayout
>
>     log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>
>
>     log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file
>
>
>
>
