Re: Checkpointing to gcs taking too long

Chesnay Schepler Thu, 29 Nov 2018 02:28:36 -0800

Please provide the full exception stack trace and the configuration of your job (parallelism, number of stateful operators). Have you tried using the gcs-connector in isolation? This may not be an issue with Flink.
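
As a rough illustration of what testing the storage path in isolation could look like, here is a minimal sketch of a timing harness that repeatedly writes a payload of about 100 KB and reports the mean and maximum latency. The names `time_writes` and `local_file_writer` are hypothetical, and the local-file writer is only a stand-in: to say anything about GCS you would have to swap in a writer that actually targets your bucket (for example through the gcs-connector), which is not shown here.

```python
import os
import statistics
import tempfile
import time
from typing import Callable, Dict, List


def time_writes(write_fn: Callable[[bytes], None], payload: bytes, runs: int = 20) -> Dict[str, float]:
    """Call write_fn(payload) `runs` times and summarise the latencies in seconds."""
    latencies: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        write_fn(payload)
        latencies.append(time.perf_counter() - start)
    return {
        "runs": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


def local_file_writer(directory: str) -> Callable[[bytes], None]:
    """Stand-in writer (assumption): writes and fsyncs the payload to a local file."""
    path = os.path.join(directory, "probe.bin")

    def write(payload: bytes) -> None:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())

    return write


if __name__ == "__main__":
    payload = os.urandom(100 * 1024)  # ~100 KB, similar to the reported state size
    with tempfile.TemporaryDirectory() as tmp:
        print(time_writes(local_file_writer(tmp), payload, runs=20))
```

If a writer that goes to the bucket shows the same multi-second mean and long maximum outside of Flink, the problem is more likely in the connector, credentials, or network than in the checkpointing itself.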


On 28.11.2018 07:01, prakhar_mathur wrote:

I am trying to run Flink on Kubernetes and push checkpoints to
Google Cloud Storage. Below is the Dockerfile:


`FROM flink:1.6.2-hadoop28-scala_2.11-alpine

# Download the GCS connector, then copy the shaded Hadoop jar from the
# Flink binary distribution into lib/ and clean up the extracted files.
RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar && \
    wget http://ftp.fau.de/apache/flink/flink-1.6.2/flink-1.6.2-bin-hadoop28-scala_2.11.tgz && \
    tar xf flink-1.6.2-bin-hadoop28-scala_2.11.tgz && \
    mv flink-1.6.2/lib/flink-shaded-hadoop2* lib/ && \
    rm -r flink-1.6.2*`

But the checkpoints take around 2-3 seconds on average and up to 25
seconds at most, even though the state size is only around 100 KB.

The jobs are also being restarted with the error
`AsynchronousException{java.lang.Exception: Could not materialize checkpoint
1640 for operator groupBy`, and they sometimes lose their connections to the
task managers.

I have currently set the heap size to 4096 MB.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
