Please provide the full Exception stack trace and the configuration of
your job (parallelism, number of stateful operators).
Have you tried using the gcs-connector in isolation? This may not be an
issue with Flink.
On 28.11.2018 07:01, prakhar_mathur wrote:
I am trying to run Flink on Kubernetes and push checkpoints to
Google Cloud Storage. Below is the Dockerfile:
`FROM flink:1.6.2-hadoop28-scala_2.11-alpine
RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar && \
    wget http://ftp.fau.de/apache/flink/flink-1.6.2/flink-1.6.2-bin-hadoop28-scala_2.11.tgz && \
    tar xf flink-1.6.2-bin-hadoop28-scala_2.11.tgz && \
    mv flink-1.6.2/lib/flink-shaded-hadoop2* lib/ && \
    rm -r flink-1.6.2*`
However, the checkpoints take around 2-3 seconds on average and up to around 25
seconds, even though the state size is only around 100 KB.
The jobs are also being restarted with the error
`AsynchronousException{java.lang.Exception: Could not materialize checkpoint
1640 for operator groupBy`, and connections to the task managers are
sometimes lost.
Currently, I have set the heap size to 4096 MB.