Hey all,

Thanks for the ping, Matthias. I'm not super familiar with the details of @Yang
Wang <danrtsey...@gmail.com>'s operator, to be honest :(. Can you share
some of your FlinkApplication specs?

For the `kubectl logs` command, I believe that just reads stdout and stderr
from the container. Which logging framework are you using in your
application and how have you configured it? There's a good guide for
configuring the popular ones in the Flink docs[1]. For instance, if you're
using the default Log4j 2 framework you should configure a ConsoleAppender[2].
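
As a rough sketch (not taken from your setup), a minimal Log4j 2 properties
file that sends everything at INFO and above to the console could look like
the one below. The appender name and the pattern are only illustrative
choices, and I'm assuming the file ends up as log4j-console.properties in the
Flink conf directory of the pod, which may differ depending on how the
operator assembles the configuration:

```properties
# Route everything at INFO and above through the root logger to the console.
rootLogger.level = INFO
rootLogger.appenderRef.console.ref = ConsoleAppender

# Console appender (name is an arbitrary label); it writes to stdout by
# default, which is the stream `kubectl logs` can read from the container.
appender.console.name = ConsoleAppender
appender.console.type = Console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```

If a file like this is actually visible to the JVM, I'd expect the "No Log4j
2 configuration file found" message to disappear and the Flink logs to show
up in `kubectl logs`, but I haven't verified that against this operator.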

Hope that helps a bit,
Austin

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/advanced/logging/
[2]:
https://logging.apache.org/log4j/2.x/manual/appenders.html#ConsoleAppender

On Tue, May 4, 2021 at 1:59 AM Matthias Pohl <matth...@ververica.com> wrote:

> Hi Fuyao,
> sorry for not replying earlier. The stop-with-savepoint operation
> should not just suspend the job but terminate it. Could it be that you
> have a larger state that makes creating the savepoint take longer? That
> said, since you don't experience this behavior with your 2nd solution,
> I'd assume that we can rule this possibility out.
>
> I'm adding Austin to the conversation as he has already worked with k8s
> operators as well. Maybe he can also give you more insight into the
> logging issue, which would enable us to dig deeper into what's going on
> with stop-with-savepoint.
>
> Best,
> Matthias
>
> On Tue, May 4, 2021 at 4:33 AM Fuyao Li <fuyao...@oracle.com> wrote:
>
>> Hello,
>>
>>
>>
>> Update:
>>
>> I think stopWithSavepoint() only suspends the job. It doesn’t actually
>> terminate the job the way ./bin/flink cancel does. I switched to
>> cancelWithSavepoint() and it works here.
>>
>>
>>
>> Maybe stopWithSavepoint() should only be used to update configuration
>> such as parallelism? It does not seem suitable for updating the image.
>> Please correct me if I am wrong.
>>
>>
>>
>> For the log issue, I am still a bit confused. Why are the logs not
>> available in kubectl logs? How can I get access to them?
>>
>>
>>
>> Thanks.
>>
>> Best,
>>
>> Fuyao
>>
>>
>>
>> *From: *Fuyao Li <fuyao...@oracle.com>
>> *Date: *Sunday, May 2, 2021 at 00:36
>> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
>> *Subject: *[External] : Re: StopWithSavepoint() method doesn't work in
>> Java based flink native k8s operator
>>
>> Hello,
>>
>>
>>
>> I noticed that triggering a savepoint first and then deleting the
>> deployment might cause duplicate data, which could hurt the semantic
>> correctness. Please give me some hints on how to make
>> stopWithSavepoint() work correctly with the Fabric8io Java k8s client to
>> perform this image update operation. Thanks!
>>
>>
>>
>> Best,
>>
>> Fuyao
>>
>>
>>
>>
>>
>>
>>
>> *From: *Fuyao Li <fuyao...@oracle.com>
>> *Date: *Friday, April 30, 2021 at 18:03
>> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
>> *Subject: *[External] : Re: StopWithSavepoint() method doesn't work in
>> Java based flink native k8s operator
>>
>> Hello Community, Yang,
>>
>>
>>
>> I have one more question about logging. I noticed that when I run
>> kubectl logs against the JM, the pods provisioned by the operator don’t
>> print the internal Flink logs. I only get something like the output
>> below, and no actual Flink logs are printed. Where can I find the path
>> to the logs? Should I use a sidecar container to get them out? How can I
>> get the logs without checking the Flink WebUI? Also, the sed errors
>> confuse me. In fact, the application is already up and running
>> correctly when I access the WebUI through Ingress.
>>
>>
>>
>> Reference:
>> https://github.com/wangyang0918/flink-native-k8s-operator/issues/4
>>
>>
>>
>>
>>
>> [root@bastion deploy]# kubectl logs -f flink-demo-594946fd7b-822xk
>>
>>
>>
>> sed: couldn't open temporary file /opt/flink/conf/sedh1M3oO: Read-only
>> file system
>>
>> sed: couldn't open temporary file /opt/flink/conf/sed8TqlNR: Read-only
>> file system
>>
>> /docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>>
>> sed: couldn't open temporary file /opt/flink/conf/sedvO2DFU: Read-only
>> file system
>>
>> /docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>>
>> /docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp:
>> Read-only file system
>>
>> Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH
>> -Xmx3462817376 -Xms3462817376 -XX:MaxMetaspaceSize=268435456
>> org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
>> -D jobmanager.memory.off-heap.size=134217728b -D
>> jobmanager.memory.jvm-overhead.min=429496736b -D
>> jobmanager.memory.jvm-metaspace.size=268435456b -D
>> jobmanager.memory.heap.size=3462817376b -D
>> jobmanager.memory.jvm-overhead.max=429496736b
>>
>> ERROR StatusLogger No Log4j 2 configuration file found. Using default
>> configuration (logging only errors to the console), or user
>> programmatically provided configurations. Set system property
>> 'log4j2.debug' to show Log4j 2 internal initialization logging. See
>> https://logging.apache.org/log4j/2.x/manual/configuration.html
>> for instructions on how to configure Log4j 2
>>
>> WARNING: An illegal reflective access operation has occurred
>>
>> WARNING: Illegal reflective access by
>> org.apache.flink.api.java.ClosureCleaner
>> (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to field
>> java.util.Properties.serialVersionUID
>>
>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.flink.api.java.ClosureCleaner
>>
>> WARNING: Use --illegal-access=warn to enable warnings of further illegal
>> reflective access operations
>>
>> WARNING: All illegal access operations will be denied in a future release
>>
>>
>>
>>
>>
>> -------- The logs stop here; Flink application logs are no longer
>> printed ---------
>>
>>
>>
>> ^C
>>
>> [root@bastion deploy]# kubectl logs -f flink-demo-taskmanager-1-1
>>
>> sed: couldn't open temporary file /opt/flink/conf/sedaNDoNR: Read-only
>> file system
>>
>> sed: couldn't open temporary file /opt/flink/conf/seddze7tQ: Read-only
>> file system
>>
>> /docker-entrypoint.sh: line 75: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>>
>> sed: couldn't open temporary file /opt/flink/conf/sedYveZoT: Read-only
>> file system
>>
>> /docker-entrypoint.sh: line 88: /opt/flink/conf/flink-conf.yaml:
>> Read-only file system
>>
>> /docker-entrypoint.sh: line 90: /opt/flink/conf/flink-conf.yaml.tmp:
>> Read-only file system
>>
>> Start command: $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH
>> -Xmx697932173 -Xms697932173 -XX:MaxDirectMemorySize=300647712
>> -XX:MaxMetaspaceSize=268435456
>> org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner -D
>> taskmanager.memory.framework.off-heap.size=134217728b -D
>> taskmanager.memory.network.max=166429984b -D
>> taskmanager.memory.network.min=166429984b -D
>> taskmanager.memory.framework.heap.size=134217728b -D
>> taskmanager.memory.managed.size=665719939b -D taskmanager.cpu.cores=1.0 -D
>> taskmanager.memory.task.heap.size=563714445b -D
>> taskmanager.memory.task.off-heap.size=0b --configDir /opt/flink/conf
>> -Djobmanager.memory.jvm-overhead.min='429496736b'
>> -Dpipeline.classpaths='file:usrlib/quickstart-0.1.jar'
>> -Dtaskmanager.resource-id='flink-demo-taskmanager-1-1'
>> -Djobmanager.memory.off-heap.size='134217728b'
>> -Dexecution.target='embedded'
>> -Dweb.tmpdir='/tmp/flink-web-d7691661-fac5-494e-8154-896b4fe30692'
>> -Dpipeline.jars='file:/opt/flink/usrlib/quickstart-0.1.jar'
>> -Djobmanager.memory.jvm-metaspace.size='268435456b'
>> -Djobmanager.memory.heap.size='3462817376b'
>> -Djobmanager.memory.jvm-overhead.max='429496736b'
>>
>> ERROR StatusLogger No Log4j 2 configuration file found. Using default
>> configuration (logging only errors to the console), or user
>> programmatically provided configurations. Set system property
>> 'log4j2.debug' to show Log4j 2 internal initialization logging. See
>> https://logging.apache.org/log4j/2.x/manual/configuration.html
>> for instructions on how to configure Log4j 2
>>
>> WARNING: An illegal reflective access operation has occurred
>>
>> WARNING: Illegal reflective access by
>> org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil
>> (file:/opt/flink/lib/flink-dist_2.11-1.12.1.jar) to method
>> java.nio.DirectByteBuffer.cleaner()
>>
>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.flink.shaded.akka.org.jboss.netty.util.internal.ByteBufferUtil
>>
>> WARNING: Use --illegal-access=warn to enable warnings of further illegal
>> reflective access operations
>>
>> WARNING: All illegal access operations will be denied in a future release
>>
>> Apr 29, 2021 12:58:34 AM oracle.simplefan.impl.FanManager configure
>>
>> SEVERE: attempt to configure ONS in FanManager failed with
>> oracle.ons.NoServersAvailable: Subscription time out
>>
>>
>>
>>
>>
>> -------- The logs stop here; Flink application logs are no longer
>> printed ---------
>>
>>
>>
>>
>>
>> Best,
>>
>> Fuyao
>>
>>
>>
>>
>>
>> *From: *Fuyao Li <fuyao...@oracle.com>
>> *Date: *Friday, April 30, 2021 at 16:50
>> *To: *user <user@flink.apache.org>, Yang Wang <danrtsey...@gmail.com>
>> *Subject: *[External] : StopWithSavepoint() method doesn't work in Java
>> based flink native k8s operator
>>
>> Hello Community, Yang,
>>
>>
>>
>> I am trying to extend the Flink native Kubernetes operator by adding some
>> new features based on the repo [1]. I wrote a method to implement the
>> image update functionality [2]. I added the
>>
>> triggerImageUpdate(oldFlinkApp, flinkApp, effectiveConfig);
>>
>>
>>
>> under the existing method.
>>
>> triggerSavepoint(oldFlinkApp, flinkApp, effectiveConfig);
>>
>>
>>
>>
>>
>> I wrote a function to accommodate the image change behavior.[2]
>>
>>
>>
>> Solution1:
>>
>> I want to use the stopWithSavepoint() method to complete the task.
>> However, I found that it gets stuck and never completes. Even if I call
>> get() on the CompletableFuture, it always times out and throws
>> exceptions. See the solution 1 logs [3].
>>
>>
>>
>> Solution2:
>>
>> I tried to trigger a savepoint, then delete the deployment in the code,
>> and then create a new application with the new image. This seems to work
>> fine. Log link: [4]
>>
>>
>>
>> My questions:
>>
>>    1. Why does solution 1 get stuck? The triggerSavepoint()
>>    CompletableFuture works here, so why does stopWithSavepoint() always
>>    get stuck or time out? I am very confused.
>>    2. I am still new to the Fabric8io library. Did I do anything wrong
>>    in the implementation? Maybe I should update the jobStatus? Please
>>    give me some suggestions.
>>    3. For the workaround in solution 2, are there any downsides that I
>>    didn’t notice?
>>
>>
>>
>>
>>
>> [1] https://github.com/wangyang0918/flink-native-k8s-operator
>>
>> [2] https://pastebin.ubuntu.com/p/tQShjmdcJt/
>>
>> [3] https://pastebin.ubuntu.com/p/YHSPpK4W4Z/
>>
>> [4] https://pastebin.ubuntu.com/p/3VG7TtXXfh/
>>
>>
>>
>> Best,
>>
>> Fuyao
>>
>
