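Hello, our team is evaluating Flink-related technologies, and we are a bit confused about failure restart strategies.
Task Failure Recovery | Apache Flink
1. By what mechanism is a failure restart triggered? I have searched through a lot of material, but it only explains how to configure the restart strategy, not who actually triggers the restart. A task that has crashed can't restart itself, can it?
2. Does a failure restart apply at the Task level, or to the TaskManager service?
We appreciate and support the work of the Flink developers. Thanks!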
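For context on questions 1 and 2 (a summary of how we understand Flink to behave, not part of the original question): a crashed task does not restart itself. The failure is reported to the job's JobMaster on the JobManager side. The JobMaster asks the configured restart strategy (for example `restart-strategy: fixed-delay` with `restart-strategy.fixed-delay.attempts` and `restart-strategy.fixed-delay.delay`) whether and when to restart, and then redeploys tasks into TaskManager slots. The failover strategy (`jobmanager.execution.failover-strategy`, `full` or `region`) decides whether all tasks or only the affected pipelined region are restarted. The restart therefore acts on tasks, not on the TaskManager process. The sketch below is a hypothetical, JDK-only model of the fixed-delay decision. The class and method names are invented for illustration and are not Flink's API.

```java
import java.time.Duration;

// Hypothetical model of a fixed-delay restart decision; NOT Flink's API.
// maxAttempts plays the role of restart-strategy.fixed-delay.attempts,
// delay plays the role of restart-strategy.fixed-delay.delay.
class FixedDelayRestartModel {
    private final int maxAttempts;
    private final Duration delay;
    private int restartsSoFar = 0;

    FixedDelayRestartModel(int maxAttempts, Duration delay) {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    // Invoked by the coordinator (in Flink, the JobMaster) when a task reports
    // a failure, never by the failed task itself.
    // Returns true if the tasks should be restarted, false if the job should fail.
    boolean onTaskFailure() {
        if (restartsSoFar >= maxAttempts) {
            return false; // attempts exhausted: the job transitions to FAILED
        }
        restartsSoFar++;
        return true;
    }

    // Constant wait before redeploying the tasks.
    Duration backoff() {
        return delay;
    }

    int restartsSoFar() {
        return restartsSoFar;
    }

    public static void main(String[] args) {
        FixedDelayRestartModel strategy = new FixedDelayRestartModel(3, Duration.ofSeconds(10));
        for (int failure = 1; failure <= 4; failure++) {
            boolean restart = strategy.onTaskFailure();
            System.out.println("failure " + failure + ": "
                + (restart ? "restart tasks after " + strategy.backoff().getSeconds() + "s"
                           : "no attempts left, job fails"));
        }
    }
}
```

With 3 attempts, the first three failures lead to restarts and the fourth fails the job. This matches our reading of the fixed-delay semantics, but please check it against the Task Failure Recovery page for your Flink version.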
Hi, I haven't run into this issue. My guess is that it has something to do
with the SSL configuration [1]. Have you configured the
"security.ssl.rest.enabled" option?
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-ssl/#configuring-ssl
Jean-Damien
Hi Madan,
Maybe you can check the value of
"execution.checkpointing.tolerable-failed-checkpoints" [1] in your
application configuration and try increasing it?
[1]
Hi, Madan.
I think the exception should have a root cause; could you share it?
BTW, if you haven't set a value for
execution.checkpointing.tolerable-failed-checkpoints, I'd recommend
setting it, since it can keep the job from restarting because of
recoverable, temporary problems.
[1]
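To make the `execution.checkpointing.tolerable-failed-checkpoints` suggestion more concrete, here is how we understand the option to work. Flink counts consecutive checkpoint failures (for the failure reasons the option covers) and resets the count when a checkpoint completes. Once the count exceeds the configured number, Flink fails the job, which then goes through the restart strategy. When the option is unset, effectively no such failure is tolerated. The sketch below is a hypothetical, JDK-only model of that counter, not Flink code. In a real job the value is set in the configuration or via `CheckpointConfig#setTolerableCheckpointFailureNumber`.

```java
// Hypothetical model of consecutive checkpoint-failure accounting; not Flink code.
class CheckpointFailureCounter {
    private final int tolerableFailures; // analogous to tolerable-failed-checkpoints
    private int consecutiveFailures = 0;

    CheckpointFailureCounter(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    // Returns true if the job should be failed (and handed to the restart strategy).
    boolean onCheckpointFailed() {
        consecutiveFailures++;
        return consecutiveFailures > tolerableFailures;
    }

    // A completed checkpoint clears the streak of failures.
    void onCheckpointCompleted() {
        consecutiveFailures = 0;
    }

    public static void main(String[] args) {
        CheckpointFailureCounter strict = new CheckpointFailureCounter(0);
        System.out.println("tolerable=0, first failure fails job: " + strict.onCheckpointFailed());

        CheckpointFailureCounter lenient = new CheckpointFailureCounter(2);
        lenient.onCheckpointFailed();
        lenient.onCheckpointFailed();    // two transient failures are tolerated
        lenient.onCheckpointCompleted(); // a success resets the streak
        System.out.println("tolerable=2, failure after reset fails job: " + lenient.onCheckpointFailed());
    }
}
```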
Hi, Lars.
Could you check whether you have configured a lifecycle policy on your
Google Cloud Storage bucket [1]? That is not recommended for storage used by Flink checkpoints.
[1] https://cloud.google.com/storage/docs/lifecycle
On Fri, Dec 9, 2022 at 2:02 AM Lars Skjærven wrote:
> Hello,
> We had an incident today
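One reason an age-based lifecycle rule is risky: with incremental checkpoints (e.g. the RocksDB state backend with `state.backend.incremental: true`), a new checkpoint can keep referencing files uploaded by much older checkpoints. A file that is "older than N days" can therefore still be needed for a restore. Flink itself discards checkpoint files once no retained checkpoint references them. The sketch below is a hypothetical, JDK-only model in which the file names, ages and the 30-day rule are invented for illustration. It lists which files still referenced by the latest checkpoint an age-based delete rule would remove.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model: which files referenced by the latest checkpoint would an
// age-based bucket lifecycle rule delete? Names and ages are illustrative only.
class LifecycleRiskCheck {
    static List<String> filesAtRisk(Map<String, Integer> fileAgeDays,
                                    Set<String> referencedByLatestCheckpoint,
                                    int deleteAfterDays) {
        List<String> atRisk = new ArrayList<>();
        for (String file : referencedByLatestCheckpoint) {
            Integer age = fileAgeDays.get(file);
            if (age != null && age >= deleteAfterDays) {
                atRisk.add(file); // still needed for restore, but old enough to be deleted
            }
        }
        atRisk.sort(null); // natural order, for stable output
        return atRisk;
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new TreeMap<>();
        ages.put("shared/000012.sst", 40); // uploaded by an old checkpoint, still reused
        ages.put("shared/000057.sst", 3);
        ages.put("chk-120/_metadata", 0);
        Set<String> latest = Set.of("shared/000012.sst", "shared/000057.sst", "chk-120/_metadata");
        System.out.println(filesAtRisk(ages, latest, 30));
    }
}
```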
Thank you, Yanfei, for treating this issue as a bug and planning a fix in
the upcoming version.
I have another vulnerability issue reported against our product. It is
related to the version of the "LZ4" compression library. Could you please
take a look at this link?
https://nvd.nist.gov/vuln/detail/CVE-2019-17543
I
Hello,
We had an incident today with a job that could not restore after a crash
(for an unknown reason). Specifically, it fails due to a missing checkpoint file.
We've experienced this a total of three times with Flink 1.15.2, but never
with 1.14.x. Last time was during a node upgrade, but that was not
Hi Alexis,
Thanks a lot for your reply & guidance, which makes a lot of sense to me
overall.
Regards,
Salva
On Thu, Dec 8, 2022 at 5:34 PM Alexis Sarda-Espinosa <
sarda.espin...@gmail.com> wrote:
> Hi Salva,
>
> Just to give you further thoughts from another user, I think the "temporal
>
Hi Flink Community,
I have a few questions regarding the new KafkaSource and event time, which I
wasn't able to answer myself by checking the docs; please point me to the
right pages in case I missed something. I'm not entirely sure whether my
knowledge fully holds for the new KafkaSource,
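One point that may help when reading the event-time docs concerns the bounded-out-of-orderness strategy (`WatermarkStrategy.forBoundedOutOfOrderness`). As far as we know, it emits `maxTimestampSeen - maxOutOfOrderness - 1`. When such a strategy is passed to the new KafkaSource, watermarks are tracked per Kafka partition, and the reader forwards the minimum across its partitions. The sketch below is a hypothetical, JDK-only model of that combination, not Flink code. The partition ids and timestamps are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-partition bounded-out-of-orderness watermarks; not Flink code.
class PerPartitionWatermarks {
    private final int partitionCount;
    private final long maxOutOfOrdernessMs;
    private final Map<Integer, Long> maxTimestampPerPartition = new HashMap<>();

    PerPartitionWatermarks(int partitionCount, long maxOutOfOrdernessMs) {
        this.partitionCount = partitionCount;
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Track the highest event timestamp seen on each partition.
    void onRecord(int partition, long eventTimestampMs) {
        maxTimestampPerPartition.merge(partition, eventTimestampMs, Long::max);
    }

    // Watermark of one partition: maxTimestamp - bound - 1, or Long.MIN_VALUE before any record.
    long partitionWatermark(int partition) {
        Long max = maxTimestampPerPartition.get(partition);
        return max == null ? Long.MIN_VALUE : max - maxOutOfOrdernessMs - 1;
    }

    // The reader forwards the minimum, so one quiet partition holds the watermark back.
    long combinedWatermark() {
        long min = Long.MAX_VALUE;
        for (int p = 0; p < partitionCount; p++) {
            min = Math.min(min, partitionWatermark(p));
        }
        return min;
    }

    public static void main(String[] args) {
        PerPartitionWatermarks wm = new PerPartitionWatermarks(2, 5_000);
        wm.onRecord(0, 60_000);
        System.out.println(wm.combinedWatermark()); // partition 1 has no data yet
        wm.onRecord(1, 20_000);
        System.out.println(wm.combinedWatermark());
    }
}
```

In this model a partition that never receives data keeps the combined watermark at `Long.MIN_VALUE`. In Flink, `WatermarkStrategy#withIdleness(Duration)` is the usual way to let such partitions be marked idle.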
Hi Salva,
Just to give you further thoughts from another user, I think the "temporal
join" semantics are very critical in this use case, and what you implement
for that may not easily generalize to other cases. Because of that, I'm not
sure if you can really define best practices that apply in
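To illustrate what "temporal join" semantics means here: in an event-time temporal join, each fact record is joined with the version of the dimension row that was valid at the record's event time, not with the latest version. The sketch below is a hypothetical, JDK-only model using a `TreeMap` keyed by version timestamp. The class, keys and values are illustrative. It ignores watermarks, state cleanup and late data, all of which a real Flink job has to handle.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an event-time temporal join lookup; not Flink code.
class VersionedTable<K, V> {
    // For each key: version start timestamp -> value valid from that timestamp on.
    private final Map<K, TreeMap<Long, V>> versions = new HashMap<>();

    void upsert(K key, long validFromMs, V value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(validFromMs, value);
    }

    // Value valid at eventTimeMs, or null if the key had no version yet.
    V asOf(K key, long eventTimeMs) {
        TreeMap<Long, V> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, V> entry = history.floorEntry(eventTimeMs); // latest version <= event time
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedTable<String, Double> rates = new VersionedTable<>();
        rates.upsert("EUR", 1_000, 1.05);
        rates.upsert("EUR", 5_000, 1.10);
        // An order at t=3000 joins with the rate valid at t=3000, not the latest one.
        System.out.println(rates.asOf("EUR", 3_000));
        System.out.println(rates.asOf("EUR", 6_000));
        System.out.println(rates.asOf("EUR", 500));
    }
}
```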
Thanks, I'll try that.
On Wed, Dec 7, 2022 at 7:19 PM Yaroslav Tkachenko wrote:
>
> Hi Noel,
>
> It's definitely possible. You need to implement a custom
> KafkaRecordDeserializationSchema: its "deserialize" method gives you a
> ConsumerRecord as an argument so that you can extract Kafka
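Yaroslav's suggestion relies on `KafkaRecordDeserializationSchema#deserialize(ConsumerRecord<byte[], byte[]> record, Collector<T> out)`. That method receives the whole Kafka record, so metadata such as topic, partition, offset, timestamp and headers is available alongside the value. A real implementation needs the Flink and Kafka client dependencies. The sketch below therefore only mimics the shape of that method with hypothetical, simplified stand-in types (`RawRecord`, `Enriched`, and a `java.util.function.Consumer` playing the role of the collector). The header name `trace-id` is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Mirrors the shape of deserialize(record, collector): read metadata, emit zero or more results.
class MetadataDeserializer {
    void deserialize(RawRecord record, Consumer<Enriched> out) {
        String payload = new String(record.value, StandardCharsets.UTF_8);
        byte[] trace = record.headers.get("trace-id"); // hypothetical header name
        out.accept(new Enriched(
            payload,
            record.topic + "-" + record.partition + "@" + record.offset,
            trace == null ? null : new String(trace, StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        List<Enriched> collected = new ArrayList<>();
        RawRecord r = new RawRecord("orders", 3, 42L,
            Map.of("trace-id", "abc".getBytes(StandardCharsets.UTF_8)),
            "{\"id\":1}".getBytes(StandardCharsets.UTF_8));
        new MetadataDeserializer().deserialize(r, collected::add);
        System.out.println(collected.get(0).source + " " + collected.get(0).traceId);
    }
}

// Stand-in for Kafka's ConsumerRecord<byte[], byte[]> (hypothetical, simplified:
// real headers allow duplicate keys).
class RawRecord {
    final String topic;
    final int partition;
    final long offset;
    final Map<String, byte[]> headers;
    final byte[] value;

    RawRecord(String topic, int partition, long offset, Map<String, byte[]> headers, byte[] value) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.headers = headers;
        this.value = value;
    }
}

// Hypothetical output type: payload plus the Kafka metadata wanted downstream.
class Enriched {
    final String payload;
    final String source;  // topic-partition@offset
    final String traceId; // taken from a header, null if absent

    Enriched(String payload, String source, String traceId) {
        this.payload = payload;
        this.source = source;
        this.traceId = traceId;
    }
}
```

In a real schema the header lookup would go through the Kafka client's `record.headers().lastHeader("trace-id")`, whose `value()` returns the raw bytes.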
Just to add some extra references:
[5]
https://stackoverflow.com/questions/50536364/apache-flink-coflatmapfunction-does-not-process-events-in-event-time-order
[6]
https://stackoverflow.com/questions/61046062/controlling-order-of-processed-elements-within-coprocessfunction-using-custom-so
[7]
hello all,
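What resource settings are appropriate for the flink kubernetes operator in
production? I currently deploy it directly with Helm, but I noticed that the
two containers in the pod have no CPU/memory requests and limits and no JVM
memory settings. Should I set them? For a deployment of roughly 1000 jobs,
what values would be appropriate?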
best wishes,
tanjialiang.
谭家良
tanjl_w...@126.com
Yes, the wrong button was pushed when replying last time. -.-
Looking into the code once again [1], you're right. It looks like for
"last-state", no job is cancelled but the cluster deployment is just
deleted. I was assuming that the artifacts the documentation about the
JobResultStore resource
Is it possible to disable the dashboard but still keep the REST API running
on a different port? Please let us know whether this can be configured.
On Mon, 5 Dec, 2022, 6:08 AM naga sudhakar, wrote:
> Any suggestions on getting these APIs to work after applying this configuration?
> Basically I suspect webupload
Hi Matthias,
I think you didn't include the mailing list in your response.
According to my experiments, using last-state means the operator simply
deletes the Flink pods, and I believe that doesn't count as Cancelled, so
the artifacts for blobs and submitted job graphs are not cleaned up. I
Hi Ruibin,
1. Standalone mode is indeed supported since 1.2 (
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#cluster-deployment-modes).
I will correct the Known Issues section; leaving that entry there was just an
oversight. Thanks for reporting.
2.
Hi community,
I'm looking into the Flink K8s operator documents, and I'm a bit confused
about the following:
1.
The latest document (
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/)
says that Kubernetes standalone mode is not