Re: Exactly-once sink sync checkpoint stacking time effect

2022-03-29 Thread Filip Karnicki
Thank you very much for your answer. I was able to reduce the number of sinks as you described. That helped a lot, thank you. I think you must be right with regards to (2) - opening a new transaction being the culprit. It's unlikely to be (1) since this behaviour occurs even when there are 0

Re: Pyflink elastic search connectors

2022-03-29 Thread Xingbo Huang
Hi, Are you using datastream api or table api?If you are using the table api, you can use the connector by executing sql[1]. If you are using the datastream api, there is really no es connector api provided, you need to write python wrapper code, but the wrapper code is very simple. The

How to debug Metaspace exception?

2022-03-29 Thread John Smith
Hi running 1.14.4 My tasks manager still fails with java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. I have 2GB of metaspace

Re: Pyflink elastic search connectors

2022-03-29 Thread Sandeep Sharat
Hi, Thank you for the quick responses. We are using the datastream api for pyflink. We are trying to implement a wrapper in python for the same as we speak. Hopefully it will work out.  On Wed, 30 Mar, 2022, 8:02 am Xingbo Huang, wrote: > Hi, > > Are you using datastream api or table api?If

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread huweihua
Hi, Jin Can you provide more information about Flink cluster deployment modes? Is it running in Kubernetes/YARN or standalone mode? Maybe you can use application mode to keeps the environment (network accessibility) always keep same. Application mode will run the user-main method in the

RE: Watermarks event time vs processing time

2022-03-29 Thread Schwalbe Matthias
Hello Hans-Peter, I’m a little confused which version of your code you are testing against: * ProcessingTimeSessionWindows or EventTimeSessionWindows? * did you keep the withIdleness() ?? As said before: * for ProcessingTimeSessionWindows, watermarks play no role * if you keep

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
Hello Matthias, I am still using ProcessingTimeSessionWindow. But it turns out I was wrong. I tested a couple of times and it did not seem to work. But now it does with both watermarkstrategies removed. My apologies.' Regards Hans-Peter This is the code: StreamExecutionEnvironment env

Re: Watermarks event time vs processing time

2022-03-29 Thread HG
Hello Matthias, When I remove all the watermark strategies it does not process anything . For example when I use WatermarkStrategy.noWatermarks() instead of the one I build nothing seems to happen at all. Also when I skip the part where I add wmStrategy to create tuple4dswm: DataStream>

Parallel processing in a 2 node cluster apparently not working

2022-03-29 Thread HG
Hi, I have a 2 node cluster just for testing. When I start the cluster and the job I see that the parallelism is 2 as expected. But only they are both on the same node. When I stop the taskmanager on that node it switches to the other one. But I expected both nodes to have a subtask. See below.

Pyflink elastic search connectors

2022-03-29 Thread Sandeep Sharat
Hello Everyone, I have been working on a streaming application using elasticsearch as the sink. I had achieved it using the java api quite easily. But due to a recent policy change we are moving towards the python api for flink, however we were unable to find any python elastic search connectors

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread 胡伟华
I see, can you provide the startup command for 1.12.3 and 1.14.4? Are these startup commands running on the same node? > 2022年3月29日 下午10:32,Jin Yi 写道: > > it's running in k8s. we're not running in app mode b/c we have many jobs > running in the same flink cluster. > > On Tue, Mar 29, 2022 at

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread 胡伟华
Are you referring to creating Flink cluster on Kubernetes by yaml file? How did you submit the job to Flink cluster? Not via the command line (flink run xxx)? > 2022年3月29日 下午10:38,Jin Yi 写道: > > no they are not. b/c we are using k8s, we use kubectl apply commands with a > yaml file to

Re: SQL Client Kafka (UPSERT?) Sink for confluent-avro

2022-03-29 Thread Georg Heiler
I got it working now: It needs to be specified both for the key and value thanks Am Mo., 28. März 2022 um 13:33 Uhr schrieb Ingo Bürk : > Hi Georg, > > which Flink version are you using? The missing property is for the > avro-confluent format, and if I recall correctly, how these are passed >

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread Jin Yi
it's running in k8s. we're not running in app mode b/c we have many jobs running in the same flink cluster. On Tue, Mar 29, 2022 at 4:29 AM huweihua wrote: > Hi, Jin > > Can you provide more information about Flink cluster deployment modes? Is > it running in Kubernetes/YARN or standalone

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread Jin Yi
no they are not. b/c we are using k8s, we use kubectl apply commands with a yaml file to specify the startup. On Tue, Mar 29, 2022 at 7:37 AM 胡伟华 wrote: > I see, can you provide the startup command for 1.12.3 and 1.14.4? > Are these startup commands running on the same node? > > 2022年3月29日

Re: "Native Kubernetes" sample in Flink documentation fails. JobManager Web Interface is wrongly generated. [Flink 1.14.4]

2022-03-29 Thread Yang Wang
By default, the idle TaskManager will be released after 30s(configured via "resourcemanager.taskmanager-timeout"). If it could not be removed, you need to check the JobManager logs for the root cause. Maybe it does not have enough permission or sth else. Best, Yang Burcu Gul POLAT EGRI

Re: Parallel processing in a 2 node cluster apparently not working

2022-03-29 Thread David Anderson
In this situation, changing your configuration [1] to include cluster.evenly-spread-out-slots: true should change the scheduling behavior to what you are looking for. [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#cluster-evenly-spread-out-slots

Re: Wrong format when passing arguments with space

2022-03-29 Thread David Morávek
Hi Kevin, -dev@f.a.o +user@f.a.o Thanks for the report! I've run some experiments and unfortunately I'm not able to reproduce the behavior you're describing. The bash "$@" expansion seems to work as expected (always receiving correctly expanded unquoted strings in the main class). Can you maybe

Re: Wrong format when passing arguments with space

2022-03-29 Thread David Morávek
cc Kevin On Tue, Mar 29, 2022 at 9:15 AM David Morávek wrote: > Hi Kevin, > > -dev@f.a.o +user@f.a.o > > Thanks for the report! I've run some experiments and unfortunately I'm not > able to reproduce the behavior you're describing. The bash "$@" expansion > seems to work as expected (always

Re: Exactly-once sink sync checkpoint stacking time effect

2022-03-29 Thread Arvid Heise
Hi Filip, two things will impact sync time for Kafka: 1. Flushing all old data [1], in particular flushing all in-flight partitions [2]. However, that shouldn't cause a stacking effect except when the brokers are overloaded on checkpoint. 2. Opening a new transaction [3]. Since all transactions

flink on k8s场景,大家一般如何解决访问hdfs的问题呢。

2022-03-29 Thread yidan zhao
如题,是需要打包hadoop client到镜像中吗。

退订

2022-03-29 Thread 袁超
退订

Re: 实时数据入库怎样过滤中间状态,保证最终一致

2022-03-29 Thread Guo Thompson
可以参考jdbc-connector写mysql的思路,在java里面用hashMap来存,key为 order_id ,然后定时把map的数据刷mysql 18703416...@163.com <18703416...@163.com> 于2022年3月1日周二 14:40写道: > 首先确定 source 事件有 eventTime ,比如 source 的返回类型为 MySource > 示例代码如下: > static class MySource { > Long ts; > String key; > Object object; > } >

Re:flink on k8s场景,大家一般如何解决访问hdfs的问题呢。

2022-03-29 Thread casel.chen
我们是直接使用云存储,像阿里云的oss,没有再搭建hadoop集群。如果flink on k8s的确需要访问hadoop的话,是需要打包hadoop发行包在镜像里面的,配置好core-site.xml, hdfs-site.xml等 在 2022-03-30 12:01:54,"yidan zhao" 写道: >如题,是需要打包hadoop client到镜像中吗。

Re: flink on k8s是否有替代yarn.ship-files的参数

2022-03-29 Thread shimin huang
好的 我了解下 感谢! yu'an huang 于2022年3月28日周一 22:12写道: > 你好, > > > 可以看看这个链接中关于usrlib的介绍(Application mode部分)。 > > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/standalone/docker/#docker-hub-flink-images > > Kubernetes不像yarn一样提供了ship文件的功能。对于Kubernetes

Re: elasticsearch+hbase

2022-03-29 Thread Guo Thompson
hbase和es的数据是怎么实时同步的? 潘明文 于2022年3月1日周二 19:07写道: > HI, >现在环境是CDH > 集群6台下,elasticsearch作为hbase二级索引,如何优化代码使得通过elasticsearch二级索引再查询hbase数据速度优化到0.1秒一下。谢谢。 > > > > >

Re: RocksDB 读 cpu 100% 如何调优

2022-03-29 Thread Guo Thompson
如果rocksDB的状态很大呢?例如:200G,这种开了火焰图经常发现瓶颈也是在rocksDB的get(),这种有优化思路么? Yun Tang 于2022年3月21日周一 14:42写道: > Hi, > > RocksDB 的CPU栈能卡在100%,很有可能是大量解压缩 index/filter block导致的,可以enable partition > index/filter [1] 看看问题是否解决。 > 相关内容也可以参考我之前线下做过的分享 [2] > > > [1] >