Re: Diagnosing bottlenecks in Flink jobs

2021-06-16 Thread Dan Hill
Thanks, JING ZHANG! I have one subtask for one Kafka source that is getting backpressure. Is there an easy way to split a single Kafka partition into multiple subtasks? Or do I need to split the Kafka partition? On Wed, Jun 16, 2021 at 10:29 PM JING ZHANG wrote: > Hi Dan, > Would you please

Re:Re: Re: Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread 东东
把其中一个改成0 在 2021-06-17 13:11:01,"yidan zhao" 写道: >是的,宿主机IP。 > >net.ipv4.tcp_tw_reuse = 1 >net.ipv4.tcp_timestamps = 1 > >东东 于2021年6月17日周四 下午12:52写道: >> >> 10.35.215.18是宿主机IP? >> >> 看一下 tcp_tw_recycle和net.ipv4.tcp_timestamps是什么值 >> 实在不行就 tcpdump 吧 >> >> >> >> 在 2021-06-17 12:41:58,"yidan

Re: Please advise bootstrapping large state

2021-06-16 Thread Marco Villalobos
Thank you very much! I tried using Flink's SQL JDBC connector, and ran into issues. According to the flink documentation, only the old planner is compatible with the DataSet API. When I connect to the table: CREATE TABLE my_table ( ) WITH ( 'connector.type' = 'jdbc', 'connector.url'

Re: Diagnosing bottlenecks in Flink jobs

2021-06-16 Thread JING ZHANG
Hi Dan, Would you please describe what's the problem about your job? High latency or low throughput? Please first check the job throughput and latency . If the job throughput matches the speed of sources producing data and the latency metric is good, maybe the job works well without bottlenecks.

Re: Re: Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
是的,宿主机IP。 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_timestamps = 1 东东 于2021年6月17日周四 下午12:52写道: > > 10.35.215.18是宿主机IP? > > 看一下 tcp_tw_recycle和net.ipv4.tcp_timestamps是什么值 > 实在不行就 tcpdump 吧 > > > > 在 2021-06-17 12:41:58,"yidan zhao" 写道: > >@东东 standalone集群。 随机时间,一会一个的,没有固定规律。

Re:Re: Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread 东东
10.35.215.18是宿主机IP? 看一下 tcp_tw_recycle和net.ipv4.tcp_timestamps是什么值 实在不行就 tcpdump 吧 在 2021-06-17 12:41:58,"yidan zhao" 写道: >@东东 standalone集群。 随机时间,一会一个的,没有固定规律。 和CPU、内存、网络的话有一定规律,但不确认,因为不是很明显。 >我排查过几个exception,时间和网络尖刺对上了,但不全能对上,所以不好说是否有这个原因。 > >此外,有个点我不是很清楚,网上这个报错很少,类似的都是

Diagnosing bottlenecks in Flink jobs

2021-06-16 Thread Dan Hill
We have a job that has been running but none of the AWS resource metrics for the EKS, EC2, MSK and EBS show any bottlenecks. I have multiple 8 cores allocated but only ~2 cores are used. Most of the memory is not consumed. MSK does not show much use. EBS metrics look mostly idle. I assumed

Re: Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
@东东 standalone集群。 随机时间,一会一个的,没有固定规律。 和CPU、内存、网络的话有一定规律,但不确认,因为不是很明显。 我排查过几个exception,时间和网络尖刺对上了,但不全能对上,所以不好说是否有这个原因。 此外,有个点我不是很清楚,网上这个报错很少,类似的都是 RemoteTransportException,然后提示中说taskmager可能已丢失之类。但我的是 LocalTransportException,不清楚netty中这俩错误的含义是不是不一样。目前来看网络上关于这俩异常的资料也查不到什么。 东东 于2021年6月17日周四

Re: Checkpoint is timing out - inspecting state

2021-06-16 Thread Dan Hill
Hi Yun. The UI was not useful for this case. I had a feeling before hand about what the issue was. We refactored the state and now the checkpoint is 10x faster. On Mon, Jun 14, 2021 at 5:47 AM Yun Gao wrote: > Hi Dan, > > Flink should already have integrate a tool in the web UI to monitor >

Re: How to deal with apps with backpressure but with "good" performance

2021-06-16 Thread JING ZHANG
Hi Jason, If you see a back pressure warning for a task, this means it is producing data faster than the downstream operators can consume. We should avoid high back pressure in online jobs because it may lead to the following problems: 1. there are potential performance bottlenecks and may cause

Re:Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread 东东
单机standalone,还是Docker/K8s ? 这个异常出现的时机,与周期性的,还是跟CPU、内存,乃至网络流量变化相关? 在 2021-06-16 19:10:24,"yidan zhao" 写道: >Hi, yingjie. >If the network is not stable, which config parameter I should adjust. > >yidan zhao 于2021年6月16日周三 下午6:56写道: >> >> 2: I use G1, and no full gc occurred, young gc count:

Re: Re: Re: Upgrade job topology in checkpoint

2021-06-16 Thread Yun Gao
Hi Padarn, From the current description it seems to me that the issue does not related to the state ? I think we may first check if the operator logic is right and whether the precedent tasks have indeed emitted records to the new sink. Best, Yun --Original Mail

Reasonable heap space usage of flink jobs

2021-06-16 Thread todd5...@163.com
For flink1.12.1 version, set taskmanager.memory.process.size: 1024m. When running, Heap Maximum: 146M, Non-Heap Maximum: 744 MB, Heap usage rate is about 10%-30%. What is the reasonable Heap usage rate? So as to do further resource optimization. -- Sent from:

退訂

2021-06-16 Thread Chongaih Hau
郵箱更換,退訂 Regards, Hau ChongAih

Re:Re: flink sql cdc做计算后, elastic search connector 多并发sink 会有问题,偶现!数据量较大,目前还没较好复现。请问有人是否也遇到过类似的问题。

2021-06-16 Thread casel.chen
Flink CDC什么时候能够支持修改并行度,进行细粒度的资源控制?目前我也遇到flink sql cdc写mysql遇到数据同步跟不上数据写入速度问题,何时能支持像mysql并行复制这种机制呢? 在 2021-06-16 17:27:14,"Leonard Xu" 写道: >看起来和 Flink-CDC 关系不大,看异常栈是 ES 侧抛出的异常 version_conflict_engine_exception, >可以查下这个异常,看下是不是有写(其他作业/业务 也在写同步表)冲突。 > >祝好, >Leonard > >> 在

邮件退订

2021-06-16 Thread wangweigu...@stevegame.cn
邮箱变更,退订!

Handling Large Broadcast States

2021-06-16 Thread Rion Williams
Hey Flink folks, I was discussing the use of the Broadcast Pattern with some colleagues today for a potential enrichment use-case and noticed that it wasn’t currently backed by RocksDB. This seems to indicate that it would be solely limited to the memory allocated, which might not support a

How to deal with apps with backpressure but with "good" performance

2021-06-16 Thread Jason Liu
Hi all, We are running Flink on AWS Kinesis Data Analytics and lately. After the Flink 1.11 upgrades, we have noticed some of our apps have continuous backpressure since the Flink job starts. However, we have been running these apps for a while now and if we decrease the source parallelism to

EnvironmentInformation class logs secrets passed as JVM/CLI arguments

2021-06-16 Thread Jose Vargas
Hi, I am using Flink 1.13.1 and I noticed that the logs coming from the EnvironmentInformation class, https://github.com/apache/flink/blob/release-1.13.1/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L444-L467, log the value of secrets that are passed in as

Re: Resource Planning

2021-06-16 Thread Rommel Holmes
Hi, Xintong and Robert Thanks for the reply. The checkpoint size for our job is 10-20GB since we are doing incremental checkpointing, if we do a savepoint, it can be as big as 150GB. 1) We will try to make Flink instance bigger. 2) Thanks for the pointer, we will take a look. 3) We do have CPU

multiple jobs in same flink app

2021-06-16 Thread Qihua Yang
Hi, Does anyone know how to run multiple jobs in same flink application? I did a simple test. First job was started. I did see the log message, but I didn't see the second job was started, even I saw the log message. public void testJobs() throws Exception { StreamExecutionEnvironment env =

Re: Flink PrometheusReporter support for HTTPS

2021-06-16 Thread Austin Cawley-Edwards
Hi Ashutosh, Sorry for the delayed response + thanks Robert for the good links and idea. Alternatively, Flink on K8s is a perfect scenario for running a sidecar proxy or gateway that handles HTTPS connections. The advantage here is that you decouple managing SSL certifications + rotation from

Re: Save state on a CoGroupFunction and recover it after a failure

2021-06-16 Thread Felipe Gutierrez
I don't understand how I can save the state of a window on the RichCoGroupFunction if the events arrive on the RichCoGroupFunction.coCgroup only when the window closes. Then, upon a failure I will not recover events that were on the window. This is why I think the approach to this problem is to

Re: Save state on a CoGroupFunction and recover it after a failure

2021-06-16 Thread Felipe Gutierrez
Hi Robert, 1 - I am using Kafka010 as data source. 2 - No, I am not using any kind of ListState. That I think it must be used 3 - Good. I am going to use CheckpointedFunction. Just a follow-up question. I was reimplementing it using CoProcessFunction to save the state and trigger the window. So,

How to use onTimer() on event stream for *ProcessFunction?

2021-06-16 Thread Felipe Gutierrez
Hi community, I don't understand why that KeyedProcessFunction.onTimer() is implemented here [1] is different from here [2]. Both are KeyedProcessFunction and they aim to fire a window on event time. At [1] the events are emitted at if (timestamp == result.lastModified + 6) and the time is

Re: RocksDB CPU resource usage

2021-06-16 Thread Padarn Wilson
Thanks Robert. I think it would be easy enough to test this hypothesis by making the same comparison with some simpler state inside the aggregation window. On Wed, 16 Jun 2021, 7:58 pm Robert Metzger, wrote: > Depending on the datatypes you are using, seeing 3x more CPU usage seems > realistic.

Re: Flink PrometheusReporter support for HTTPS

2021-06-16 Thread Robert Metzger
It seems like the PrometheusReporter doesn't support HTTPS. The Flink reporter seems to be based on the HttpServer prometheus client. I wonder if using the servlet client would allow us to add HTTPS support:

Re: Confusions and suggestions about Configuration

2021-06-16 Thread Robert Metzger
Note to others on this mailing list. This email has also been sent with the subject "Flink parameter configuration does not take effect" to this list. I replied there, let's also discuss there. On Tue, Jun 15, 2021 at 7:39 AM Jason Lee wrote: > Hi everyone, > > When I was researching and using

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Ok, I will try. Yingjie Cao 于2021年6月16日周三 下午8:00写道: > > Maybe you can try to increase taskmanager.network.retries, > taskmanager.network.netty.server.backlog and > taskmanager.network.netty.sendReceiveBufferSize. These options are useful for > our jobs. > > yidan zhao 于2021年6月16日周三 下午7:10写道:

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Ok, I will try. Yingjie Cao 于2021年6月16日周三 下午8:00写道: > > Maybe you can try to increase taskmanager.network.retries, > taskmanager.network.netty.server.backlog and > taskmanager.network.netty.sendReceiveBufferSize. These options are useful for > our jobs. > > yidan zhao 于2021年6月16日周三 下午7:10写道:

Re: Save state on a CoGroupFunction and recover it after a failure

2021-06-16 Thread Robert Metzger
Hi Felipe, Which data source are you using? > Then, in the MyCoGroupFunction there are only events of stream02 Are you storing events in your state? > Is this the case where I have to use RichCoGroupFunction and save the state by implementing the CheckpointedFunction? If you want your state

Re: TypeInfo issue with Avro SpecificRecord

2021-06-16 Thread Robert Metzger
Thanks a lot for sharing the solution on the mailing list and in the ticket. On Tue, Jun 15, 2021 at 11:52 AM Patrick Lucas wrote: > Alright, I figured it out—it's very similar to FLINK-13703, but instead of > having to do with immutable fields, it's due to use of the Avro Gradle > plugin

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao
Maybe you can try to increase taskmanager.network.retries, taskmanager.network.netty.server.backlog and taskmanager.network.netty.sendReceiveBufferSize. These options are useful for our jobs. yidan zhao 于2021年6月16日周三 下午7:10写道: > Hi, yingjie. > If the network is not stable, which config

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao
Maybe you can try to increase taskmanager.network.retries, taskmanager.network.netty.server.backlog and taskmanager.network.netty.sendReceiveBufferSize. These options are useful for our jobs. yidan zhao 于2021年6月16日周三 下午7:10写道: > Hi, yingjie. > If the network is not stable, which config

Re: RocksDB CPU resource usage

2021-06-16 Thread Robert Metzger
Depending on the datatypes you are using, seeing 3x more CPU usage seems realistic. Serialization can be quite expensive. See also: https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html Maybe it makes sense to optimize there a bit. On Tue, Jun 15, 2021 at 5:23 PM JING

Re: Please advise bootstrapping large state

2021-06-16 Thread Robert Metzger
Hi Marco, The DataSet API will not run out of memory, as it spills to disk if the data doesn't fit anymore. Load is distributed by partitioning data. Giving you advice depends a bit on the use-case. I would explore two major options: a) reading the data from postgres using Flink's SQL JDBC

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
I also searched many result in internet. There are some related exception like org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException, but in my case it is org.apache.flink.runtime.io.network.netty.exception.LocalTransportException. It is different in

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
I also searched many result in internet. There are some related exception like org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException, but in my case it is org.apache.flink.runtime.io.network.netty.exception.LocalTransportException. It is different in

Re: Flink SQL as DSL for flink CEP

2021-06-16 Thread JING ZHANG
Hi Dipanjan, I'm not an expert on Flink CEP, however I would like to share my immature ideas, hope it helps. Flink provides CEP library[1] which is a programmatic library which allows user define patterns based on pattern API. Besides, Flink allows user define patterns by MATCH_RECOGNIZE clause in

Re: Flink SQL as DSL for flink CEP

2021-06-16 Thread Robert Metzger
Hi Dipanjan, Using Flink SQL's MATCH_RECOGNIZE operator is certainly a good idea if you are looking for a non-programmatic way to do CEP with Flink. On Wed, Jun 16, 2021 at 6:44 AM Dipanjan Mazumder wrote: > Hi, > > Can we say that Flink SQL is kind of a DSL overlay on flink CEP , i > mean

Re: S3 + Parquet credentials issue

2021-06-16 Thread Robert Metzger
Thanks for the logs. The OK job seems to read from "s3a://test-bucket/", while the KO job reads from "s3a://bucket-test/". Could it be that you are just trying to access the wrong bucket? What I also found interesting from the KO Job TaskManager is this log message: Caused by:

Re: flink1.11.2集群出现了3种连接拒绝,导致任务失败

2021-06-16 Thread yidan zhao
mark. 我也是第一个问题,暂时无解。 chaiyi 于2021年3月22日周一 下午12:28写道: > > 你好: > 最近建立一个3台机子的flink集群,版本是 zk-3.6.2 + hadoop-3.3.0 + > flink-1.11.2。3台机制是在同一个物理机上建立的虚拟机,应该来说不会出现网络波动导致的网络拒绝,但是为什么一直会出现网络拒绝 > 项目在运行一段时间以后,短则几个小时,长则3到5天,任务就会挂掉,一共出现了一下3种异常,全是网络连接方法的,请帮忙看看,是不是flink网络配置方面有问题。 > 1. 集群之间通信连接拒绝: > 2021-03-03

Re: Resource Planning

2021-06-16 Thread Robert Metzger
Hi Thomas, My gut feeling is that you can use the available resources more efficiently. What's the size of a checkpoint for your job (you can see that from the UI)? Given that your cluster has has an aggregate of 64 * 12 = 768gb of memory available, you might be able to do everything in memory

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Hi, yingjie. If the network is not stable, which config parameter I should adjust. yidan zhao 于2021年6月16日周三 下午6:56写道: > > 2: I use G1, and no full gc occurred, young gc count: 422, time: > 142892, so it is not bad. > 3: stream job. > 4: I will try to config taskmanager.network.retries which is

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Hi, yingjie. If the network is not stable, which config parameter I should adjust. yidan zhao 于2021年6月16日周三 下午6:56写道: > > 2: I use G1, and no full gc occurred, young gc count: 422, time: > 142892, so it is not bad. > 3: stream job. > 4: I will try to config taskmanager.network.retries which is

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
2: I use G1, and no full gc occurred, young gc count: 422, time: 142892, so it is not bad. 3: stream job. 4: I will try to config taskmanager.network.retries which is default 0, and taskmanager.network.netty.client.connectTimeoutSec 's default is 120s。 5: I checked the net fd number of the

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
2: I use G1, and no full gc occurred, young gc count: 422, time: 142892, so it is not bad. 3: stream job. 4: I will try to config taskmanager.network.retries which is default 0, and taskmanager.network.netty.client.connectTimeoutSec 's default is 120s。 5: I checked the net fd number of the

flink KeyedProcessFunction ????????

2021-06-16 Thread ????
??KeyedProcessFunctionprocessElementKeyBy??processElement100

Re: Flink parameter configuration does not take effect

2021-06-16 Thread Robert Metzger
Hi Jason, How are you deploying your Flink SQL tasks? (are you using per-job/application clusters, or a session cluster? ) I agree that the configuration management is not optimal in Flink. By default, I would recommend assuming that all configuration parameters are cluster settings, which

?????? flink sql??????????????????

2021-06-16 Thread ??????
FlinkSql WebIDE?? FlinkSQLSQL??SqlCli?? https://github.com/DataLinkDC/dlink ---- ??:

flink 作业合理的堆空间使用率

2021-06-16 Thread todd
flink1.12.1版本,设置 taskmanager.memory.process.size: 1024m。 运行时,Heap Maximum:146M,Non-Heap Maximum:744 MB,Heap 使用率大概在10%-30%之间。 想问下合理的Heap 使用率大概是多少? 从而做进一步的资源优化。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink sql平台多版本支持问题

2021-06-16 Thread todd
补充一种使用Flink api提交方式,参考:https://github.com/todd5167/flink-spark-submiter。 任务提交、状态获取继承统一的接口,上层服务在引用时,通过spi的方式进行加载即可。 缺点: - 需要对Flink client源码、类加载机制有了解。 优点: - 良好的外部集成 - 不需要额外部署服务 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink sql cdc做计算后, elastic search connector 多并发sink 会有问题,偶现!数据量较大,目前还没较好复现。请问有人是否也遇到过类似的问题。

2021-06-16 Thread Leonard Xu
看起来和 Flink-CDC 关系不大,看异常栈是 ES 侧抛出的异常 version_conflict_engine_exception, 可以查下这个异常,看下是不是有写(其他作业/业务 也在写同步表)冲突。 祝好, Leonard > 在 2021年6月16日,17:05,mokaful <649713...@qq.com> 写道: > > 相同问题,请问有处理方式吗 > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/

Flink Table API 消费Kafka时设置format异常

2021-06-16 Thread wang guanglei
背景: source: kafka flink 版本:1.10 avro版本:1.10.0 代码: bsTableEnv.connect(new Kafka() .version("universal") .topic(params.get("read-topic")) .startFromEarliest() .properties(this.properties) ) .withFormat(

Re: flink sql cdc做计算后, elastic search connector 多并发sink 会有问题,偶现!数据量较大,目前还没较好复现。请问有人是否也遇到过类似的问题。

2021-06-16 Thread mokaful
相同问题,请问有处理方式吗 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Checkpoint loading failure

2021-06-16 Thread Guowei Ma
Hi Padarn Will there be these errors if the jobgraph is not modified? In addition, is this error stack all? Is it possible that other errors caused the stream to be closed? Best, Guowei On Tue, Jun 15, 2021 at 9:54 PM Padarn Wilson wrote: > Hi all, > > We have a job that has a medium size

Re: Got multiple issues when running the tutorial project "table-walkthrough" on IDEA

2021-06-16 Thread Guowei Ma
Hi, Lingfeng These job errors you posted happened when the job(`SpendReport`) was running on the IDE? According to my understanding, this document[1] & repository[2] mean that the example is to be run in docker, not in IDE. [1]

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao
Hi yidan, 1. Is the network stable? 2. Is there any GC problem? 3. Is it a batch job? If so, please use sort-shuffle, see [1] for more information. 4. You may try to config these two options: taskmanager.network.retries, taskmanager.network.netty.client.connectTimeoutSec. More relevant options

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao
Hi yidan, 1. Is the network stable? 2. Is there any GC problem? 3. Is it a batch job? If so, please use sort-shuffle, see [1] for more information. 4. You may try to config these two options: taskmanager.network.retries, taskmanager.network.netty.client.connectTimeoutSec. More relevant options

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Hi, here is the text exception stack: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: readAddress(..) failed: Connection timed out (connection to '10.35.215.18/10.35.215.18:2045') at

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Hi, here is the text exception stack: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: readAddress(..) failed: Connection timed out (connection to '10.35.215.18/10.35.215.18:2045') at

flink cdc对接多主mysql集群要怎么配置

2021-06-16 Thread kingdomad
flink cdc对接多主的mysql集群会报错如下,请问要怎么配置。感谢各位大佬。 2021-06-16 16:26:46 ERROR [blc-centos7-01:3306] io.debezium.connector.mysql.BinlogReader:864 - Encountered change event 'Event{header=EventHeaderV4{timestamp=1623829662000, eventType=TABLE_MAP, serverId=2, headerLength=19, dataLength=97,

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Robert Metzger
Hi Yidan, it seems that the attachment did not make it through the mailing list. Can you copy-paste the text of the exception here or upload the log somewhere? On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote: > Attachment is the exception stack from flink's web-ui. Does anyone > have also

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Robert Metzger
Hi Yidan, it seems that the attachment did not make it through the mailing list. Can you copy-paste the text of the exception here or upload the log somewhere? On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote: > Attachment is the exception stack from flink's web-ui. Does anyone > have also

flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Attachment is the exception stack from flink's web-ui. Does anyone have also met this problem? Flink1.12 - Flink1.13.1. Standalone Cluster, include 30 containers, each 28G mem.

Re: Re: Re: Re: Re: 关于反压的问题

2021-06-16 Thread yidan zhao
@东东 帮忙分析个其他异常吧。异常如下图所示,我是standalone集群,每天一会一个报错,目前阶段是这个报错。 yidan zhao 于2021年6月16日周三 下午3:32写道: > > 嗯,你这个说法我同意。 > > 东东 于2021年6月16日周三 下午2:34写道: > > > > 就你这个例子,只要buffer足够大,B在6min产生的数据都能放进buffer里,B就感受不到反压的影响,可以继续处理上游的数据啊,只要下一个窗口触发之前,C能处理完buffer中的数据,那么B全程都不会被限制。buffer在send和receive两端都是有的,B只关心自己的send >

Re: Re: Re: Re: Re: 关于反压的问题

2021-06-16 Thread yidan zhao
嗯,你这个说法我同意。 东东 于2021年6月16日周三 下午2:34写道: > > 就你这个例子,只要buffer足够大,B在6min产生的数据都能放进buffer里,B就感受不到反压的影响,可以继续处理上游的数据啊,只要下一个窗口触发之前,C能处理完buffer中的数据,那么B全程都不会被限制。buffer在send和receive两端都是有的,B只关心自己的send > buffer还能不能写进去。 > > > 在 2021-06-16 13:32:52,"yidan zhao" 写道: >

Re:Re: Re: Re: Re: 关于反压的问题

2021-06-16 Thread 东东
就你这个例子,只要buffer足够大,B在6min产生的数据都能放进buffer里,B就感受不到反压的影响,可以继续处理上游的数据啊,只要下一个窗口触发之前,C能处理完buffer中的数据,那么B全程都不会被限制。buffer在send和receive两端都是有的,B只关心自己的send buffer还能不能写进去。 在 2021-06-16 13:32:52,"yidan zhao" 写道: