Thanks, JING ZHANG!
I have one subtask for one Kafka source that is getting backpressured. Is
there an easy way to split a single Kafka partition across multiple
subtasks? Or do I need to split the Kafka topic into more partitions?
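For context: a single Kafka partition is always read by exactly one source subtask, so the usual workaround is to fan the records out to more downstream subtasks rather than splitting the partition itself. A hedged DataStream sketch, assuming the standard Flink APIs; the source and map function below are placeholders, not from the original job:

```java
// Sketch only; requires Flink's DataStream API on the classpath.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FanOutSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.socketTextStream("localhost", 9999)   // placeholder for the Kafka source
            .rebalance()                          // round-robins records to all downstream subtasks
            .map(String::toUpperCase)             // placeholder for the expensive operator
            .setParallelism(8)                    // downstream parallelism no longer tied to partitions
            .print();
        env.execute("fan-out sketch");
    }
}
```

This spreads the per-record work over all downstream subtasks; the source subtask itself still reads its partition alone.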
On Wed, Jun 16, 2021 at 10:29 PM JING ZHANG wrote:
> Hi Dan,
> Would you please
Change one of them to 0.
On 2021-06-17 13:11:01, "yidan zhao" wrote:
>Yes, it is the host machine's IP.
>
>net.ipv4.tcp_tw_reuse = 1
>net.ipv4.tcp_timestamps = 1
>
>东东 wrote on Thu, Jun 17, 2021 at 12:52 PM:
>>
>> Is 10.35.215.18 the host machine's IP?
>>
>> Check the values of tcp_tw_recycle and net.ipv4.tcp_timestamps.
>> If nothing else helps, try tcpdump.
>>
>>
>>
>> On 2021-06-17 12:41:58, "yidan
Thank you very much!
I tried using Flink's SQL JDBC connector and ran into issues. According
to the Flink documentation, only the old planner is compatible with the
DataSet API.
When I connect to the table:
CREATE TABLE my_table (
) WITH (
'connector.type' = 'jdbc',
'connector.url'
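For reference, a hedged sketch of a complete legacy-style JDBC DDL; the schema, URL, and credentials below are placeholders, not taken from the original message:

```sql
-- Hypothetical example: table name, columns, URL, and credentials are placeholders.
CREATE TABLE my_table (
  id BIGINT,
  name VARCHAR
) WITH (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:postgresql://localhost:5432/mydb',
  'connector.table' = 'my_table',
  'connector.driver' = 'org.postgresql.Driver',
  'connector.username' = 'user',
  'connector.password' = 'secret'
);
```

The `connector.*` keys above are the legacy (old planner) options; newer Flink versions also accept the `'connector' = 'jdbc'` option style.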
Hi Dan,
Would you please describe the problem with your job? High latency
or low throughput?
Please first check the job's throughput and latency.
If the job throughput matches the speed of sources producing data and the
latency metric is good, maybe the job works well without bottlenecks.
Yes, it is the host machine's IP.
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
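For reference, these kernel settings can be persisted in /etc/sysctl.conf; a sketch (the values mirror the ones quoted above; note that tcp_tw_recycle was removed in Linux 4.12 and is known to break clients behind NAT):

```
# /etc/sysctl.conf fragment; apply with `sysctl -p`
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
```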
东东 wrote on Thu, Jun 17, 2021 at 12:52 PM:
>
> Is 10.35.215.18 the host machine's IP?
>
> Check the values of tcp_tw_recycle and net.ipv4.tcp_timestamps.
> If nothing else helps, try tcpdump.
>
>
>
> On 2021-06-17 12:41:58, "yidan zhao" wrote:
> >@东东 It is a standalone cluster. The errors occur at random times, one every now and then, with no fixed pattern.
Is 10.35.215.18 the host machine's IP?
Check the values of tcp_tw_recycle and net.ipv4.tcp_timestamps.
If nothing else helps, try tcpdump.
On 2021-06-17 12:41:58, "yidan zhao" wrote:
>@东东 It is a standalone cluster. The errors occur at random times, one every now and then, with no fixed pattern. There is some correlation with CPU, memory, and network, but I cannot confirm it because it is not obvious.
>I investigated several exceptions; some of their timestamps matched network spikes, but not all of them, so it is hard to say whether that is the cause.
>
>Besides, there is one thing I am not clear about: this error is rarely reported online; similar reports are all
We have a job that has been running, but none of the AWS resource metrics
for EKS, EC2, MSK, and EBS show any bottlenecks. I have multiple 8-core
instances allocated, but only ~2 cores are used. Most of the memory is not
consumed. MSK does not show much use. EBS metrics look mostly idle. I
assumed
@东东 It is a standalone cluster. The errors occur at random times, one every now and then, with no fixed pattern. There is some correlation with CPU, memory, and network, but I cannot confirm it because it is not obvious.
I investigated several exceptions; some of their timestamps matched network spikes, but not all of them, so it is hard to say whether that is the cause.
Besides, there is one thing I am not clear about: this error is rarely reported online. Similar reports are all
RemoteTransportException, with a hint that the taskmanager may have been lost. But mine is a
LocalTransportException, and I am not sure whether these two errors mean different things in Netty. So far I cannot find much material online about either exception.
东东 wrote on Thu, Jun 17, 2021
Hi Yun. The UI was not useful for this case. I had a feeling beforehand
about what the issue was. We refactored the state, and now the checkpoint
is 10x faster.
On Mon, Jun 14, 2021 at 5:47 AM Yun Gao wrote:
> Hi Dan,
>
> Flink should already have integrated a tool in the web UI to monitor
>
Hi Jason,
If you see a back pressure warning for a task, this means it is producing
data faster than the downstream operators can consume it.
We should avoid high back pressure in online jobs because it may lead to
the following problems:
1. There are potential performance bottlenecks, which may cause
Single-machine standalone, or Docker/K8s?
Is the timing of this exception periodic, or is it correlated with changes in CPU, memory, or even network traffic?
On 2021-06-16 19:10:24, "yidan zhao" wrote:
>Hi, yingjie.
>If the network is not stable, which config parameter should I adjust?
>
>yidan zhao wrote on Wed, Jun 16, 2021 at 6:56 PM:
>>
>> 2: I use G1, and no full gc occurred, young gc count:
Hi Padarn,
From the current description, it seems to me that the issue is not related to
the state? I think we may first check whether the operator logic is right and
whether
the preceding tasks have indeed emitted records to the new sink.
Best,
Yun
--Original Mail
For Flink 1.12.1, taskmanager.memory.process.size is set to 1024m.
At runtime, Heap Maximum is 146 MB, Non-Heap Maximum is 744 MB, and heap usage
is about 10%-30%.
What is a reasonable heap usage rate, so that we can do further resource
optimization?
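A low heap share is expected here: taskmanager.memory.process.size covers heap, managed memory, network buffers, metaspace, and JVM overhead, so only a fraction of the 1024m ends up as heap. A hedged flink-conf.yaml sketch of knobs that shift the balance; the values are illustrative, not recommendations:

```
# flink-conf.yaml sketch; values are illustrative
taskmanager.memory.process.size: 1024m
# lower the managed-memory share (default 0.4) to leave more room for heap
taskmanager.memory.managed.fraction: 0.2
# or size the task heap explicitly instead:
# taskmanager.memory.task.heap.size: 512m
```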
Email address changed; unsubscribe.
Regards,
Hau ChongAih
When will Flink CDC support changing the parallelism for fine-grained resource control? I have also hit the problem that Flink SQL
CDC writing to MySQL cannot keep up with the rate at which data is written. When will a mechanism like MySQL's parallel replication be supported?
On 2021-06-16 17:27:14, "Leonard Xu" wrote:
>This does not look related to Flink-CDC. The exception stack shows version_conflict_engine_exception thrown on the ES side.
>You can look into that exception and check whether there are write conflicts (other jobs/pipelines also writing to the synced table).
>
>Best,
>Leonard
>
>> On
Email address changed; unsubscribe!
Hey Flink folks,
I was discussing the use of the Broadcast Pattern with some colleagues today
for a potential enrichment use-case and noticed that it wasn’t currently backed
by RocksDB. This seems to indicate that it would be solely limited to the
memory allocated, which might not support a
Hi all,
We are running Flink on AWS Kinesis Data Analytics, and lately, after
the Flink 1.11 upgrade, we have noticed that some of our apps have continuous
backpressure from the moment the Flink job starts. However, we have been running
these apps for a while now and if we decrease the source parallelism to
Hi,
I am using Flink 1.13.1 and I noticed that the logs coming from the
EnvironmentInformation class,
https://github.com/apache/flink/blob/release-1.13.1/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L444-L467,
log the value of secrets that are passed in as
Hi, Xintong and Robert
Thanks for the reply.
The checkpoint size for our job is 10-20GB since we are doing incremental
checkpointing, if we do a savepoint, it can be as big as 150GB.
1) We will try to make the Flink instances bigger.
2) Thanks for the pointer, we will take a look.
3) We do have CPU
Hi,
Does anyone know how to run multiple jobs in the same Flink application?
I did a simple test. The first job started, and I saw its log message, but
the second job never started, even though I saw its log message.
public void testJobs() throws Exception {
StreamExecutionEnvironment env =
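One likely cause, offered as a guess: in streaming mode env.execute() blocks until the job terminates, so a second execute() call in the same method is never reached. A sketch using executeAsync(), available since Flink 1.10; the pipelines below are placeholders:

```java
// Sketch only; requires Flink's DataStream API on the classpath.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TwoJobs {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9998).print();  // placeholder pipeline for job 1
        env.executeAsync("job-1");  // submits the job and returns a JobClient immediately

        env.socketTextStream("localhost", 9999).print();  // placeholder pipeline for job 2
        env.executeAsync("job-2");  // this line is actually reached, unlike with execute()
    }
}
```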
Hi Ashutosh,
Sorry for the delayed response + thanks Robert for the good links and idea.
Alternatively, Flink on K8s is a perfect scenario for running a sidecar
proxy or gateway that handles HTTPS connections. The advantage here is that
you decouple managing SSL certifications + rotation from
I don't understand how I can save the state of a window in the
RichCoGroupFunction if the events arrive at
RichCoGroupFunction.coGroup only when the window closes. Then, upon a
failure, I will not recover events that were in the window. This is why I
think the approach to this problem is to
Hi Robert,
1 - I am using Kafka010 as the data source.
2 - No, I am not using any kind of ListState, though I think it should be used.
3 - Good. I am going to use CheckpointedFunction.
Just a follow-up question. I was reimplementing it using CoProcessFunction
to save the state and trigger the window. So,
Hi community,
I don't understand why the KeyedProcessFunction.onTimer() implemented
here [1] differs from the one here [2]. Both are KeyedProcessFunctions, and they
aim to fire a window on event time. At [1] the events are emitted at if
(timestamp == result.lastModified + 6) and the time is
Thanks Robert. I think it would be easy enough to test this hypothesis by
making the same comparison with some simpler state inside the aggregation
window.
On Wed, 16 Jun 2021, 7:58 pm Robert Metzger, wrote:
> Depending on the datatypes you are using, seeing 3x more CPU usage seems
> realistic.
It seems like the PrometheusReporter doesn't support HTTPS.
The Flink reporter seems to be based on the HttpServer prometheus client. I
wonder if using the servlet client would allow us to add HTTPS support:
Note to others on this mailing list. This email has also been sent with the
subject "Flink parameter configuration does not take effect" to this list.
I replied there, let's also discuss there.
On Tue, Jun 15, 2021 at 7:39 AM Jason Lee wrote:
> Hi everyone,
>
> When I was researching and using
Ok, I will try.
Yingjie Cao wrote on Wed, Jun 16, 2021 at 8:00 PM:
>
> Maybe you can try to increase taskmanager.network.retries,
> taskmanager.network.netty.server.backlog and
> taskmanager.network.netty.sendReceiveBufferSize. These options are useful for
> our jobs.
>
> yidan zhao wrote on Wed, Jun 16, 2021 at 7:10 PM:
Hi Felipe,
Which data source are you using?
> Then, in the MyCoGroupFunction there are only events of stream02
Are you storing events in your state?
> Is this the case where I have to use RichCoGroupFunction and save the
state by implementing the CheckpointedFunction?
If you want your state
Thanks a lot for sharing the solution on the mailing list and in the
ticket.
On Tue, Jun 15, 2021 at 11:52 AM Patrick Lucas
wrote:
> Alright, I figured it out—it's very similar to FLINK-13703, but instead of
> having to do with immutable fields, it's due to use of the Avro Gradle
> plugin
Maybe you can try to
increase taskmanager.network.retries,
taskmanager.network.netty.server.backlog and
taskmanager.network.netty.sendReceiveBufferSize. These options are useful
for our jobs.
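For reference, these options would go into flink-conf.yaml; the values below are illustrative rather than tuned recommendations:

```
# flink-conf.yaml sketch; values are illustrative
taskmanager.network.retries: 3
taskmanager.network.netty.server.backlog: 1024
# send/receive buffer size in bytes; 0 means use the system default
taskmanager.network.netty.sendReceiveBufferSize: 4194304
```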
yidan zhao wrote on Wed, Jun 16, 2021 at 7:10 PM:
> Hi, yingjie.
> If the network is not stable, which config
Depending on the datatypes you are using, seeing 3x more CPU usage seems
realistic.
Serialization can be quite expensive. See also:
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
Maybe it makes sense to optimize there a bit.
On Tue, Jun 15, 2021 at 5:23 PM JING
Hi Marco,
The DataSet API will not run out of memory, as it spills to disk if the
data doesn't fit anymore.
Load is distributed by partitioning data.
Giving you advice depends a bit on the use-case. I would explore two major
options:
a) reading the data from postgres using Flink's SQL JDBC
I also searched for many results on the internet. There are some related
exceptions like
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException,
but in my case it is
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException.
It is different in
Hi Dipanjan,
I'm not an expert on Flink CEP, but I would like to share some rough
ideas; I hope they help.
Flink provides a CEP library[1], a programmatic library that lets
users define patterns with the Pattern API.
Besides, Flink lets users define patterns with the MATCH_RECOGNIZE clause in
Hi Dipanjan,
Using Flink SQL's MATCH_RECOGNIZE operator is certainly a good idea if you
are looking for a non-programmatic way to do CEP with Flink.
On Wed, Jun 16, 2021 at 6:44 AM Dipanjan Mazumder wrote:
> Hi,
>
> Can we say that Flink SQL is kind of a DSL overlay on flink CEP , i
> mean
Thanks for the logs.
The OK job seems to read from "s3a://test-bucket/", while the KO job reads
from "s3a://bucket-test/". Could it be that you are just trying to access
the wrong bucket?
What I also found interesting from the KO Job TaskManager is this log
message:
Caused by:
Mark. I have the same first question, and no solution for now.
chaiyi wrote on Mon, Mar 22, 2021 at 12:28 PM:
>
> Hello:
> I recently set up a 3-node Flink cluster, versions zk-3.6.2 + hadoop-3.3.0 +
> flink-1.11.2. The 3 machines are VMs on the same physical host, so network jitter should not cause connection refusals, yet connection refusals keep occurring.
> After the job runs for a while, from a few hours to 3-5 days, it dies. Three kinds of exceptions have appeared, all related to network connections. Please help me check whether something is wrong with the Flink network configuration.
> 1. Connection refused between cluster nodes:
> 2021-03-03
Hi Thomas,
My gut feeling is that you can use the available resources more efficiently.
What's the size of a checkpoint for your job (you can see that from the
UI)?
Given that your cluster has an aggregate of 64 * 12 = 768gb of memory
available, you might be able to do everything in memory
Hi, yingjie.
If the network is not stable, which config parameter should I adjust?
yidan zhao wrote on Wed, Jun 16, 2021 at 6:56 PM:
>
> 2: I use G1, and no full gc occurred, young gc count: 422, time:
> 142892, so it is not bad.
> 3: stream job.
> 4: I will try to config taskmanager.network.retries which is
2: I use G1, and no full gc occurred, young gc count: 422, time:
142892, so it is not bad.
3: stream job.
4: I will try to configure taskmanager.network.retries, whose default is
0; taskmanager.network.netty.client.connectTimeoutSec's default
is 120s.
5: I checked the net fd number of the
Hi Jason,
How are you deploying your Flink SQL tasks? (are you using
per-job/application clusters, or a session cluster? )
I agree that the configuration management is not optimal in Flink. By
default, I would recommend assuming that all configuration parameters are
cluster settings, which
A FlinkSQL Web IDE for developing Flink SQL in the browser, similar to the SQL CLI:
https://github.com/DataLinkDC/dlink
For Flink 1.12.1, taskmanager.memory.process.size is set to 1024m.
At runtime, Heap Maximum is 146 MB, Non-Heap Maximum is 744 MB, and heap usage is about 10%-30%.
What is a reasonable heap usage rate, so that we can do further resource optimization?
--
Sent from: http://apache-flink.147419.n8.nabble.com/
Let me add another submission approach that uses the Flink client API; see https://github.com/todd5167/flink-spark-submiter.
Job submission and status retrieval implement a unified interface, so upper-layer services can simply load it via SPI.
Cons:
- Requires some understanding of the Flink client source code and classloading mechanism.
Pros:
- Good external integration
- No extra services to deploy
This does not look related to Flink-CDC. The exception stack shows version_conflict_engine_exception thrown on the ES side.
You can look into that exception and check whether there are write conflicts (other jobs/pipelines also writing to the synced table).
Best,
Leonard
> On Jun 16, 2021, at 17:05, mokaful <649713...@qq.com> wrote:
>
> Same problem here. Is there any solution?
Background:
source: Kafka
Flink version: 1.10
Avro version: 1.10.0
Code:
bsTableEnv.connect(new Kafka()
.version("universal")
.topic(params.get("read-topic"))
.startFromEarliest()
.properties(this.properties)
)
.withFormat(
Same problem here. Is there any solution?
Hi Padarn
Do these errors occur if the jobgraph is not modified?
Also, is this the full error stack? Is it possible that other errors
caused the stream to be closed?
Best,
Guowei
On Tue, Jun 15, 2021 at 9:54 PM Padarn Wilson wrote:
> Hi all,
>
> We have a job that has a medium size
Hi, Lingfeng
Did the job errors you posted happen when the job (`SpendReport`) was
running in the IDE?
As I understand it, this document[1] and repository[2] mean that
the example is meant to be run in Docker, not in the IDE.
[1]
Hi yidan,
1. Is the network stable?
2. Is there any GC problem?
3. Is it a batch job? If so, please use sort-shuffle, see [1] for more
information.
4. You may try to configure these two options: taskmanager.network.retries,
taskmanager.network.netty.client.connectTimeoutSec. More relevant options
Hi, here is the text exception stack:
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
readAddress(..) failed: Connection timed out (connection to
'10.35.215.18/10.35.215.18:2045')
at
When flink cdc connects to a multi-primary MySQL cluster, it fails with the error below. How should this be configured? Thanks, everyone.
2021-06-16 16:26:46 ERROR [blc-centos7-01:3306]
io.debezium.connector.mysql.BinlogReader:864 - Encountered change event
'Event{header=EventHeaderV4{timestamp=1623829662000, eventType=TABLE_MAP,
serverId=2, headerLength=19, dataLength=97,
Hi Yidan,
it seems that the attachment did not make it through the mailing list. Can
you copy-paste the text of the exception here or upload the log somewhere?
On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote:
> Attachment is the exception stack from flink's web-ui. Does anyone
> have also
The attachment is the exception stack from Flink's web UI. Has anyone
else met this problem?
Flink 1.12 - Flink 1.13.1. Standalone cluster with 30 containers,
each with 28G of memory.
@东东 Please help analyze another exception. The exception is shown in the image below. I am running a standalone cluster; there is an error every now and then every day, and at this stage it is this one.
yidan zhao wrote on Wed, Jun 16, 2021 at 3:32 PM:
>
> OK, I agree with that.
>
> 东东 wrote on Wed, Jun 16, 2021 at 2:34 PM:
> >
> > In your example, as long as the buffer is large enough that all the data B produces in 6 minutes fits into it, B will not feel the effect of back pressure and can keep processing upstream data. As long as C can finish processing the buffered data before the next window fires, B is never throttled. Buffers exist on both the send and receive sides; B only cares about whether its own send
>
OK, I agree with that.
东东 wrote on Wed, Jun 16, 2021 at 2:34 PM:
>
> In your example, as long as the buffer is large enough that all the data B produces in 6 minutes fits into it, B will not feel the effect of back pressure and can keep processing upstream data. As long as C can finish processing the buffered data before the next window fires, B is never throttled. Buffers exist on both the send and receive sides; B only cares about whether its own send
> buffer can still be written to.
>
>
> On 2021-06-16 13:32:52, "yidan zhao" wrote:
>
In your example, as long as the buffer is large enough that all the data B produces in 6 minutes fits into it, B will not feel the effect of back pressure and can keep processing upstream data. As long as C can finish processing the buffered data before the next window fires, B is never throttled. Buffers exist on both the send and receive sides; B only cares about whether its own send
buffer can still be written to.
On 2021-06-16 13:32:52, "yidan zhao" wrote:
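The buffering argument above can be modeled with a plain-Java bounded queue; this is only a simplification of Flink's network buffers, with arbitrary sizes:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Simplified model of the send buffer between upstream task B and downstream
// task C: B keeps making progress until the bounded buffer fills up, which is
// the point where back pressure would kick in.
public class BufferModel {
    static int acceptedWithoutBlocking(int bufferCapacity, int produced) {
        ArrayBlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferCapacity);
        int accepted = 0;
        for (int i = 0; i < produced; i++) {
            if (buffer.offer(i)) {   // offer() returns false once the buffer is full
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Buffer big enough for the whole burst: the producer never blocks.
        System.out.println(acceptedWithoutBlocking(10, 6));  // prints 6
        // Small buffer: only 4 records fit; the rest would block the producer.
        System.out.println(acceptedWithoutBlocking(4, 6));   // prints 4
    }
}
```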