Re:Re: CUMULATE cannot be used together with comparison functions

2021-08-10, post by 李航飞
org.apache.flink.client.deployment.application.DetachedApplicationRunner [] - Could not execute application:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Currently Flink doesn't support individual window table-valued function CUMULATE(time_col=[ts], max_size=[10 min], step=[1 min]). Please use window table-valued function with aggregate together using window_start and window_end as group keys.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-clients_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-clients_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-clients_2.12-1.13.1.jar:1.13.1]
    at org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84) ~
On 2021-08-11 12:44:38, "Caizhi Weng" wrote:
>Hi!
>
>You probably misspelled `descriptor`. I could not reproduce this locally; which Flink version are you using?
>
>李航飞 wrote on Wed, Aug 11, 2021 at 11:41:
>
>> The SQL statement is:
>> select count(clicknum) as num
>>
>> from table(
>>
>> cumulate(table testTable, desctiptor(crtTime),interval '1'minutes,
>> interval '10' minutes))
>>
>> where clicknum <>'-99'
>>
>> group by window_start,window_end
>>
>>
>> Error message:
>> Flink doesn't support individual window table-valued function
>> cumulate(time_col=[app_date],max_size=[10 min],step=[1 min]...
>>
>>
>> How can I resolve this? Thanks.
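
Putting the reply's hint together with the error message, a corrected form of the query would spell DESCRIPTOR properly and keep window_start/window_end as the group keys. This is a sketch using the table and column names from the thread; whether it resolves the error may still depend on the Flink version, which is why the reply asks for it:

```sql
SELECT COUNT(clicknum) AS num
FROM TABLE(
  CUMULATE(TABLE testTable, DESCRIPTOR(crtTime), INTERVAL '1' MINUTES, INTERVAL '10' MINUTES))
WHERE clicknum <> '-99'
GROUP BY window_start, window_end;
```

In the CUMULATE call the step comes before the maximum size, which matches the step=[1 min], max_size=[10 min] shown in the error message.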


Re: CUMULATE cannot be used together with comparison functions

2021-08-10, post by Caizhi Weng
Hi!

You probably misspelled `descriptor`. I could not reproduce this locally; which Flink version are you using?

李航飞 wrote on Wed, Aug 11, 2021 at 11:41:

> The SQL statement is:
> select count(clicknum) as num
>
> from table(
>
> cumulate(table testTable, desctiptor(crtTime),interval '1'minutes,
> interval '10' minutes))
>
> where clicknum <>'-99'
>
> group by window_start,window_end
>
>
> Error message:
> Flink doesn't support individual window table-valued function
> cumulate(time_col=[app_date],max_size=[10 min],step=[1 min]...
>
>
> How can I resolve this? Thanks.


????

2021-08-10, post by Lee2097


CUMULATE cannot be used together with comparison functions

2021-08-10, post by 李航飞
The SQL statement is:
select count(clicknum) as num

from table(

cumulate(table testTable, desctiptor(crtTime),interval '1'minutes, interval 
'10' minutes))

where clicknum <>'-99'

group by window_start,window_end


Error message:
Flink doesn't support individual window table-valued function
cumulate(time_col=[app_date],max_size=[10 min],step=[1 min]...


How can I resolve this? Thanks.

Unsubscribe

2021-08-10, post by bo.zhang
Unsubscribe

Unsubscribe

2021-08-10, post by xiaow...@cjhxfund.com
Unsubscribe


xiaow...@cjhxfund.com



This email and any attachments may contain information that is privileged, 
undisclosed or confidential. If you are not the intended recipient, you are 
hereby notified that any disclosure, copying, distribution, or use of the 
information contained herein is STRICTLY PROHIBITED, please immediately contact 
the sender and destroy the material in its entirety. Although this email and 
any attachments are believed to be free of any virus or other defect that might 
affect any computer system into which it is received and opened, it is the 
responsibility of the recipient to ensure that it is virus free. No 
responsibility is accepted by TruValue Asset Management Co., Ltd.and its 
affiliates for any loss or damage arising in any way from its use.


Re:Re:Re:Re: Tuning OVER window aggregation performance

2021-08-10, post by Michael Ran
The image does not come through. Try expanding the operators and see where the backpressure sits.



On 2021-08-10 20:27:51, "Wanghui (HiCampus)" wrote:

Looking at any single operator, GC is not frequent, but the backpressure is HIGH.

 

-----Original Message-----
From: Michael Ran [mailto:greemqq...@163.com]
Sent: 2021-08-10 20:24
To: user-zh@flink.apache.org
Subject: Re:Re: Tuning OVER window aggregation performance

 

How does GC look? What about the sink write rate? Any backpressure?

On 2021-07-30 19:44:19, "Tianwang Li" wrote:

>(small) tumbling window + (large) over window

>Might that work better?

> 

> 

>Wanghui (HiCampus) wrote on Fri, Jul 30, 2021 at 15:17:

> 

>> Hi all:
>> When I test the OVER window at the 5 s to 15 s scale, throughput reaches 2000/s.

>> But once the window is widened to 10 minutes or more, throughput drops sharply from 2000, falling to 230/s within a few minutes.

>> My question:

>> How can OVER window performance be tuned? I will later widen the window to 24 hours, and at the current rate it will degrade quickly.

>> My test node: 8C + 16G

>> Flink config: taskmanager process memory: 8G. Best regards, WangHui

>> 

>> 

> 

>--

>**

> tivanli

>**

Re:

2021-08-10, post by Caizhi Weng
Hi!

There are many possible causes for a timeout, but from your description it may be resource pressure caused by the increased parallelism. Have you checked the GC logs for long full GCs? You could also run jstack on a TM while one of its heartbeats is taking unusually long; both help narrow down the cause.

Chenyu Zheng wrote on Tue, Aug 10, 2021 at 19:13:

> Hi developers,
>
>
> I am trying to deploy a Flink cluster on Kubernetes. When I raise the parallelism to a fairly large value (128), I frequently run into various Jobmanager/Taskmanager timeout errors, after which my job is automatically cancelled.
>
> I am confident this is not a network problem, because:
>
>   *   at parallelism 32/64 the problem never occurred, while at 128 it occurs on every run
>   *   our Flink runs in a production Kubernetes cluster, and no other container has reported network issues
>   *   raising heartbeat.timeout (to 300 s) makes the problem go away
>
> My Flink environment:
> ·Flink 1.12.5 with java8, scala 2.11
> ·Jobmanager Start command: $JAVA_HOME/bin/java -classpath
> $FLINK_CLASSPATH -Xmx15703474176 -Xms15703474176
> -XX:MaxMetaspaceSize=268435456 -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure
> -XX:+PrintGCCause -XX:+PrintHeapAtGC -XX:+PrintSafepointStatistics
> -XX:PrintSafepointStatisticsCount=1
> -Dlog.file=/opt/flink/log/jobmanager.log
> -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
> -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties
> org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
> -D jobmanager.memory.off-heap.size=134217728b -D
> jobmanager.memory.jvm-overhead.min=1073741824b -D
> jobmanager.memory.jvm-metaspace.size=268435456b -D
> jobmanager.memory.heap.size=15703474176b -D
> jobmanager.memory.jvm-overhead.max=1073741824b
> ·Taskmanager Start command: $JAVA_HOME/bin/java -classpath
> $FLINK_CLASSPATH -Xmx1664299798 -Xms1664299798
> -XX:MaxDirectMemorySize=493921243 -XX:MaxMetaspaceSize=268435456
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure -XX:+PrintGCCause
> -XX:+PrintHeapAtGC -XX:+PrintSafepointStatistics
> -XX:PrintSafepointStatisticsCount=1
> -Dlog.file=/opt/flink/log/taskmanager.log
> -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
> -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties
> org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner -D
> taskmanager.memory.framework.off-heap.size=134217728b -D
> taskmanager.memory.network.max=359703515b -D
> taskmanager.memory.network.min=359703515b -D
> taskmanager.memory.framework.heap.size=134217728b -D
> taskmanager.memory.managed.size=1438814063b -D taskmanager.cpu.cores=1.0 -D
> taskmanager.memory.task.heap.size=1530082070b -D
> taskmanager.memory.task.off-heap.size=0b -D
> taskmanager.memory.jvm-metaspace.size=268435456b -D
> taskmanager.memory.jvm-overhead.max=429496736b -D
> taskmanager.memory.jvm-overhead.min=429496736b --configDir /opt/flink/conf
> -Djobmanager.rpc.address='10.50.132.154'
> -Dpipeline.classpaths='file:usrlib/flink-playground-clickcountjob-print.jar'
> -Djobmanager.memory.off-heap.size='134217728b'
> -Dweb.tmpdir='/tmp/flink-web-07190d10-c6ea-4b1a-9eee-b2d0b2711a76'
> -Drest.address='10.50.132.154'
> -Djobmanager.memory.jvm-overhead.max='1073741824b'
> -Djobmanager.memory.jvm-overhead.min='1073741824b'
> -Dtaskmanager.resource-id='stream-367f634e41349f7195961cdb0c6c-taskmanager-1-17'
> -Dexecution.target='embedded'
> -Dpipeline.jars='file:/opt/flink/usrlib/flink-playground-clickcountjob-print.jar'
> -Djobmanager.memory.jvm-metaspace.size='268435456b'
> -Djobmanager.memory.heap.size='15703474176b'
>
> Is this timeout behavior expected? What should I do to locate the root cause of these timeouts?
>
> Thanks!
>
> Chenyu
>
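
The workaround the thread settles on, raising heartbeat.timeout, goes into flink-conf.yaml. A minimal sketch; the 300 s value mirrors the thread, and note that the setting is specified in milliseconds:

```yaml
# Heartbeat settings are in milliseconds.
# 300000 ms = the 300 s timeout that made the problem go away in this thread
# (the default is 50000 ms).
heartbeat.timeout: 300000
# Optionally keep the heartbeat interval well below the timeout (default 10000 ms).
heartbeat.interval: 10000
```

A larger timeout only masks the underlying pressure, so checking GC logs and jstack output as suggested in the reply is still the way to find the root cause.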


Re:Re:Re: Tuning OVER window aggregation performance

2021-08-10, post by Wanghui (HiCampus)
Looking at any single operator, GC is not frequent, but the backpressure is HIGH.




-----Original Message-----
From: Michael Ran [mailto:greemqq...@163.com]
Sent: 2021-08-10 20:24
To: user-zh@flink.apache.org
Subject: Re:Re: Tuning OVER window aggregation performance



How does GC look? What about the sink write rate? Any backpressure?

On 2021-07-30 19:44:19, "Tianwang Li" <litianw...@gmail.com> wrote:

>(small) tumbling window + (large) over window

>Might that work better?

>

>

>Wanghui (HiCampus) <wanghu...@huawei.com> wrote on Fri, Jul 30, 2021 at 15:17:

>

>> Hi all:

>> When I test the OVER window at the 5 s to 15 s scale, throughput reaches 2000/s.

>> But once the window is widened to 10 minutes or more, throughput drops sharply from 2000, falling to 230/s within a few minutes.

>> My question:

>> How can OVER window performance be tuned? I will later widen the window to 24 hours, and at the current rate it will degrade quickly.

>> My test node: 8C + 16G

>> Flink config: taskmanager process memory: 8G. Best regards, WangHui

>>

>>

>

>--

>**

> tivanli

>**


Re:Re: Tuning OVER window aggregation performance

2021-08-10, post by Michael Ran
How does GC look? What about the sink write rate? Any backpressure?
On 2021-07-30 19:44:19, "Tianwang Li" wrote:
>(small) tumbling window + (large) over window
>Might that work better?
>
>
>Wanghui (HiCampus) wrote on Fri, Jul 30, 2021 at 15:17:
>
>> Hi all:
>> When I test the OVER window at the 5 s to 15 s scale, throughput reaches 2000/s.
>> But once the window is widened to 10 minutes or more, throughput drops sharply from 2000, falling to 230/s within a few minutes.
>> My question:
>> How can OVER window performance be tuned? I will later widen the window to 24 hours, and at the current rate it will degrade quickly.
>> My test node: 8C + 16G
>> Flink config: taskmanager process memory: 8G
>> Best regards
>> WangHui
>>
>>
>
>-- 
>**
> tivanli
>**
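
The two-layer idea from the reply, pre-aggregating with a small tumbling window and running the long-range OVER aggregation on the reduced stream, can be sketched as follows. All table and column names (clicks, user_id, event_time) are illustrative, not from the thread:

```sql
-- 1) Pre-aggregate the raw stream into 1-minute tumbling buckets.
--    window_time is the time attribute the window TVF emits for downstream use.
CREATE TEMPORARY VIEW minute_counts AS
SELECT user_id, window_time AS ts, COUNT(*) AS cnt
FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR(event_time), INTERVAL '1' MINUTE))
GROUP BY user_id, window_start, window_end, window_time;

-- 2) Run the wide (e.g. 24-hour) OVER aggregation on the much smaller
--    per-minute stream instead of on every raw event.
SELECT user_id, ts,
       SUM(cnt) OVER (
         PARTITION BY user_id
         ORDER BY ts
         RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW
       ) AS cnt_24h
FROM minute_counts;
```

The OVER clause must ORDER BY a time attribute, which is why the sketch carries window_time through rather than a plain window_end column; state then grows with the number of one-minute buckets per key instead of the number of raw events.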


StreamSQL udf

2021-08-10, post by ??????
Flink 1.12.0 SQL job: a UDF returns java.sql.Timestamp, with @DataTypeHint declaring TIMESTAMP(3), but the question is how this maps to timestamp(3) versus timestamp(9) on the SQL side. The UDF:

public static class ToTimestamp extends ScalarFunction {
    public static final String FUNCTION_NAME = "TO_TIMESTAMP";
    private static final long serialVersionUID = -2859363577738468502L;

    public @DataTypeHint(value = "TIMESTAMP(3)", bridgedTo = java.sql.Timestamp.class)
    Timestamp eval(Long time) {
        return new Timestamp(time);
    }
}

The default class-to-type mapping lives in org.apache.flink.table.types.utils.ClassDataTypeConverter.
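
One detail behind the TIMESTAMP(3) hint can be checked with plain JDK code: a java.sql.Timestamp built from epoch milliseconds, as eval() does, can never carry more than millisecond precision, so a TIMESTAMP(3) hint loses nothing. A small self-contained sketch; the class and method names here are mine, not from the thread:

```java
import java.sql.Timestamp;

public class ToTimestampDemo {
    // Mirrors the eval() body from the thread: epoch milliseconds -> Timestamp.
    static Timestamp toTimestamp(long epochMillis) {
        return new Timestamp(epochMillis);
    }

    public static void main(String[] args) {
        Timestamp ts = toTimestamp(1628566800123L);
        // The 123 ms sub-second part surfaces as 123000000 ns; the last six
        // digits are always zero, i.e. only millisecond (TIMESTAMP(3)) precision
        // can ever be present when constructing from a long.
        System.out.println(ts.getNanos());
    }
}
```

This prints 123000000, confirming that fractional seconds beyond three digits are structurally absent in this construction path.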


Re: [ANNOUNCE] Apache Flink 1.13.2 released

2021-08-10, post by Arvid Heise
Thank you!

On Tue, Aug 10, 2021 at 11:04 AM Jingsong Li  wrote:

> Thanks Yun Tang and everyone!
>
> Best,
> Jingsong
>
> On Tue, Aug 10, 2021 at 9:37 AM Xintong Song 
> wrote:
>
>> Thanks Yun and everyone~!
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Mon, Aug 9, 2021 at 10:14 PM Till Rohrmann 
>> wrote:
>>
>> > Thanks Yun Tang for being our release manager and the great work! Also
>> > thanks a lot to everyone who contributed to this release.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Mon, Aug 9, 2021 at 9:48 AM Yu Li  wrote:
>> >
>> >> Thanks Yun Tang for being our release manager and everyone else who
>> made
>> >> the release possible!
>> >>
>> >> Best Regards,
>> >> Yu
>> >>
>> >>
>> >> On Fri, 6 Aug 2021 at 13:52, Yun Tang  wrote:
>> >>
>> >>>
>> >>> The Apache Flink community is very happy to announce the release of
>> >>> Apache Flink 1.13.2, which is the second bugfix release for the Apache
>> >>> Flink 1.13 series.
>> >>>
>> >>> Apache Flink® is an open-source stream processing framework for
>> >>> distributed, high-performing, always-available, and accurate data
>> streaming
>> >>> applications.
>> >>>
>> >>> The release is available for download at:
>> >>> https://flink.apache.org/downloads.html
>> >>>
>> >>> Please check out the release blog post for an overview of the
>> >>> improvements for this bugfix release:
>> >>> https://flink.apache.org/news/2021/08/06/release-1.13.2.html
>> >>>
>> >>> The full release notes are available in Jira:
>> >>>
>> >>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218==12315522
>> >>>
>> >>> We would like to thank all contributors of the Apache Flink community
>> >>> who made this release possible!
>> >>>
>> >>> Regards,
>> >>> Yun Tang
>> >>>
>> >>
>>
>
>
> --
> Best, Jingsong Lee
>


Flink Hive file compaction error

2021-08-10, post by 周瑞
Hello: when Flink writes to Hive, one of the files awaiting compaction went missing during the compaction step, and the Flink job now keeps restarting. What can cause the file to be lost, and how can I get the job to start up normally again?
2021-08-10 19:34:19
java.io.UncheckedIOException: java.io.FileNotFoundException: File does not exist: hdfs://mycluster/user/hive/warehouse/test.db/offer_69/pt_dt=2021-8-10-72/.uncompacted-part-b2108114-b92b-4c37-b204-45f0150236f4-0-3
    at org.apache.flink.table.filesystem.stream.compact.CompactCoordinator.lambda$coordinate$1(CompactCoordinator.java:163)
    at org.apache.flink.table.runtime.util.BinPacking.pack(BinPacking.java:38)
    at org.apache.flink.table.filesystem.stream.compact.CompactCoordinator.lambda$coordinate$2(CompactCoordinator.java:173)
    at java.util.HashMap.forEach(HashMap.java:1288)
    at org.apache.flink.table.filesystem.stream.compact.CompactCoordinator.coordinate(CompactCoordinator.java:169)
    at org.apache.flink.table.filesystem.stream.compact.CompactCoordinator.commitUpToCheckpoint(CompactCoordinator.java:151)
    at org.apache.flink.table.filesystem.stream.compact.CompactCoordinator.processElement(CompactCoordinator.java:141)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://mycluster/user/hive/warehouse/test.db/offer_69/pt_dt=2021-8-10-72/.uncompacted-part-b2108114-b92b-4c37-b204-45f0150236f4-0-3
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
    at org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.getFileStatus(HadoopFileSystem.java:85)
    at org.apache.flink.table.filesystem.stream.compact.CompactCoordinator.lambda$coordinate$1(CompactCoordinator.java:161)
    ... 19 more

Unsubscribe

2021-08-10, post by xg...@126.com
Unsubscribe


xg...@126.com


Re: Flink 1.12.5: The heartbeat of JobManager/TaskManager with id xxx timed out

2021-08-10, post by Chenyu Zheng
JobManager timeout error:
2021-08-10 09:58:35,350 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Sink: Print 
to Std. Out (79/128) (b498a5b17c87eb70c3da9aea93890e25) switched from DEPLOYING 
to FAILED on stream-93072a8b402f49cca9c134a6e8b4887a-taskmanager-1-121 @ 
10.50.151.120 (dataPort=46281).
java.util.concurrent.TimeoutException: The heartbeat of JobManager with id 
56ad1a5ded99f9f16ec1c786ad299159 timed out.
at 
org.apache.flink.runtime.taskexecutor.TaskExecutor$JobManagerHeartbeatListener.lambda$notifyHeartbeatTimeout$0(TaskExecutor.java:2260)
 ~[flink-dist_2.11-1.12.5.jar:1.12.5]
at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_302]
at 
org.apache.flink.runtime.taskexecutor.TaskExecutor$JobManagerHeartbeatListener.notifyHeartbeatTimeout(TaskExecutor.java:2258)
 ~[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
org.apache.flink.runtime.heartbeat.HeartbeatMonitorImpl.run(HeartbeatMonitorImpl.java:111)
 ~[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_302]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_302]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
 ~[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
 ~[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
 ~[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.dispatch.Mailbox.run(Mailbox.scala:225) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at akka.dispatch.Mailbox.exec(Mailbox.scala:235) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 
[flink-dist_2.11-1.12.5.jar:1.12.5]
2021-08-10 09:58:35,357 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Discarding 
the results produced by task execution b498a5b17c87eb70c3da9aea93890e25.
2021-08-10 09:58:35,362 INFO  
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
 [] - Calculating tasks to restart to recover the failed task 
6d2677a0ecc3fd8df0b72ec675edf8f4_78.
2021-08-10 09:58:35,433 INFO  
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
 [] - 512 tasks should be restarted to recover the failed task 
6d2677a0ecc3fd8df0b72ec675edf8f4_78.
2021-08-10 09:58:35,437 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job Click 
Event Count () switched from state RUNNING to 
FAILING.
org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:118)
 ~[flink-dist_2.11-1.12.5.jar:1.12.5]
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:80)
 

user-zh@flink.apache.org

2021-08-10, post by Chenyu Zheng
Hi developers,

I am trying to deploy a Flink cluster on Kubernetes. When I raise the parallelism to a fairly large value (128), I frequently run into various Jobmanager/Taskmanager timeout errors, after which my job is automatically cancelled.

I am confident this is not a network problem, because:

  *   at parallelism 32/64 the problem never occurred, while at 128 it occurs on every run
  *   our Flink runs in a production Kubernetes cluster, and no other container has reported network issues
  *   raising heartbeat.timeout (to 300 s) makes the problem go away

My Flink environment:
·Flink 1.12.5 with java8, scala 2.11
·Jobmanager Start command: $JAVA_HOME/bin/java -classpath 
$FLINK_CLASSPATH -Xmx15703474176 -Xms15703474176 -XX:MaxMetaspaceSize=268435456 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintPromotionFailure -XX:+PrintGCCause -XX:+PrintHeapAtGC 
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 
-Dlog.file=/opt/flink/log/jobmanager.log 
-Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties 
-Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties 
org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint 
-D jobmanager.memory.off-heap.size=134217728b -D 
jobmanager.memory.jvm-overhead.min=1073741824b -D 
jobmanager.memory.jvm-metaspace.size=268435456b -D 
jobmanager.memory.heap.size=15703474176b -D 
jobmanager.memory.jvm-overhead.max=1073741824b
·Taskmanager Start command: $JAVA_HOME/bin/java -classpath 
$FLINK_CLASSPATH -Xmx1664299798 -Xms1664299798 
-XX:MaxDirectMemorySize=493921243 -XX:MaxMetaspaceSize=268435456 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintPromotionFailure -XX:+PrintGCCause -XX:+PrintHeapAtGC 
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 
-Dlog.file=/opt/flink/log/taskmanager.log 
-Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties 
-Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties 
org.apache.flink.kubernetes.taskmanager.KubernetesTaskExecutorRunner -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.network.max=359703515b -D 
taskmanager.memory.network.min=359703515b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=1438814063b -D taskmanager.cpu.cores=1.0 -D 
taskmanager.memory.task.heap.size=1530082070b -D 
taskmanager.memory.task.off-heap.size=0b -D 
taskmanager.memory.jvm-metaspace.size=268435456b -D 
taskmanager.memory.jvm-overhead.max=429496736b -D 
taskmanager.memory.jvm-overhead.min=429496736b --configDir /opt/flink/conf 
-Djobmanager.rpc.address='10.50.132.154' 
-Dpipeline.classpaths='file:usrlib/flink-playground-clickcountjob-print.jar' 
-Djobmanager.memory.off-heap.size='134217728b' 
-Dweb.tmpdir='/tmp/flink-web-07190d10-c6ea-4b1a-9eee-b2d0b2711a76' 
-Drest.address='10.50.132.154' 
-Djobmanager.memory.jvm-overhead.max='1073741824b' 
-Djobmanager.memory.jvm-overhead.min='1073741824b' 
-Dtaskmanager.resource-id='stream-367f634e41349f7195961cdb0c6c-taskmanager-1-17'
 -Dexecution.target='embedded' 
-Dpipeline.jars='file:/opt/flink/usrlib/flink-playground-clickcountjob-print.jar'
 -Djobmanager.memory.jvm-metaspace.size='268435456b' 
-Djobmanager.memory.heap.size='15703474176b'

Is this timeout behavior expected? What should I do to locate the root cause of these timeouts?

Thanks!

Chenyu


Re: [ANNOUNCE] Apache Flink 1.13.2 released

2021-08-10, post by Jingsong Li
Thanks Yun Tang and everyone!

Best,
Jingsong

On Tue, Aug 10, 2021 at 9:37 AM Xintong Song  wrote:

> Thanks Yun and everyone~!
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Aug 9, 2021 at 10:14 PM Till Rohrmann 
> wrote:
>
> > Thanks Yun Tang for being our release manager and the great work! Also
> > thanks a lot to everyone who contributed to this release.
> >
> > Cheers,
> > Till
> >
> > On Mon, Aug 9, 2021 at 9:48 AM Yu Li  wrote:
> >
> >> Thanks Yun Tang for being our release manager and everyone else who made
> >> the release possible!
> >>
> >> Best Regards,
> >> Yu
> >>
> >>
> >> On Fri, 6 Aug 2021 at 13:52, Yun Tang  wrote:
> >>
> >>>
> >>> The Apache Flink community is very happy to announce the release of
> >>> Apache Flink 1.13.2, which is the second bugfix release for the Apache
> >>> Flink 1.13 series.
> >>>
> >>> Apache Flink® is an open-source stream processing framework for
> >>> distributed, high-performing, always-available, and accurate data
> streaming
> >>> applications.
> >>>
> >>> The release is available for download at:
> >>> https://flink.apache.org/downloads.html
> >>>
> >>> Please check out the release blog post for an overview of the
> >>> improvements for this bugfix release:
> >>> https://flink.apache.org/news/2021/08/06/release-1.13.2.html
> >>>
> >>> The full release notes are available in Jira:
> >>>
> >>>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218==12315522
> >>>
> >>> We would like to thank all contributors of the Apache Flink community
> >>> who made this release possible!
> >>>
> >>> Regards,
> >>> Yun Tang
> >>>
> >>
>


-- 
Best, Jingsong Lee


Unsubscribe

2021-08-10, post by xg...@126.com
Unsubscribe