Re: Restoring from a checkpoint or savepoint on a different Kafka consumer group

2021-01-17 Thread 赵一旦
If you changed the consumer group in your new job, the group id will be the new one you set. The job will continue to consumer the topics from the savepoint/checkpoint you specified no matter whether the group id is the original one? Rex Fenley 于2021年1月18日周一 下午12:53写道: > Hello, > > When using

Re: flink 设置broadcastStream 的MapStateDescriptor

2021-01-17 Thread 赵一旦
key和value都是你自己设置的,看你需要设置什么类型哈。这个不是强制的。 你的map state的key和value在具体业务场景下需要什么类型,那个地方就设置什么类型的TypeInformation,懂吧。 smq <374060...@qq.com> 于2021年1月18日周一 下午12:18写道: > 发自我的iPhone > > > -- 原始邮件 -- > 发件人: 明启 孙 <374060...@qq.com > 发送时间: 2021年1月18日 11:30 > 收件人: user-zh 主题:

flink1.10.1 merge??????????scala??????????????????flink??bug??

2021-01-17 Thread bigdata
?? org.apache.flink.table.planner.codegen.CodeGenException: No matching merge method found for AggregateFunction com.autoai.cns.udaf.PercentileUDAF'mergescala??java?? flink1.10.1

Re: Flink Application cluster/standalone job: some JVM Options added to Program Arguments

2021-01-17 Thread Matthias Pohl
Hi Alexey, thanks for reaching out to the Flink community. I'm not 100% sure whether you have an actual issue or whether it's just the changed behavior you are confused about. The change you're describing was introduced in Flink 1.12 as part of the work on FLIP-104 [1] exposing the actual memory

Re: flink滑动窗口输出结果的问题

2021-01-17 Thread 赵一旦
我看你还写到 “每分钟触发统计一次结果”,你是不是做了自定义trigger啥的,导致逻辑不对了。 默认情况就可以实现你要的效果,不要自定义trigger哈这里。 赵一旦 于2021年1月18日周一 下午3:52写道: > 补充下,是针对每个key,每1min输出一个结果。采用1h窗口,1min滑动窗口。 > > > > 赵一旦 于2021年1月18日周一 下午3:51写道: > >> 你貌似没懂我的意思。我的意思是你的需求和sliding window的效果应该就是一样的。 >> 不是很清楚你表达的最早什么的是什么含义。 >> >> 基于event time做sliding

Re: flink滑动窗口输出结果的问题

2021-01-17 Thread 赵一旦
补充下,是针对每个key,每1min输出一个结果。采用1h窗口,1min滑动窗口。 赵一旦 于2021年1月18日周一 下午3:51写道: > 你貌似没懂我的意思。我的意思是你的需求和sliding window的效果应该就是一样的。 > 不是很清楚你表达的最早什么的是什么含义。 > > 基于event time做sliding window,只要你没设置其他东西,就应该是1min输出一个结果而已哈。 > > eriendeng 于2021年1月18日周一 上午11:42写道: > >>

Re: flink滑动窗口输出结果的问题

2021-01-17 Thread 赵一旦
你貌似没懂我的意思。我的意思是你的需求和sliding window的效果应该就是一样的。 不是很清楚你表达的最早什么的是什么含义。 基于event time做sliding window,只要你没设置其他东西,就应该是1min输出一个结果而已哈。 eriendeng 于2021年1月18日周一 上午11:42写道: > 只要sliding出来才满足你每分钟执行的滑动要求吧?至于你只要最早/新的某一条,你可以从window groupby得出的流再次group > by然后再用window时间筛选你要的数据。 > > > > -- > Sent from:

Re: CEP Test cases example

2021-01-17 Thread Aeden Jameson
This may help you out. https://github.com/apache/flink/blob/master/flink-libraries/flink-cep/src/test/java/org/apache/flink/cep/CEPITCase.java On Sun, Jan 17, 2021 at 10:32 AM narasimha wrote: > > Hi, > > I'm using Flink CEP, but couldn't find any examples for writing test cases > for the

??????????: flink sql hop????????udaf????????????merge??????????????????????

2021-01-17 Thread bigdata
?? class PercentileUDAF extends AggregateFunction[String, ListBuffer[Float]]{ // val percentile1 = 0.5 val percentile2 = 0.75 val percentile3 = 0.98 val percentile4 = 0.99 override def getValue(accumulator: ListBuffer[Float]): String = { // val

Re: flink sql 消费kafka 消息 写入Hive不提交分区

2021-01-17 Thread 花乞丐
多谢各位的耐心回答,我已经找到问题了,目前是水印使用有点问题,是我自己的问题,不好意思各位 修改之后,发现还是无法提交分区数据,经调试发现,watermark值目前是ok,但是其次是,由于Flink的toMills方法使用的UTC时间,导致我们从分区提取值时,比原始值大了8个小时,因此,导致水印一直小于 partition_time+commitDelay。接下来进行相应处理即可。

????: flink sql hop????????udaf????????????merge??????????????????????

2021-01-17 Thread Evan
?? ?? bigdata ?? 2021-01-18 14:52 user-zh ?? flink sql hopudafmerge?? ?? flink1.10.1

flink sql hop????????udaf????????????merge??????????????????????

2021-01-17 Thread bigdata
?? flink1.10.1 sql??hop??udaf??merge??merge?? org.apache.flink.table.planner.codegen.CodeGenException: No matching merge method found for AggregateFunction com.autoai.cns.udaf.PercentileUDAF' merge ??

Restoring from a checkpoint or savepoint on a different Kafka consumer group

2021-01-17 Thread Rex Fenley
Hello, When using the Kafka consumer connector, if we restore a from a checkpoint or savepoint using a differently named consumer group than the one we originally ran a job with will it still pick up exactly where it left off or are you locked into using the same consumer group as before?

Re: Number of parallel connections for Elasticsearch Connector

2021-01-17 Thread Rex Fenley
Great, thanks! On Sun, Jan 17, 2021 at 6:24 PM Yangze Guo wrote: > Hi, Rex. > > > How many connections does the ES connector use to write to Elasticsearch? > I think the number is equal to your parallelism. Each subtask of an > Elasticsearch sink will have its own separate Bulk Processor as

Re: 回复: flink sql 消费kafka 消息 写入Hive不提交分区

2021-01-17 Thread Shengkai Fang
这种情况一般是kafka的某个分区,不存在数据,导致总体的watermark不前进。遇到这种情况一般是需要手动设置idle source[1]。但是社区的watemark push down存在一些问题[2],已经在修复了。 [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/config.html#table-exec-source-idle-timeout [2]

转发:flink 设置broadcastStream 的MapStateDescriptor

2021-01-17 Thread smq
发自我的iPhone -- 原始邮件 -- 发件人: 明启 孙 <374060...@qq.com 发送时间: 2021年1月18日 11:30 收件人: user-zh

flink sql hop????????udaf????????????merge??????????????????????

2021-01-17 Thread bigdata
?? flink1.10.1 sql??hop??udaf??merge??merge?? org.apache.flink.table.planner.codegen.CodeGenException: No matching merge method found for AggregateFunction com.autoai.cns.udaf.PercentileUDAF' merge ??

Flink SQL and checkpoints and savepoints

2021-01-17 Thread Dan Hill
How well does Flink SQL work with checkpoints and savepoints? I tried to find documentation for it in v1.11 but couldn't find it. E.g. what happens if the Flink SQL is modified between releases? New columns? Change columns? Adding joins?

Re: flink滑动窗口输出结果的问题

2021-01-17 Thread eriendeng
只要sliding出来才满足你每分钟执行的滑动要求吧?至于你只要最早/新的某一条,你可以从window groupby得出的流再次group by然后再用window时间筛选你要的数据。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: 回复: flink sql 消费kafka 消息 写入Hive不提交分区

2021-01-17 Thread 花乞丐
代码已经附上,我现在是数据已经写入hdfs,有文件生产,但是目前添加的水印无效,所以一直没有更新metastore信息,导致metastore中一直没有分区信息,必须在hive shell中执行命令:hive (ods)> msck repair table order_info。之后才可以查询到数据,经过debug发现,在分区提交的时候,需要判断水印的值比从分区提取的值+延迟时间大,才会提交分区,但是现在,水印的值一直是Long.MIN_VALUE,导致一直无法提交水印,我在代码中已经设置了水印,是不是我的水印设置姿势不对,还请指教! package

Re: Flink SQL, temporal joins and backfilling data

2021-01-17 Thread Dan Hill
Hi Timo. Sorry for the delay. I'll message this message the next time I hit this. I haven't restarted my job in 12 days. I'll check the watermarks the next time I restart. On Tue, Jan 5, 2021 at 4:47 AM Timo Walther wrote: > Hi Dan, > > are you sure that your watermarks are still correct

flink 设置broadcastStream 的MapStateDescriptor

2021-01-17 Thread 明启 孙
大家好: MapStateDescriptor (String name, TypeInformation keyTypeInfo, TypeInformation valueTypeInfo)中的第二个参数key值得是什么,其中value指的是状态值,在网上看到key大部分用的是String类型,不知道这个key是根据什么指定的。 smq

Re: 回复: flink sql 消费kafka 消息 写入Hive不提交分区

2021-01-17 Thread 花乞丐
package com.zallsteel.flink.app.log; import com.alibaba.fastjson.JSON; import com.alibaba.fastjson.JSONObject; import com.google.gson.Gson; import com.google.gson.JsonArray; import com.google.gson.JsonElement; import com.google.gson.JsonParser; import com.zallsteel.flink.entity.ChangelogVO;

flink native k8s 有计划支持hostAlias配置码?

2021-01-17 Thread 高函
请问社区有计划支持native k8s模式下配置hostAlais码? 如果自己扩展的话,需要在模块中添加对应的hostAlais的配置项,并打包自定义的docker 镜像码? 谢谢~

Re: 回复:flink怎么读kafka offset

2021-01-17 Thread hoose
我理解的是这样,虽然不是从savepoint里恢复,但kafka consumer_topic有了我们之前保存的groupid,那么默认的是flink是这样设置: setStartFromGroupOffsets():默认值,从当前消费组记录的偏移量开始,接着上次的偏移量消费 这样可以理解还是从上次提交的offset开始继续消费对吧? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink滑动窗口输出结果的问题

2021-01-17 Thread marble.zh...@coinflex.com.INVALID
是的, 现在的问题是sliding会产生多个结果,而我只要输出最早的那个窗口的结果数据。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Flink CEP 动态加载 pattern

2021-01-17 Thread mokaful
Hi,对于动态更新cep pattern这个问题,不知道楼主现在有啥突破可以借鉴一下呢? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re:Re: flink监控

2021-01-17 Thread penguin.
那请问对于每个节点的CPU、内存使用率以及节点之间的通信量如何进行实时监控获取数据呢? 在 2021-01-18 10:15:22,"赵一旦" 写道: >slot好像只是逻辑概念,监控意义不大,没有资源隔离。 > >penguin. 于2021年1月15日周五 下午5:06写道: > >> Hi, >> flink集群中,能对TaskManager的每个TaskSlot进行监控吗?比如每个slot的cpu和内存使用率之类的指标。 >> >> >> penguin

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-17 Thread Shuiqiang Chen
Hi meneldor, Xingbo, Sorry for the late reply. Thanks a lot for Xingbo’s clarification. And according to the stacktrace of the exception, could you have a check whether the result data match the specified return type? BTW, please share your code if it’s ok, it will be of help to debug. Best,

回复: yarn Per-Job Cluster Mode提交任务时 通过cli指定的内存参数无效

2021-01-17 Thread 刘海
还有一个问题,我已经有一个job在运行了,当我再次提交一个job运行的时候输出下面这些信息,去yarn查看发现job并未启动起来,有遇到过这个现象吗? [root@cdh1 flink-1.12.0]# ./bin/flink run -d -t yarn-per-job -D jobmanager.memory.process.size=1.5GB -D taskmanager.memory.process.size=3GB -D heartbeat.timeout=180

回复: yarn Per-Job Cluster Mode提交任务时 通过cli指定的内存参数无效

2021-01-17 Thread 刘海
是我本地服务器的路径,需要在三个节点上都上传这个jar包吗? 放在 /opt/flink-1.12.0/examples目录下了 | | 刘海 | | liuha...@163.com | 签名由网易邮箱大师定制 在2021年1月18日 10:38,Yangze Guo 写道: 请问这个路径是你本地的路径么?需要client端能根据这个路径找到jar包 Best, Yangze Guo On Mon, Jan 18, 2021 at 10:34 AM 刘海 wrote: 你好 根据你的建议我试了一下 将提交命令改为: ./bin/flink run -d -t

Re: yarn Per-Job Cluster Mode提交任务时 通过cli指定的内存参数无效

2021-01-17 Thread Yangze Guo
请问这个路径是你本地的路径么?需要client端能根据这个路径找到jar包 Best, Yangze Guo On Mon, Jan 18, 2021 at 10:34 AM 刘海 wrote: > > 你好 > 根据你的建议我试了一下 > 将提交命令改为: ./bin/flink run -d -t yarn-per-job -tm 1536 -jm 3072 -D > jobmanager.memory.process.size=1.5GB -D taskmanager.memory.process.size=3GB > -D

????

2021-01-17 Thread Tent
Thanks, BR, Tent

回复: yarn Per-Job Cluster Mode提交任务时 通过cli指定的内存参数无效

2021-01-17 Thread 刘海
你好 根据你的建议我试了一下 将提交命令改为: ./bin/flink run -d -t yarn-per-job -tm 1536 -jm 3072 -D jobmanager.memory.process.size=1.5GB -D taskmanager.memory.process.size=3GB -D heartbeat.timeout=180 /opt/flink-1.12.0/examples/myProject/bi-wxfx-fqd-1.0.1.jar jar包我使用了一个绝对路径:

Re: Monitor the Flink

2021-01-17 Thread Yangze Guo
Hi, First of all, there’s no resource isolation atm between operators/tasks within a slot, except for managed memory. So, monitoring of individual tasks might be meaningless. Regarding TM/JM level cpu/memory metrics, you can refer to [1] and [2]. Regarding the traffic between tasks, you can

flink sql hop????????udaf????

2021-01-17 Thread bigdata
?? flink1.10.1 sql??hop??udaf??merge??merge?? org.apache.flink.table.planner.codegen.CodeGenException: No matching merge method found for AggregateFunction com.autoai.cns.udaf.PercentileUDAF' def merge(accumulator:

PROCTIME()函数语义问题

2021-01-17 Thread smailxie
如果PROMCTIME()函数的语义指的是机器处理record的本地时间,那为什么flink要取UTC时区的时间? -- Name:谢波 Mobile:13764228893

Re: Number of parallel connections for Elasticsearch Connector

2021-01-17 Thread Yangze Guo
Hi, Rex. > How many connections does the ES connector use to write to Elasticsearch? I think the number is equal to your parallelism. Each subtask of an Elasticsearch sink will have its own separate Bulk Processor as both the Client and the Bulk Processor are class private[1]. The subtasks will

Re: flink滑动窗口输出结果的问题

2021-01-17 Thread 赵一旦
从你的描述来看,你说的貌似就是sliding window呀。 9-10,9.01-10.01... marble.zh...@coinflex.com.INVALID 于2021年1月15日周五 下午5:45写道: > 大家好, 我现在的场景需求是,窗口size是1个小时,每分钟触发统计一次结果。比如现在是10点钟,则统计9点到10点的结果。 > 下一分钟则在10:01分时触发统计9:01到10:01的结果。 > > 如果用Sliding window, 比如.timeWindow(Time.hours(1L), Time.minutes(1)), >

Re: flink监控

2021-01-17 Thread 赵一旦
slot好像只是逻辑概念,监控意义不大,没有资源隔离。 penguin. 于2021年1月15日周五 下午5:06写道: > Hi, > flink集群中,能对TaskManager的每个TaskSlot进行监控吗?比如每个slot的cpu和内存使用率之类的指标。 > > > penguin

Re: yarn Per-Job Cluster Mode提交任务时 通过cli指定的内存参数无效

2021-01-17 Thread Yangze Guo
Hi, 请使用 -D -tm -jm 不需要加y前缀 Best, Yangze Guo Best, Yangze Guo On Mon, Jan 18, 2021 at 9:19 AM 刘海 wrote: > > > 刘海 > liuha...@163.com > 签名由 网易邮箱大师 定制 > 在2021年1月18日 09:15,刘海 写道: > > Hi Dear All, >请教各位一个问题,下面是我的集群配置: > 1、我现在使用的是flink1.12版本; > 2、基于CDH6.3.2搭建的hadoop三个节点的集群,使用CDH自带的yarn集群; >

Re: flink1.12.0 HA on k8s native运行一段时间后jobmanager-leader产生大量ConfigMap问题

2021-01-17 Thread macdoor
您好,我刚刚开始使用 flink 1.12.1 HA on k8s,发现jobmanager大约半小时左右会restart,都是这种错误,您遇到过吗?谢谢! 2021-01-17 04:52:12,399 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending SlotPool. 2021-01-17 04:52:12,399 INFO org.apache.flink.runtime.jobmaster.JobMaster [] -

K8s HA Session模式下1.12.1 jobmanager 周期性 restart

2021-01-17 Thread macdoor
大约几十分钟就会restart,请教大佬们有查的思路,每次抛出的错误都是一样的,运行一段时间也会积累很多ConfigMap,下面是一个具体的错误 错误内容 2021-01-17 04:16:46,116 ERROR org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Fatal error occurred in ResourceManager. org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error

回复:yarn Per-Job Cluster Mode提交任务时 通过cli指定的内存参数无效

2021-01-17 Thread 刘海
| | 刘海 | | liuha...@163.com | 签名由网易邮箱大师定制 在2021年1月18日 09:15,刘海 写道: Hi Dear All, 请教各位一个问题,下面是我的集群配置: 1、我现在使用的是flink1.12版本; 2、基于CDH6.3.2搭建的hadoop三个节点的集群,使用CDH自带的yarn集群; 3、flink运行模式:Per-Job Cluster on yarn(三个节点,没每个节点48核64G内存); 4、以下是我三个节点的 flink-conf.yaml

yarn Per-Job Cluster Mode提交任务时 通过cli指定的内存参数无效

2021-01-17 Thread 刘海
Hi Dear All, 请教各位一个问题,下面是我的集群配置: 1、我现在使用的是flink1.12版本; 2、基于CDH6.3.2搭建的hadoop三个节点的集群,使用CDH自带的yarn集群; 3、flink运行模式:Per-Job Cluster on yarn(三个节点,没每个节点48核64G内存); 4、以下是我三个节点的 flink-conf.yaml 的配置,三个flink节点除了jobmanager.rpc.address不同外其它配置都一样:

????: flink sql hop????udaf????

2021-01-17 Thread Evan
merge??marge Evan Cheng 2021??1??18??09:00:07 bigdata ?? 2021-01-17 22:31 user-zh ?? flink sql hopudaf ?? flink1.10.1sql

Re: Using ClusterIP with KubernetesHAServicesFactory

2021-01-17 Thread Kevin Kwon
Ok it seems that this check is ran by the K8S CLI which in my case runs in a CICD cluster If this check should happen, I'd like to override this value with the ingress address Is there a way I can override the rest address that the K8S CLI taps on? On Fri, Jan 15, 2021 at 7:55 PM Kevin Kwon

CEP Test cases example

2021-01-17 Thread narasimha
Hi, I'm using Flink CEP, but couldn't find any examples for writing test cases for the streams with CEP. Can someone help on how to write test cases for streams with CEP applied on it? -- A.Narasimha Swamy

flink sql hop????udaf????

2021-01-17 Thread bigdata
?? flink1.10.1sql hop??udafmarge?? org.apache.flink.table.api.ValidationException: Function class 'com.autoai.cns.udaf.PercentileUDAF' does not implement at least one method named 'merge' which is public, not abstract and (in case of table

Resource changed on src filesystem after upgrade

2021-01-17 Thread Mark Davis
Hi all, I am upgrading my DataSet jobs from Flink 1.8 to 1.12. After the upgrade I started to receive the errors like this one: 14:12:57,441 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager - Worker container_e120_1608377880203_0751_01_000112 is terminated.

flink sql hop????udaf????

2021-01-17 Thread bigdata
?? flink1.10.1sql hop??udafmarge?? org.apache.flink.table.api.ValidationException: Function class 'com.autoai.cns.udaf.PercentileUDAF' does not implement at least one method named 'merge' which is public, not abstract and (in case of table

[Announce] Zeppelin 0.9.0 is released (Flink on Zeppelin)

2021-01-17 Thread Jeff Zhang
Hi flink folks, I'd like to tell you that Zeppelin 0.9.0 is officially released, in this new version we made a big refactoring and improvement on flink support. It supports 3 major versions of flink (1.10, 1.11, 1.12) You can download zeppelin 0.9.0 here. http://zeppelin.apache.org/download.html

[Announce] Zeppelin 0.9.0 is released (Flink on Zeppelin)

2021-01-17 Thread Jeff Zhang
Hi flink folks, I'd like to tell you that Zeppelin 0.9.0 is officially released, in this new version we made a big refactoring and improvement on flink support. It supports 3 major versions of flink (1.10, 1.11, 1.12) You can download zeppelin 0.9.0 here. http://zeppelin.apache.org/download.html

Monitor the Flink

2021-01-17 Thread penguin.
Hello, In the Flink cluster, How to monitor each taskslot of taskmanager? For example, the CPU and memory usage of each slot and the traffic between slots. What is the way to get the traffic between nodes? thank you very much! penguin

flink集群监控

2021-01-17 Thread penguin.
Hello, 请问在flink集群中, 怎么对TaskManager的每个TaskSlot进行监控呢?比如每个slot的cpu和内存使用率以及slot通信量之类的指标。 有什么办法来获取节点间的通信量呢? 多谢! penguin

Re:Re: Re: 请教个Flink checkpoint的问题

2021-01-17 Thread 邮件帮助中心
1:先前测试过使用stopWithSavepoint时会将之前成功保存的checkpoint数据给删除掉,后来我们查看了下源码,里面描述如下,就是调用该方法时Flink会将程序设置成Finished态的,可能和实际使用场景有出入。 /** * Stops a program on Flink cluster whose job-manager is configured in this client's configuration. * Stopping works only for streaming programs. Be aware, that the program

??????flink??????kafka offset

2021-01-17 Thread ??????
savepoint??flink-kafka-connector?? Kafka https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/connectors/kafka.html | | ?? | | 18500348...@163.com | ?? ??2021??1??16?? 19:38?? ??

Re: Re: 请教个Flink checkpoint的问题

2021-01-17 Thread Congxian Qiu
Hi 你可以看一下是不是 作业的状态变成了 FINISH 了,现在 Flink 在 FINISH 的时候,会删除 checkpoint(就算设置了 retain on cancel 也会删除) PS :FLINK-18263 正在讨论是否在 Job 变为 FINISH 状态后保留 checkpoint [1] https://issues.apache.org/jira/browse/FLINK-18263 Best, Congxian yinghua...@163.com 于2021年1月15日周五 上午11:23写道: >