from:"刘建刚"

Re: 维表如何实现动态查询

2019-07-07 文章刘建刚

可以设置多长时间加载一次最新的维表，比如每1分钟加载一次。 > 在 2019年7月3日，下午12:12，雒正林写道： > > 维表（mysql）是动态变化的，与流表join 时，维表一直是第一次查询到的数据，后面维表变化的数据，在join时，查询不到。 >

Re: How to load udf jars in flink program

2019-08-15 文章刘建刚

发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用 > > > > ____ > 发件人: 刘建刚 > 发送时间: Thursday, August 15, 2019 5:31:33 PM > 收件人: user-zh@flink.apache.org ; > u...@flink.apache.org ; d...@flink.apache.org < > d...@flink.apache.o

Re: How to load udf jars in flink program

2019-08-15 文章刘建刚

发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用 > > > > ____ > 发件人: 刘建刚 > 发送时间: Thursday, August 15, 2019 5:31:33 PM > 收件人: user-zh@flink.apache.org ; > u...@flink.apache.org ; d...@flink.apache.org < > d...@flink.apache.o

How to load udf jars in flink program

2019-08-15 文章刘建刚

We are using per-job to load udf jar when start job. Our jar file is in another path but not flink's lib path. In the main function, we use classLoader to load the jar file by the jar path. But it reports the following error when job starts running. If the jar file is in lib,

How to calculate one day's uv every minute by SQL

2019-09-04 文章刘建刚

We want to calculate one day's uv and show the result every minute . We have implemented this by java code: dataStream.keyBy(dimension) .incrementWindow(Time.days(1), Time.minutes(1)) .uv(userId) The input data is big. So we use

How to implement grouping set in stream

2019-09-10 文章刘建刚

I want to implement grouping set in stream. I am new to flink sql. I want to find a example to teach me how to self define rule and implement corresponding operator. Can anyone give me any suggestion?

Uncertain result when using group by in stream sql

2019-09-13 文章刘建刚

I use flink stream sql to write a demo about "group by". The records are [(bj, 1), (bj, 3), (bj, 5)]. I group by the first element and sum the second element. Every time I run the program, the result is different. It seems that the records are out of order. Even sometimes record is

Re: Error:java: 无效的标记: --add-exports=java.base/sun.net.util=ALL-UNNAMED

2019-11-21 文章刘建刚

mg)" >> >> Hope it works for you >> Thanks. >> >> [1]. https://www.jetbrains.com/idea/download/other.html >> >> >> On Mon, Nov 4, 2019 at 5:44 PM Till Rohrmann wrote: >> >>> Try to reimport that maven project. This should re

How to use two continuously window with EventTime in sql

2019-10-29 文章刘建刚

For one sql window, I can register table with event time and use time field in the tumble window. But if I want to use the result for the first window and use another window to process it, how can I do it? Thank you.

How to estimate the memory size of flink state

2019-11-19 文章刘建刚

We are using flink 1.6.2. For filesystem backend, we want to monitor the state size in memory. Once the state size becomes bigger, we can get noticed and take measures such as rescaling the job, or the job may fail because of the memory. We have tried to get the memory usage for the

Re: How to estimate the memory size of flink state

2019-11-20 文章刘建刚

ail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1=sysukelee=sysukelee%40gmail.com=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png=%5B%22sysukelee%40gmail.com%22%5D> > On 11/20/2019 15:08，刘建刚 > <mailto:liujiangangp...@gmail.com> wrote： > We are using flin

Error:java: 无效的标记: --add-exports=java.base/sun.net.util=ALL-UNNAMED

2019-11-04 文章刘建刚

Hi, I am using flink 1.9 in idea. But when I run a unit test in idea. The idea reports the following error:"Error:java: 无效的标记: --add-exports=java.base/sun.net.util=ALL-UNNAMED". Everything is ok when I use flink 1.6. I am using jdk 1.8. Is it related to the java version?

Re: java.lang.StackOverflowError

2020-01-22 文章刘建刚

多谢，已经找到解决的issue了：https://issues.apache.org/jira/browse/FLINK-10367 <https://issues.apache.org/jira/browse/FLINK-10367> > 2020年1月22日下午4:48，zhisheng 写道： > > 1、建议问题别同时发到三个邮件去 > 2、找找还有没有更加明显的异常日志 > > 刘建刚于2020年1月22日周三上午10:25写道： > >> I am using flink 1.6.2

Re: Fail to deploy flink on k8s in minikube

2020-01-12 文章刘建刚

on-cluster-on-kubernetes > > http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/ > [3] https://github.com/GoogleCloudPlatform/flink-on-k8s-operator > [4] https://github.com/lyft/flinkk8soperator > [5] > https://ci.apache.org/projects/flink/flink-doc

Fail to deploy flink on k8s in minikube

2020-01-12 文章刘建刚

I fail to deploy flink on k8s referring to https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html When I run the command 'kubectl create -f jobmanager-deployment.yaml', following error is reported: [image: image.png] I am new to k8s. Our team want

java.lang.StackOverflowError

2020-01-21 文章刘建刚

I have a flink job which fails occasionally. I am eager to avoid this problem. Can anyone help me? The error stacktrace is as following: java.io.IOException: java.lang.StackOverflowError at

Re: java.lang.StackOverflowError

2020-01-21 文章刘建刚

I am using flink 1.6.2 on yarn. State backend is rocksdb. > 2020年1月22日上午10:15，刘建刚写道： > > I have a flink job which fails occasionally. I am eager to avoid this > problem. Can anyone help me? The error stacktrace is as following: > java.io.IOException: java.lang.Sta

How to get kafka record's timestamp in job

2019-12-31 文章刘建刚

In kafka010, ConsumerRecord has a field named timestamp. It is encapsulated in Kafka010Fetcher. How can I get the timestamp when I write a flink job? Thank you very much.

Re: flinksql如何控制结果输出的频率

2020-04-15 文章刘建刚

册一个新的emit delay之后的处理时间定时器。 > > 你可以根据这个原理，再对照下你的数据，看看是否符合预期。 > > 刘建刚 mailto:liujiangangp...@gmail.com>> > 于2020年4月15日周三下午3:32写道： > 我们也经常有固定窗口定期触发的需求，但是添加以下参数并没有得到预期的效果（10秒的窗口，期待每秒都输出结果），是我的使用方法不对还是其他问题呢？多谢各位，下面是伪代码： > > public class EarlyEmitter

Re: flinksql如何控制结果输出的频率

2020-04-15 文章刘建刚

我们也经常有固定窗口定期触发的需求，但是添加以下参数并没有得到预期的效果（10秒的窗口，期待每秒都输出结果），是我的使用方法不对还是其他问题呢？多谢各位，下面是伪代码： public class EarlyEmitter { public static void main(String[] args) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(1);

Re: 请教：用flink实现实时告警的功能

2020-09-06 文章刘建刚

针对规则改变，要想实时生效，有两种建议： 1. 利用维表join的功能来join数据库中的规则，flink内部可以配置一定的缓存策略。可以查看下Lookup的实现。 2. 也可以把规则打到kafka的表里，然后通过broadcast来广播最新的规则。李军于2020年9月4日周五下午5:46写道： > 您好！ > > >可以使用Flink+drools 做。drools可以实时更新规则 > 2020-9-4 > | | > 李军 > | > | > hold_li...@163.com > | > 签名由网易邮箱大师定制 >

Re: RocksDBStateBackend 问题

2020-09-07 文章刘建刚

直接存在rocksdb数据库。rocksdb会首先将数据写到内存buffer中（不会太大），等buffer满了再刷到磁盘。相比filesystem的statebackend，rocksdb会因为序列化和反序列化导致处理速度慢一些，但是优势是可以利用磁盘的超大空间来存储更大的状态。 zilong xiao 于2020年9月7日周一下午5:51写道： > 可以看下这个文档： > >

How to visit outer service in batch for sql

2020-08-26 文章刘建刚

For API, we can visit outer service in batch through countWindow, such as the following. We can visit outer service every 1000 records. If we visit outer service every record, it will be very slow for our job. source.keyBy(new KeySelector()) .countWindow(1000)

Re: 回复： BLinkPlanner sql join状态清理

2020-09-29 文章刘建刚

miniBatch下是无法ttl的，这个是修复方案：https://github.com/apache/flink/pull/11830 Benchao Li 于2020年9月29日周二下午5:18写道： > Hi Ericliuk, > > 这应该是实现的bug，你可以去社区建一个issue描述下这个问题。 > 有时间的话也可以帮忙修复一下，没有时间社区也会有其他小伙伴帮忙来修复的~ > > Ericliuk 于2020年9月29日周二下午4:59写道： > > >

Re: 执行mvn构建错误

2020-09-25 文章刘建刚

看着是mvn无法下载到某些包，你有使用过其他版本吗？如果都是相同的问题，那么应该是你本地环境或者网络环境的问题。迟成于2020年9月25日周五下午1:45写道： > 环境： > > tag release-1.11.2 > > commit fe361357 > > Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f) > > Java version: 1.8.0_251, vendor: Oracle Corporation, runtime: >

Re: group agg 开启了mini batch之后，state ttl不生效的问题

2020-09-30 文章刘建刚

修复方案参考https://github.com/apache/flink/pull/11830 kandy.wang 于2020年9月30日周三下午2:19写道： > group agg 开启了mini batch之后，state ttl不生效的问题： > > > 现在我们发现好像计算订单指标，写hbase，开启mini batch确实是需要的。这样可以大大降低sink > 算子的数据量，降低hbase的写入tps,减少hbase压力。不然每来一条数据就处理一次的话，单个任务就可以把hbase 的tps 干到十几万。 > > >

Re: FlinkSQL是否支持设置窗口trigger实现continuious trigger呢

2020-09-28 文章刘建刚

提供另外一种思路：内层是10s的翻滚窗口，外层接一个按5分钟为key的group by。为防止状态过大，可以设置ttl。简单demo如下： SELECT * FROM (SELECT TUMBLE_START(proctime, INTERVAL '10' SECOND) AS st, * FROM * GROUP BY TUMBLE(proctime, INTERVAL '10' SECOND) ) GROUP BY st / (5 * 60 * 1000) 赵一旦于2020年9月27日周日

Re: Re:HistoryServer完成任务丢失的问题

2020-09-28 文章刘建刚

修复方案为：https://issues.apache.org/jira/browse/FLINK-18959 xiao cai 于2020年9月27日周日下午6:42写道： > 貌似是个bug，我的版本是1.11.0 > > > > https://issues.apache.org/jira/browse/FLINK-18959?jql=project%20%3D%20FLINK%20AND%20issuetype%20%3D%20Bug%20AND%20text%20~%20%22history%20server%22 > > > 原始邮件 > 发件人: xiao cai

Re: 流与流 left join

2021-06-14 文章刘建刚

https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/joins/#interval-joins chenchencc <1353637...@qq.com> 于2021年6月15日周二上午9:46写道： > 你好，谢谢哈，想问下有相关的资料或者案例能发下吗？ > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/

Re: (无主题)

2021-06-21 文章刘建刚

嗨，你使用的是session还是perJob的作业？哪个flink版本？有详细的日志吗？一般不退户，可能是master卡在了哪里，比如我们遇到过卡在handler或者异步执行有问题。田磊于2021年6月20日周日上午11:14写道： > > 我用flink跑hbase的数据，flink的界面显示任务已经finished，正在running的任务为0。而yarn的界面显示正在running的状态，一直都结束不了，需要手动kill，是什么情况啊。 > > > | | > totorobabyfans > | > | > 邮箱：totorobabyf...@163.com > | >

Re: Re: flink on yarn 模式下，yarn集群的resource-manager切换导致flink应用程序重启，并且未从最后一次checkpoint恢复

2021-05-28 文章刘建刚

有一个好的思路，下一步先把ha搞起来再试试。 > >> org.apache.flink.configuration.GlobalConfiguration [] - > Loading > >> configuration property: execution.savepoint.path, > >> > hdfs:///user/flink/checkpoints/default/f9b85edbc6ca779b6e60414f3e3964f2/chk-100 > > > > > > > > > &g

Re: rocksdb状态后端最多保留checkpoints问题

2021-05-28 文章刘建刚

增量快照的原理是sst文件共享，系统会自动帮助你管理sst文件的引用，类似java的引用，并不会因为一个快照删除了就会把实际的数据删除掉。也就不会发生你说的情况 tison 于2021年5月28日周五上午1:47写道： > rocksdb 增量 checkpoint 不是你这么理解的，总的不会恢复不了。原因可以参考下面的材料 > > - > https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html > 官方 blog 介绍 > -

Re: flink on yarn 模式下，yarn集群的resource-manager切换导致flink应用程序重启，并且未从最后一次checkpoint恢复

2021-05-28 文章刘建刚

这种情况是不符合预期的。请问通过以下步骤可以稳定复现吗？ 1、从savepoint恢复； 2、作业开始定期做savepoint； 3、作业failover。如果是的话，可能需要排查下checkpoint 文件是否存在，zookeeper上是否更新。如果还是有问题，需要通过日志来排查了。董建 <62...@163.com> 于2021年5月28日周五下午5:37写道： > 我遇到的问题现象是这样的 > > > > > 1、flink版本flink-1.12.2，启动命令如下，指定-s是因为job有做过cancel，这里重启。 > > > > > flink run -d -s >

Re: 消息队列量级特别如何优化消费

2021-03-05 文章刘建刚

本质原因是作业资源不足无法处理大量数据，好像只有扩大并发来解决了。 allanqinjy 于2021年3月5日周五上午10:48写道： > > > hi, > 由于消息队列量特别大，在flink接入以后被压严重。除了修改并行度以外还有没有其他的优化方案!!! > > > | | > allanqinjy > | > | > allanqi...@163.com > | > 签名由网易邮箱大师定制 > >

Re: 退订

2021-07-30 文章刘建刚

Send anything to user-zh-unsubscr...@flink.apache.org hihl 于2021年7月27日周二下午5:50写道： > 退订

Re: 通过savepoint重启任务，link消费kafka，有重复消息

2021-08-02 文章刘建刚

cancel with savepoint是之前的接口了，会造成kafka数据的重复。新的stop with savepoint则会在做savepoint的时候，不再发送数据，从而避免了重复数据，哭啼可以参考 https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/savepoints/ Jim Chen 于2021年8月2日周一下午2:33写道： > 我是通过savepoint的方式重启的，命令如下： > > cancel command: > >

Re: Flink SQL是否支持Count Window函数？

2021-09-23 文章刘建刚

这个目前还不支持，但是可以基于TVF来实现，现在已经建了一个jira了： https://issues.apache.org/jira/browse/FLINK-24002 Caizhi Weng 于2021年9月22日周三上午11:17写道： > Hi！ > > 据我所知目前暂时没有增加 count window 的打算，以后可能会在最新的 Window TVF 里添加 count window tvf。 > > 不建议在 SQL 中自行实现 count window，因为 SQL 添加 window 较为复杂。但可以考虑先将 SQL 转为 > datastream，用

Re: 回复：Flink的api停止方式

2021-10-11 文章刘建刚

可以尝试以下两种方法： 1、达到停止条件时，通过一定方式通知外界工具，外界工具来帮忙停止作业。 2、现在RichFunction里可以拿到jobId，但是拿不到applicationId，可以看看能否修改代码获取它，比如通过环境变量。然后再调用restful 接口停止作业。 lei-tian 于2021年10月11日周一上午9:11写道： > 因为要在代码里面判断是否停止的条件，停止的时候还是要在代码里面停止吧。 > > > > > > > > > > > > > > > > > > 在 2021-10-11 09:06:17，"995626544"

Re: flink on native k8s 资源弹性扩容问题

2021-09-26 文章刘建刚

这个不支持，你可以通过外部的工具来做到。比如，检测cpu到了一定程度就自动的重启作业来扩容。赵旭晨于2021年9月23日周四下午9:14写道： > 目前生产上环境作业参数0.2（cpu），5G > 平常增量跑的时候cpu占用率不到5%，上游数据全量初始化时经常会把CPU打满 > > 想问下：flink能否做到弹性扩容？当pod的request cpu打满时自动增加cpu，当高峰期过后处于增量阶段时再收回部分pod资源？ > > > > >

Re: [ANNOUNCE] Open source of remote shuffle project for Flink batch data processing

2021-11-30 文章刘建刚

Good work for flink's batch processing! Remote shuffle service can resolve the container lost problem and reduce the running time for batch jobs once failover. We have investigated the component a lot and welcome Flink's native solution. We will try it and help improve it. Thanks, Liu Jiangang

Re: batch模式下任务plan的状态一直为CREATED

2021-12-21 文章刘建刚

固定资源的情况下，batch的调度会按照拓扑顺序执行算子。如果你的资源只够运行一个source，那么等source运行完毕后才能运行SortLimit。 RS 于2021年12月21日周二 16:53写道： > hi， > > 版本：flink1.14 > > 模式：batch > > 测试场景：消费hive大量数据，计算某个字段的 top 10 > > > 使用sql-client测试，创建任务之后，生成2个plan，一个Source，一个SortLimit。Source状态为RUNNING，SortLimit状态一直为CREATED。 > >

Re: 请教flink sql作业链路延迟监控如何实现

2021-12-22 文章刘建刚

1、端到端的延迟可以通过latencyMarker来监控，但是可能会对性能有一定的影响。具体参考 https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#end-to-end-latency-tracking 2、kafka本身的延迟，直接使用kafka的groupId的lag即可。 3、Flink处理的延迟，这个好像没有原生的，可以通过反压来查看是否有有性能问题。另外，通过1、2也可以反映延迟情况。 RS 于2021年12月23日周四 10:37写道： >

Re: Re: Re:Re: batch模式下任务plan的状态一直为CREATED

2021-12-22 文章刘建刚

创建任务之后，生成2个plan。。。你这里的表述是两个相连接的JobVertex吧！第二个依赖第一个，需要等第一个执行完才会执行第二个。如果是流模式的话，二者会同时执行。 RS 于2021年12月23日周四 10:30写道： > 只有一条SQL，只是数据量比较大，使用的BATCH模式。 > SELECT price > > FROM hive.data.data1 > > ORDER BY price DESC > > > > 在 2021-12-22 18:35:13，"刘建刚" 写道： >

Re: Re:Re: batch模式下任务plan的状态一直为CREATED

2021-12-22 文章刘建刚

资源吗？ > > > > > > > > > > > >在 2021-12-21 17:57:21，"刘建刚" 写道： > > >>固定资源的情况下，batch的调度会按照拓扑顺序执行算子。如果你的资源只够运行一个source，那么等source运行完毕后才能运行SortLimit。 > >> > >>RS 于2021年12月21日周二 16:53写道： > >> > >>> hi， >

Re: flink on yarn 的pre_job提交失败,但是session模式可以成功

2021-11-04 文章刘建刚

通过你上面的信息是看不出来的，里头的链接你可以看下详细日志 http://ark1.analysys.xyz:8088/cluster/app/application_1635998548270_0028 陈卓宇 <2572805...@qq.com.invalid> 于2021年11月4日周四下午6:29写道： > yarn的错误日志: > org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to > initialize the cluster entrypoint

How to execute multi SQL in one job

2021-10-25 文章刘建刚

I have multi batch SQL commands separated by semicolon(;). The SQL commands need to be executed in order(In other cases, the SQL command may share sources or sinks). I want to execute them in one job. When I use tableEnv.executeSql(multiSQL), it will throw errors. How can I execute them in one

Re: How to execute multi SQL in one job

2021-10-25 文章刘建刚

> if (insertSqlBuffer.size > 0) { > insertSqlBuffer.foreach(item => { > println("insert sql: " + item) > statementSet.addInsertSql(item) > }) > val explain = statementSet.explain() > println(explain) > statementSet.execute() > } > > > ``` &

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 文章刘建刚

Glad to see the suggestion. In our test, we found that small jobs with the changing configs can not improve the performance much just as your test. I have some suggestions: - The config can affect the memory usage. Will the related memory configs be changed? - Can you share the tpcds

Re: [DISCUSS] Change some default config values of blocking shuffle

2022-01-04 文章刘建刚

> > >>>> >> > Hi Jiangang, >>>> >> > >>>> >> > Thanks for your suggestion. >>>> >> > >>>> >> > >>> The config can affect the memory usage. Will the related >>>> memor

49 matches

Mail list logo