Re: When a CREATE TABLE statement reads data from Kafka, how long is the created table's data retained?

2020-12-15 Thread felixzh
The data stays in Kafka itself; how long it is retained depends on your Kafka cluster's global configuration or the configuration of the specific topic. At 2020-12-09 20:24:17, "邮件帮助中心" wrote: >
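Retention is a Kafka-side setting; a minimal sketch of changing it for one topic (topic name and broker address are placeholders, and older Kafka versions may need --zookeeper instead of --bootstrap-server):

    # Set a 7-day retention on the topic backing the table:
    kafka-configs.sh --bootstrap-server localhost:9092 --alter \
      --entity-type topics --entity-name my_table_topic \
      --add-config retention.ms=604800000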

Does the Flink 1.11 SQL Client not support checkpoints?

2020-12-15 Thread lingchanhu
The official docs show how to enable checkpointing in code, but the SQL Client documentation says nothing about checkpoints. Is it unsupported?

Re: Flink jobmanager TLS connectivity to Zookeeper

2020-12-15 Thread Robert Metzger
Hey Azeem, I haven't tried this myself, but from the code / documentation this could work: Flink ships with ZK 3.4 by default. You need to remove the ZK 3.4 jar file from the lib/ folder and add the ZK 3.5 jar from opt/ to lib/. According to this guide, you could try passing the SSL
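A sketch of what those client-side SSL settings might look like once the ZK 3.5 jar is in lib/ — these are the standard ZooKeeper 3.5 client system properties, passed through flink-conf.yaml, with paths and passwords as placeholders:

    env.java.opts: >
      -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
      -Dzookeeper.client.secure=true
      -Dzookeeper.ssl.keyStore.location=/path/to/client.jks
      -Dzookeeper.ssl.keyStore.password=changeit
      -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks
      -Dzookeeper.ssl.trustStore.password=changeit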

Re: failed w/ Application Mode but succeeded w/ Per-Job Cluster Mode

2020-12-15 Thread Dongwon Kim
I just added the following option to the script: -Dclassloader.parent-first-patterns.additional=org.apache.kafka.common.serialization Now it seems to work. Why do the application mode and the per-job cluster mode behave differently when it comes to classloading? Is it a bug or intended?
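For reference, the same setting can also live in flink-conf.yaml instead of being passed on the command line:

    classloader.parent-first-patterns.additional: org.apache.kafka.common.serialization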

UDF type-matching problem

2020-12-15 Thread chen310
The flink version is 1.11.1. I created a UDF as follows: public class FromUnixTime extends ScalarFunction { private static final Logger logger = LoggerFactory.getLogger(FromUnixTime.class); public String eval(long unixTime, String timeZone, String format) { try { DateFormat dateFormat = new
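The preview cuts the UDF off; a hedged sketch of how such a function is typically completed — the body below is an assumption, not the poster's actual code, and it assumes unixTime is in seconds:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;
    import org.apache.flink.table.functions.ScalarFunction;

    public class FromUnixTime extends ScalarFunction {
        public String eval(long unixTime, String timeZone, String format) {
            // Format a Unix timestamp (in seconds) in the given time zone.
            SimpleDateFormat dateFormat = new SimpleDateFormat(format);
            dateFormat.setTimeZone(TimeZone.getTimeZone(timeZone));
            return dateFormat.format(new Date(unixTime * 1000L));
        }
    }

On the type-matching side, note that SQL integer literals are typed INT rather than BIGINT, so depending on the planner's type inference a call may need an explicit CAST(... AS BIGINT) to match the long parameter.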

failed w/ Application Mode but succeeded w/ Per-Job Cluster Mode

2020-12-15 Thread Dongwon Kim
Hi, I have an artifact which works perfectly fine with Per-Job Cluster Mode with the following bash script: #!/bin/env bash export FLINK_CONF_DIR=./conf export HADOOP_CLASSPATH=`hadoop classpath` $FLINK_HOME/bin/flink run -t yarn-per-job myjar.jar myconf.conf I tried Application Mode [1]
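For comparison, the application-mode submission presumably being tried looks roughly like this (the run-application action and the yarn-application target exist since Flink 1.11; jar and config names as in the script above):

    #!/bin/env bash
    export FLINK_CONF_DIR=./conf
    export HADOOP_CLASSPATH=`hadoop classpath`
    $FLINK_HOME/bin/flink run-application -t yarn-application myjar.jar myconf.conf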

Re: Get current kafka offsets for kafka sources

2020-12-15 Thread Juha Mynttinen
Hey, Have a look at [1]. Basically, you won't see the "real-time" consumer group offsets stored in Kafka itself, but only the ones the Flink Kafka consumer stores there when checkpointing (assuming you have checkpointing enabled). The same information is available in Flink metrics [2],
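A minimal sketch of the setup under which the Flink Kafka consumer commits offsets back to Kafka on checkpoints (broker, topic, and group names are placeholders):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000L); // offsets are committed to Kafka on each checkpoint

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "my-flink-application");

    FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
    consumer.setCommitOffsetsOnCheckpoints(true); // the default once checkpointing is on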

Re: pause and resume flink stream job based on certain condition

2020-12-15 Thread Robert Metzger
What you can also do is rely on Flink's backpressure mechanism: If the map operator that validates the messages detects that the external system is down, it blocks until the system is up again. This effectively causes the whole streaming job to pause: the Kafka source won't read new messages. On
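A minimal sketch of that blocking pattern — ExternalSystemClient and its isUp() check are hypothetical placeholders for whatever health probe the job uses:

    import org.apache.flink.api.common.functions.RichMapFunction;

    public class ValidatingMap extends RichMapFunction<String, String> {
        @Override
        public String map(String value) throws Exception {
            // While the external system is down, block here; backpressure then
            // propagates upstream and the Kafka source stops reading new messages.
            while (!ExternalSystemClient.isUp()) {
                Thread.sleep(1_000L);
            }
            return value; // the actual validation logic would go here
        }
    }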

Re: Direct Memory full

2020-12-15 Thread Robert Metzger
Hey Rex, the direct memory is used for IO. There is no concept of direct memory being "full". The only thing that can happen is that you have something in place (Kubernetes, YARN) that limits / enforces the memory use of a Flink process, and you run out of your memory allowance. The direct memory
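If the allowance really is exhausted, the knobs to raise in the Flink 1.10+ memory model are the off-heap sizes in flink-conf.yaml; the values below are illustrative only:

    taskmanager.memory.framework.off-heap.size: 128m
    taskmanager.memory.task.off-heap.size: 256m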

Re: Connecting to kinesis with mfa

2020-12-15 Thread Robert Metzger
Hey Avi, maybe providing the secret/access key together with a session token doesn't work, and you need to provide only one of them? https://docs.aws.amazon.com/credref/latest/refdocs/setting-global-aws_session_token.html I'll also ping some AWS contributors active in Flink to take a look at this. Best, Robert

Re: Get current kafka offsets for kafka sources

2020-12-15 Thread Rex Fenley
I'll take a look at that approach. Thanks On Tue, Dec 15, 2020 at 9:48 PM Aeden Jameson wrote: > My understanding is the FlinkKafkaConsumer is a wrapper around the > Kafka consumer libraries so if you've set the group.id property you > should be able to see the offsets with something like >

Re: PyFlink UDF fails at runtime

2020-12-15 Thread Xingbo Huang
Hi, since you didn't provide detailed job information, the error alone shows it is thrown by your Python UDF — more specifically, the string result returned by your Python UDF failed to deserialize on the Java side. Please check the corresponding Python UDF. Best, Xingbo Leopard wrote on Wed, Dec 16, 2020 at 09:42: > pyflink 1.11.1 > > Fail to run sql command: SELECT > driverStatus,userId,latitude,locTime,longitude,city_code,ad_code >

Re: Get current kafka offsets for kafka sources

2020-12-15 Thread Aeden Jameson
My understanding is the FlinkKafkaConsumer is a wrapper around the Kafka consumer libraries so if you've set the group.id property you should be able to see the offsets with something like kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-flink-application. On Tue,

Get current kafka offsets for kafka sources

2020-12-15 Thread Rex Fenley
Hi, Is there any way to fetch the current Kafka topic offsets for the Kafka sources in Flink? Thanks! -- Rex Fenley | Software Engineer - Mobile and Backend | Remind.com

Re: Permission problem when installing files of a third-party library referenced by PyFlink

2020-12-15 Thread Xingbo Huang
Hi, by default it is the environment that python points to on each machine; of course you can also specify a different Python environment via -pyexec. Best, Xingbo magichuang wrote on Tue, Dec 15, 2020 at 20:02: > I'm looking at that error now: Flink has already uploaded requirements.txt and cached_dir to HDFS first, because >

State keeps growing when using the built-in FlinkSQL function LAST_VALUE

2020-12-15 Thread guoliang_wang1335
Using Flink 1.10, with mini-batch enabled and idleStateRetentionTime set; the job performs a left join during execution and finally sinks with insert into table select a, LAST_VALUE(b) group by a; The relevant configuration: val tConfig: TableConfig = tEnv.getConfig tConfig.setIdleStateRetentionTime(Time.hours(1), Time.hours(2)) val configuration =
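For reference, the equivalent Table API calls in Java, with the same 1 h minimum / 2 h maximum retention as the Scala snippet above:

    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.table.api.TableConfig;

    TableConfig tConfig = tEnv.getConfig();
    // State idle longer than the minimum becomes eligible for cleanup and is
    // guaranteed to be dropped once it has been idle for the maximum.
    tConfig.setIdleStateRetentionTime(Time.hours(1), Time.hours(2));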

How to configure RocksDB's block-cache-usage metric in flink 1.9.1

2020-12-15 Thread bradyMk
Hi, a question for everyone: I've recently been using Flink 1.9 with RocksDB for incremental checkpoints, and I want to monitor the job's memory via these two metrics: (1) block-cache-usage (2) write buffer. But I can't find them in the official docs [1]. From what I've read, the write buffer corresponds to the metric state.backend.rocksdb.metrics.cur-size-all-mem-tables, while the block-cache-usage metric only exists from 1.10 on; 1.9 doesn't have it. Questions: (1) does write buffer correspond to this metric ->
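A sketch of enabling those RocksDB native metrics in flink-conf.yaml — as noted above, the second key only exists from Flink 1.10 on:

    state.backend.rocksdb.metrics.cur-size-all-mem-tables: true   # write buffer size
    state.backend.rocksdb.metrics.block-cache-usage: true         # Flink 1.10+ only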

Re: Flink SQL 1.12 writing data to Elasticsearch: deployment problem

2020-12-15 Thread Yangze Guo
You need to put flink-sql-connector-elasticsearch7_2.11-1.12.0.jar there instead. Best, Yangze Guo On Wed, Dec 16, 2020 at 11:34 AM cljb...@163.com wrote: > > hi, > With Flink SQL 1.12, writing data to Elasticsearch works locally but fails with the error below when deployed to the server. > I checked the built jar — it contains the relevant classes — and flink-connector-elasticsearch7_2.11-1.12.0.jar has > already been put under flink lib.

Re: flink 1.11 DataStream elasticsearch sink: writing to ES requires username/password authentication — is this currently supported?

2020-12-15 Thread Yangze Guo
Version 1.11 does not yet support setting username and password; these two options were added to the new ES connector in 1.12 [1] [1] https://issues.apache.org/jira/browse/FLINK-18361 Best, Yangze Guo On Wed, Dec 16, 2020 at 11:34 AM 李世钰 wrote: > flink 1.11 DataStream elasticsearch sink: writing to ES requires username/password authentication — is this currently supported? > elasticsearch7.0 > > > > > > >
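A commonly used workaround on the 1.11 DataStream API is to inject basic-auth credentials through a custom RestClientFactory; a hedged sketch, assuming builder is an ElasticsearchSink.Builder and with user/password as placeholders:

    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.impl.client.BasicCredentialsProvider;

    builder.setRestClientFactory(restClientBuilder ->
        restClientBuilder.setHttpClientConfigCallback(httpClientBuilder -> {
            BasicCredentialsProvider credentials = new BasicCredentialsProvider();
            credentials.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("user", "password"));
            return httpClientBuilder.setDefaultCredentialsProvider(credentials);
        }));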

Re: State keeps growing when using the built-in FlinkSQL function LAST_VALUE

2020-12-15 Thread guoliang_wang1335
To add the concrete settings: using Flink 1.10, with mini-batch enabled and idleStateRetentionTime set; the final sink is insert into table select a, LAST_VALUE(b) group by a; The relevant configuration: val tConfig: TableConfig = tEnv.getConfig tConfig.setIdleStateRetentionTime(Time.hours(1), Time.hours(2)) val configuration =

Re: FlinkSQL: multiple SELECT GROUP BY aggregations unioned for output, vs. UNION of SELECTs followed by GROUP BY — differences and a problem

2020-12-15 Thread 赵一旦
For easier discussion I have re-posted the complete SQL plus part of my analysis at the address below. https://www.yuque.com/sixhours-gid0m/eegye3/xrz2mm Help is welcome. 赵一旦 wrote on Wed, Dec 16, 2020 at 10:52: > > Judging from the source nodes of these two schemes there isn't much difference. The problem is that when I look at the outputwatermark metrics in the web UI, parallel instance 0 of scheme 2 has 8 metrics carrying an outputwatermark (1 starting with source, 7 with calc), while scheme 3 has only 2. > > 赵一旦 wrote on Wed, Dec 16, 2020

Flink SQL 1.12 writing data to Elasticsearch: deployment problem

2020-12-15 Thread cljb...@163.com
hi, With Flink SQL 1.12, writing data to Elasticsearch works locally but fails with the error below when deployed to the server. I checked the built jar — it contains the relevant classes — and flink-connector-elasticsearch7_2.11-1.12.0.jar is already under flink lib. I also tried switching the classloading order; neither child-first nor parent-first works. Has anyone hit a similar problem? Thanks! The error is: org.apache.flink.client.program.ProgramInvocationException: The main

flink 1.11 DataStream elasticsearch sink: writing to ES requires username/password authentication — is this currently supported?

2020-12-15 Thread 李世钰
flink 1.11 DataStream elasticsearch sink: writing to ES requires username/password authentication — is this currently supported? elasticsearch7.0 -- 李世钰 Mail: m...@lishiyu.cn TEL: 18801236165

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-12-15 Thread Yun Gao
Hi Aljoscha, Many thanks for the feedback! For the remaining issues: > 1. You mean we would insert "artificial" barriers for barrier 2 in case we receive EndOfPartition while other inputs have already received barrier 2? I think that makes sense, yes. Yes, exactly,

Re: flink 1.9.1: per-job RocksDB configuration not taking effect

2020-12-15 Thread bradyMk
Hi, thanks for the answer. I checked the TM logs and the RocksDB state backend is indeed started; but why does state.backend under Job Manager --> Configuration in the web UI still show filesystem? Shouldn't it be RocksDB? - Best Wishes

Re: When will the flink 1.12 Docker image be available?

2020-12-15 Thread Yang Wang
The community hit a problem while pushing the image to Docker Hub and is working on it; you can follow this PR: https://github.com/docker-library/official-images/pull/9249 In the meantime you can build an image yourself, like this: git clone https://github.com/apache/flink-docker.git git checkout dev-master ./add-custom.sh -u

Re: How to choose the flink-shaded-hadoop-2-uber version

2020-12-15 Thread Yang Wang
Take flink-shaded-hadoop-2-uber version 2.8.3-10.0 as an example: 2.8.3 is the Hadoop version, and 10.0 is the flink-shaded [1] version. Since 1.10 the community no longer recommends the flink-shaded-hadoop approach, and instead recommends submitting with the HADOOP_CLASSPATH environment variable set [2]. This keeps Flink hadoop-free and able to support Hadoop 2 and Hadoop 3 at the same time. If you still insist on flink-shaded-hadoop, just use the latest version, 2.8.3-10.0. [1].
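A sketch of the recommended hadoop-free submission (jar name and deployment target are placeholders):

    export HADOOP_CLASSPATH=$(hadoop classpath)
    $FLINK_HOME/bin/flink run -t yarn-per-job myjob.jar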

Re: FlinkSQL: multiple SELECT GROUP BY aggregations unioned for output, vs. UNION of SELECTs followed by GROUP BY — differences and a problem

2020-12-15 Thread 赵一旦
Judging from the source nodes of these two schemes there isn't much difference. The problem is that when I look at the outputwatermark metrics in the web UI, parallel instance 0 of scheme 2 has 8 metrics carrying an outputwatermark (1 starting with source, 7 with calc), while scheme 3 has only 2. 赵一旦 wrote on Wed, Dec 16, 2020 at 10:41: > Does anyone understand this? Today's new finding follows. > I looked at the name the WEB-UI shows for my source node and split it up in a text editor; findings below. > Scheme 2: > > Source:

Re: Flink 1.11.1 Application Mode job on a K8S cluster: "too old resource version" problem

2020-12-15 Thread Yang Wang
I previously opened a JIRA to track the "too old resource version" problem [1]. Flink currently uses a Watcher to monitor Pod status changes; when the Watcher is closed abnormally, a fatal error is triggered, which restarts the JobManager. I have run concrete tests on minikube, a self-built K8s cluster, and an Alibaba Cloud ACK cluster; all ran stably for more than a week. The problem could only be reproduced by restarting the K8s APIServer, so I suspect the network between your Pods and the APIServer is unstable, which would explain why the problem appears so often. [1].

Re: FlinkSQL: multiple SELECT GROUP BY aggregations unioned for output, vs. UNION of SELECTs followed by GROUP BY — differences and a problem

2020-12-15 Thread 赵一旦
Does anyone understand this? Today's new finding follows. I looked at the name the WEB-UI shows for my source node and split it up in a text editor. Scheme 2: Source: TableSourceScan(table=[[default_catalog, default_database, baidu_log, watermark=[-(TO_TIMESTAMP(FROM_UNIXTIME(/(CASE(IS NOT NULL($1), CAST($1):BIGINT NOT NULL, 0:BIGINT), 1000))), 6:INTERVAL SECOND)]]],

Flink - sending clicks+impressions to AWS Personalize

2020-12-15 Thread Dan Hill
I want to try using AWS Personalize to get content recommendations. One of the fields on the input (click) event is a list of recent impressions. E.g. { ... eventType: 'click', eventId: 'click-1', itemId: 'item-1' impression: ['item-2', 'item-3',

Re: Flink 1.12

2020-12-15 Thread Yang Wang
Hi Boris, What is -p 10? It is the same as --parallelism 10: it sets the default parallelism to 10. Does it require a special container build? No, the official Flink Docker image can be used directly. Unfortunately, we do not have the image yet, and we are trying to figure it out. You could follow the

Re: Flink 1.10.0 on yarn: job submission fails

2020-12-15 Thread Xintong Song
It looks like Yarn did not set the hadoop classpath for the application. Log into the machine and check the contents of launch_container.sh: does the container start command include the correct hadoop classpath? Is your Yarn a customized build? The open-source version normally sets the hadoop classpath for containers. It worked before 1.10 because Flink shipped a shaded hadoop; starting with 1.10, Flink no longer bundles shaded hadoop by default and uses the cluster environment's Hadoop dependencies instead. You can also bundle shaded

How to measure the processing latency of a flink sql job?

2020-12-15 Thread jindy_liu
In Flink SQL, what is the usual way to measure, within one job, the end-to-end latency of a MySQL CDC record from entering Flink, through all internal operators, to the external sink system? Or the latency of the whole job? Everyone calls it real-time, but I don't yet know the order of magnitude of that real-time processing!

Locking twice

2020-12-15 Thread xiaobao li
In org.apache.flink.runtime.entrypoint.ClusterEntrypoint, the runCluster method is already locked — synchronized (lock) wraps the whole method. Inside it, initializeServices is called, which again contains synchronized (lock). Is there any point in locking twice?

Re: Application Mode job on a K8S cluster: cannot scale in

2020-12-15 Thread Yang Wang
I didn't quite understand what you mean — do you want K8s itself to handle the autoscaling? The basic idea of native Flink on K8s is that Flink's ResourceManager actively requests/releases TaskManager Pods from K8s: if a TaskManager is idle, it is released automatically. This is more flexible, since the TaskManager lifecycle is controlled by the Flink ResourceManager, and in the future TaskManagers may come in different sizes, so using Pods directly is a better fit. If you want to adjust the number of TMs based on CPU/memory load, you can use standalone Flink on

Re: flink 1.12: job submission method problem

2020-12-15 Thread Yang Wang
org.apache.flink.client.ClientUtils#submitJob is not meant to be used directly by users, so it may be removed during refactoring. Use org.apache.flink.client.program.rest.RestClusterClient#submitJob instead. Best, Yang 陈帅 wrote on Tue, Dec 15, 2020 at 20:28: > In flink 1.11, the method used to submit user jobs, > org.apache.flink.client.ClientUtils.submitJob(ClusterClient,
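A minimal sketch of the suggested replacement, assuming an existing Configuration pointing at the cluster's REST endpoint and an already-built JobGraph:

    import org.apache.flink.api.common.JobID;
    import org.apache.flink.client.deployment.StandaloneClusterId;
    import org.apache.flink.client.program.rest.RestClusterClient;

    RestClusterClient<StandaloneClusterId> client =
        new RestClusterClient<>(configuration, StandaloneClusterId.getInstance());
    // submitJob is asynchronous and yields the JobID of the submitted job
    JobID jobId = client.submitJob(jobGraph).get();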

Re: Fwd: Unexpected data after deduplicating two streams and then joining them

2020-12-15 Thread hdxg1101300...@163.com
Here's my thinking: the final join is between two result streams, and both result streams are retract streams, so a change on either side produces two messages. On the left side the first message is the retraction and the second is the updated row. When the right side changes there are also two records: the false (retract) message joins with the left side, the whole stream is treated as changed, retracted, and the joined data re-emitted; when the true record arrives it joins again, and the whole stream changes once more — retracted, re-joined, emitted. My idea: could the join be done only against the records of the right stream whose flag is true? hdxg1101300...@163.com From: hdxg1101300123 Sent: 2020-12-15 23:44 To: user-zh Subject: Fwd:

Re: Is working with states supported in pyflink1.12?

2020-12-15 Thread Xingbo Huang
Hi, As Chesnay said, PyFlink already supports the stateless Python DataStream APIs, so users can perform basic data transformations, but it doesn't provide state access support yet in release-1.12. The proposal [1] to enhance the API with state access has been made and related

Re: Unable to sink after a temporal table join

2020-12-15 Thread guoliubi...@foxmail.com
Found the cause — a data problem. The earliest data in the two Kafka sources had very different timestamps, so the join kept accumulating data while waiting for the other queue's timestamps to arrive. After adjusting the offsets so the two queues' timestamps lined up, the problem disappeared. guoliubi...@foxmail.com From: guoliubi...@foxmail.com Sent: 2020-12-16 07:36 To: user-zh Subject: Unable to sink after a temporal table join Hi, the pipeline reads data from two Kafka queues, performs a temporal table

Re: Determining the flink-shaded-hadoop-2-uber*-* version

2020-12-15 Thread Yang Wang
Make sure hadoop classpath returns the complete classpath; normally the hadoop classpath command includes all Hadoop jars. If a class or method is reported missing, check whether the corresponding jar exists and is included. The community recommends the hadoop classpath approach mainly to keep Flink hadoop-free, so it runs on both Hadoop 2 and Hadoop 3. Best, Yang Jacob <17691150...@163.com> wrote on Tue, Dec 15, 2020 at 09:25: > Thanks for the reply! > > I've read that document > >

PyFlink UDF fails at runtime

2020-12-15 Thread Leopard
pyflink 1.11.1 Fail to run sql command: SELECT driverStatus,userId,latitude,locTime,longitude,city_code,ad_code ,geo_to_h3(latitude,longitude,7) as h3_hash,geo_to_numpy_int_h3(latitude,longitude,7) as h3_code FROM lbs_trace CROSS JOIN UNNEST(datas),lateral table(split_json(expandInfo)) as

Flink 1.12.0: writing ORC files with custom file names

2020-12-15 Thread Jacob
Currently, when Flink writes ORC files, the file prefix and suffix can be configured via OutputFileConfig: .withPartPrefix("prefix") / .withPartSuffix(".ext"), producing files named part-- Is there a way to fully customize the generated file name, e.g. "dt=1608006781874" with dt = timestamp, so the file can be loaded directly as a partition of a Hive table? That would make later Hive operations easy; with Flink's default file names the files cannot be loaded into a Hive table.
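For reference, the hooks that exist today cover only the prefix and suffix around the fixed part-<subtask>-<counter> pattern; a sketch with illustrative values:

    import org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig;

    OutputFileConfig fileConfig = OutputFileConfig.builder()
        .withPartPrefix("dt-1608006781874-part") // prefix/suffix are configurable,
        .withPartSuffix(".orc")                  // the middle pattern is not
        .build();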

Re: Unsubscribe

2020-12-15 Thread Evan
Hi, to unsubscribe you need to send a mail to user-zh-unsubscr...@flink.apache.org. For more details see [1] https://flink.apache.org/community.html#mailing-lists From: 谢治平 Sent: 2020-12-16 09:08 To: user-zh Subject: Unsubscribe Hello, please unsubscribe me.

Re: Flink 1.12

2020-12-15 Thread Boris Lublinsky
Thanks Chesnay for your quick response, I read the documentation https://cwiki.apache.org/confluence/display/FLINK/FLIP-144%3A+Native+Kubernetes+HA+for+Flink#FLIP144:NativeKubernetesHAforFlink-NativeK8s

Unsubscribe

2020-12-15 Thread 谢治平
Hello, please unsubscribe me.

Unable to sink after a temporal table join

2020-12-15 Thread guoliubi...@foxmail.com
Hi, the pipeline reads data from two Kafka queues, performs a temporal table join, then applies a UDAF over a tumbling window and sinks; the code is roughly as follows: joined_table = t_env.sql_query(""" SELECT o.exchangeCode_ as code, o.price, o.volume, o.eventTime FROM orders AS o INNER JOIN quotes FOR SYSTEM_TIME AS OF o.eventTime q ON

Re: Is working with states supported in pyflink1.12?

2020-12-15 Thread Chesnay Schepler
It is currently not possible to access state with the Python API. A proposal has recently been made to enhance the API with state access (under FLIP-152), but at this time I cannot provide a prediction for when it might be released. On 12/15/2020 7:55 PM, Nadia Mostafa wrote: Hello, I'm

Is working with states supported in pyflink1.12?

2020-12-15 Thread Nadia Mostafa
Hello, I'm new to Flink and trying to build a stateful application using the Python DataStream API, but I can't find any example of how to use state in Python in the Flink 1.12 documentation. Is state supported in the Python DataStream API? And if so, how can I use it? Thanks in advance!

Flink - Create Temporary View and "Rowtime attributes must not be in the input rows of a regular join"

2020-12-15 Thread Dan Hill
When I try to refactor my joins into a temporary view to share joins and state, I get the following error. I tried a few variations of the code snippets below (adding TIMESTAMP casts based on Google searches). I removed a bunch of fields to simplify this example. Is this a known issue? Do I

Re: Flink 1.12

2020-12-15 Thread Chesnay Schepler
Unfortunately no; there are some discussions going on in the docker-library/official-images PR that have to be resolved first, but currently these would require changes on the Flink side that we cannot do (because it is already

Re: Flink 1.12

2020-12-15 Thread Boris Lublinsky
Thanks. Do you have an ETA for the Docker images? > On Dec 14, 2020, at 3:43 AM, Chesnay Schepler wrote: > > 1) It is compiled with Java 8 but runs on Java 8 & 11. > 2) Docker images are not yet published. > 3) It is mentioned at the top of the Kubernetes HA Services documentation > that it also

Fwd: Unexpected data after deduplicating two streams and then joining them

2020-12-15 Thread hdxg1101300123
---------- Forwarded message ---------- From: hdxg1101300123 Date: 2020-12-15 10:36 Subject: Unexpected data after deduplicating two streams and then joining them To: user-zh Cc: > Hello: > I'm using flink 1.11.2 with flinksql to process two streams. Both streams carry database change events, and I need the latest rows for the join, so I apply row_number=1 to each stream > (SELECT [column_list] FROM ( > SELECT [column_list], > ROW_NUMBER()
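The quote truncates the documented Flink SQL deduplication pattern; for context it typically reads as follows, with column and table names as placeholders:

    SELECT [column_list]
    FROM (
        SELECT [column_list],
               ROW_NUMBER() OVER (PARTITION BY key_col ORDER BY time_attr DESC) AS rownum
        FROM source_table
    )
    WHERE rownum = 1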

Flink 1.12 and Stateful Functions

2020-12-15 Thread Jan Brusch
Hi, just a quick question: is there a rough estimate of when the Flink 1.12 features (especially the new HA mode) will also be available in Flink Stateful Functions? Best regards Jan

Re: FlinkSQL: multiple SELECT GROUP BY aggregations unioned for output, vs. UNION of SELECTs followed by GROUP BY — differences and a problem

2020-12-15 Thread 赵一旦
Scheme 2 is fine; in scheme 3 the window operator has no watermark. 赵一旦 wrote on Tue, Dec 15, 2020 at 22:49: > The full SQL is below. > Scheme 2: > > INSERT INTO flink_sdk_stats > ( > SELECT > DATE_FORMAT(TUMBLE_END(`event_time`, INTERVAL '5' MINUTE), > 'MMddHHmm') AS `time`, > sid

Re: How is Flink SQL column pruning implemented?

2020-12-15 Thread SmileSmile
hi, hailongwang: PROJECT_REMOVE can eliminate two projections chained together, but even when only one field is projected, after several layers of SQL nesting the bottom layer still projects a large number of fields. How can better column pruning be achieved — does Flink have an implementation for this? | a511955993 | Mail: a511955...@163.com | On 2020-12-15 22:33, hailongwang wrote: Hi, 1. projection prune: see CoreRules.PROJECT_REMOVE,

Re: FlinkSQL: multiple SELECT GROUP BY aggregations unioned for output, vs. UNION of SELECTs followed by GROUP BY — differences and a problem

2020-12-15 Thread 赵一旦
The full SQL is below. Scheme 2: INSERT INTO flink_sdk_stats ( SELECT DATE_FORMAT(TUMBLE_END(`event_time`, INTERVAL '5' MINUTE), 'MMddHHmm') AS `time`, sid AS `supply_id`, 'd77' AS `field_key`, d77 AS `filed_value`, count(1)

FlinkSQL: multiple SELECT GROUP BY aggregations unioned for output, vs. UNION of SELECTs followed by GROUP BY — differences and a problem

2020-12-15 Thread 赵一旦
I need to aggregate a table by several key groups: key1 (xxx+yyy+ky1), key2 (xxx+yyy+ky2), and so on, where xxx+yyy are shared columns. I currently see three implementations: (1) aggregate per key group and insert separately; (2) aggregate per key group, union the results, then insert; (3) select from the table multiple times, union, then aggregate by key and insert (a sketch follows below). In the third scheme, the different fields ky1, ky2 are normalized via select 'ky1' as key_name, ky1 as key_value union select 'ky2' as key_name, ky2 as
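A hedged sketch of scheme (3), using the placeholder column names from the question:

    SELECT xxx, yyy, key_name, key_value, COUNT(1) AS cnt
    FROM (
        SELECT xxx, yyy, 'ky1' AS key_name, ky1 AS key_value FROM t
        UNION ALL
        SELECT xxx, yyy, 'ky2' AS key_name, ky2 AS key_value FROM t
    )
    GROUP BY xxx, yyy, key_name, key_value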

Re: Re: A question about stream-stream Interval Join

2020-12-15 Thread hailongwang
Hi, you can follow: https://issues.apache.org/jira/browse/FLINK-20162 https://issues.apache.org/jira/browse/FLINK-20387 Best, Hailong On 2020-12-15 10:40:19, "赵一旦" wrote: > To add: FROM_UNIXTIME should actually return the type TIMESTAMP WITH LOCAL TIME ZONE > (FlinkSQL can then convert it to TIMESTAMP itself). > > Also, regarding windowing: besides an explicit offset where the user handles the time zone, it could also support

Re: How is Flink SQL column pruning implemented?

2020-12-15 Thread hailongwang
Hi, 1. projection prune: see CoreRules.PROJECT_REMOVE, FlinkLogicalCalcRemoveRule.INSTANCE 2. projection push into tablesource: see PushProjectIntoTableSourceScanRule Best, Hailong On 2020-12-15 20:57:32, "SmileSmile" wrote: > hi, everyone in the community — does anyone know how Flink SQL's column pruning is implemented? >

Re: A zeppelin + flink 1.12 problem

2020-12-15 Thread Jeff Zhang
The new build shared in the DingTalk group already fixes this; DingTalk group id: 32803524. 赵一旦 wrote on Tue, Dec 15, 2020 at 16:37: > As the subject says, zeppelin + flink 1.12 errors out (on select 1). > org.apache.zeppelin.interpreter.InterpreterException: > org.apache.zeppelin.interpreter.InterpreterException: > java.lang.NoSuchMethodError: > >

How is Flink SQL column pruning implemented?

2020-12-15 Thread SmileSmile
hi, everyone in the community — does anyone know how Flink SQL's column pruning is implemented? Calcite's RBO can optimize SQL, but Calcite's coreRules don't seem to implement column pruning. Some articles mention that Flink implements projection pushdown. Where is the corresponding source code? Best! | a511955993 | Mail: a511955...@163.com |

Re: Application Mode job on a K8S cluster: cannot scale in

2020-12-15 Thread lichunguang
The current model allows each pod to have its own configuration, e.g. CPU and MEM. But since the overall resource profile is the same, wouldn't starting the TM pods as a StatefulSet be more suitable?

flink 1.12: job submission method problem

2020-12-15 Thread 陈帅
In flink 1.11, the method for submitting user jobs, org.apache.flink.client.ClientUtils.submitJob(ClusterClient, JobGraph), can no longer be found in flink 1.12. Which method replaces it now, and how do I obtain the jobId after submitting a job? Thanks!

    public static JobExecutionResult submitJob(
            ClusterClient client,
            JobGraph jobGraph) throws ProgramInvocationException {

Re: FlinkSQL kafka->dedup->kafka

2020-12-15 Thread Konstantin Knauf
HI Laurent, Did you manage to find the error in your MATCH_RECOGNIZE statement? If I had to take a guess, I'd say it's because you are accessing A, but due to the quantifier of * there might actually be no event A. Cheers, Konstantin On Fri, Nov 27, 2020 at 10:03 PM Laurent Exsteens <

Re: flink 1.9.1: per-job RocksDB configuration not taking effect

2020-12-15 Thread Congxian Qiu
Hi, state.backend is probably a value you set in flink-conf. In your specific case the effective configuration is RocksDB (the code takes precedence; if the code sets nothing, the value from flink-conf is used). Check the TM logs — they should contain more detail. Best, Congxian bradyMk wrote on Tue, Dec 15, 2020 at 17:05: > Hi, a question for everyone: I configure the RocksDB state backend per job, with the following code: > > val backend = new RocksDBStateBackend(path, true) >

Re: Permission problem when installing files of a third-party library referenced by PyFlink

2020-12-15 Thread magichuang
I'm looking at that error now: Flink has already uploaded requirements.txt and cached_dir to HDFS first, because /yarn/nm/usercache/root/appcache/application_1608026509770_0001/flink-dist-cache-f2899798-cfd4-42b9-95d3-204e7e34b943/418cd5a449fd6ec4b61a3cc2dca4ea08/requirements.txt

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-12-15 Thread Aljoscha Krettek
Thanks for the thorough update! I'll answer inline. On 14.12.20 16:33, Yun Gao wrote: 1. To include EndOfPartition into consideration for barrier alignment at the TM side, we now tend to decouple the logic for EndOfPartition with the normal alignment behaviors to avoid the complex

Connecting to kinesis with mfa

2020-12-15 Thread Avi Levi
Hi guys, we are struggling to connect to Kinesis when MFA is activated. I configured everything according to the documentation but am still getting an exception: val producerConfig = new Properties() producerConfig.put(AWSConfigConstants.AWS_REGION, awsRegion)

flink 1.9.1: per-job RocksDB configuration not taking effect

2020-12-15 Thread bradyMk
Hi, a question for everyone: I configure the RocksDB state backend per job, with the following code: val backend = new RocksDBStateBackend(path, true) backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED) env.setStateBackend(backend.asInstanceOf[StateBackend]) But after running the code, the web UI's Job Manager --> Configuration

A zeppelin + flink 1.12 problem

2020-12-15 Thread 赵一旦
As the subject says, zeppelin + flink 1.12 errors out (on select 1). org.apache.zeppelin.interpreter.InterpreterException: org.apache.zeppelin.interpreter.InterpreterException: java.lang.NoSuchMethodError: org.apache.flink.api.common.ExecutionConfig.disableSysoutLogging()Lorg/apache/flink/api/common/ExecutionConfig; at