Re: Jupyter PyFlink Web UI

2021-06-08 Thread Maciej Bryński
Nope. I found the following solution:
conf = Configuration()
env = StreamExecutionEnvironment(get_gateway().jvm.org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf._j_configuration))
env_settings =
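For reference, a fuller sketch of the same approach (a hedged reconstruction, not Maciej's exact code: the pinned rest.port value and the StreamTableEnvironment wiring are assumptions):

```python
from pyflink.common import Configuration
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.java_gateway import get_gateway
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

# Build the Java-side local environment with the Web UI enabled, then wrap it
# in the Python StreamExecutionEnvironment.
conf = Configuration()
conf.set_string("rest.port", "8081")  # assumption: pin the UI port instead of a random one
j_env_class = get_gateway().jvm.org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
env = StreamExecutionEnvironment(
    j_env_class.createLocalEnvironmentWithWebUI(conf._j_configuration))

env_settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build()
table_env = StreamTableEnvironment.create(env, environment_settings=env_settings)
```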

Re: How to configure column width in Flink SQL client?

2021-06-08 Thread Ingo Bürk
Hi Svend, I think it definitely makes sense to open a JIRA issue so it can also be discussed with the people working on the SQL client. Thanks for taking care of this! Regards Ingo On Wed, Jun 9, 2021 at 7:25 AM Svend wrote: > Thanks for the feedback Ingo, > > Do you think a PR would be

Re: How to configure column width in Flink SQL client?

2021-06-08 Thread Svend
Thanks for the feedback Ingo, Do you think a PR would be welcome to make that parameter configurable? At the place where I work, UUIDs are often used as column values, and they are 36 characters long => very often a piece of information that is very useful to us is not readable. I had a quick look,

Re: Add control mode for flink

2021-06-08 Thread Xintong Song
> > 2. There are two kinds of existing special elements: special stream > records (e.g. watermarks) and events (e.g. checkpoint barriers). They all > flow through the whole DAG, but events need to be acknowledged by the > downstream and can overtake records, while stream records cannot. So I'm >

Re: Add control mode for flink

2021-06-08 Thread Steven Wu
> producing control events from JobMaster is similar to triggering a savepoint. Paul, here is where I see the difference: upon job or jobmanager recovery, we don't need to recover and replay the savepoint trigger signal. On Tue, Jun 8, 2021 at 8:20 PM Paul Lam wrote: > +1 for this feature.

Re: Add control mode for flink

2021-06-08 Thread Paul Lam
+1 for this feature. Setting up a separate control stream is too much for many use cases; it would be very helpful if users could leverage the built-in control flow of Flink. My 2 cents: 1. @Steven IMHO, producing control events from JobMaster is similar to triggering a savepoint. The REST api is

Re: Re: Add control mode for flink

2021-06-08 Thread Steven Wu
Option 2 is probably not feasible, as a checkpoint may take a long time or may fail. Option 1 might work, although it complicates job recovery and checkpointing. After checkpoint completion, we need to clean up those control signals stored in the HA service. On Tue, Jun 8, 2021 at 1:14 AM 刘建刚 wrote:

[table-walkthrough exception] Unable to create a source for reading table...

2021-06-08 Thread Lingfeng Pu
Hi, I'm following the tutorial to run the "flink-playground/table-walkthrough" project in IDEA. However, I got the following exception: Exception in thread "main" org.apache.flink.table.api.ValidationException: Unable to create a source for reading table

Re: Jupyter PyFlink Web UI

2021-06-08 Thread Dian Fu
Hi Maciek, You could try whether the following works: ``` table_env.get_config().get_configuration().set_string("rest.bind-port", "xxx") ``` Regards, Dian > On Jun 8, 2021, at 8:26 PM, maverick wrote: > > Hi, > I've got a question. I'm running PyFlink code from Jupyter Notebook starting > TableEnvironment with
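A self-contained sketch of Dian's suggestion (the port 8088 merely stands in for the elided "xxx", and whether a Web UI is actually served depends on running against a local mini-cluster):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

env_settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build()
table_env = TableEnvironment.create(env_settings)

# "8088" stands in for the "xxx" placeholder above; any free port should work.
table_env.get_config().get_configuration().set_string("rest.bind-port", "8088")
```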

Re: ByteSerializationSchema in PyFlink

2021-06-08 Thread Dian Fu
Hi Wouter, Great to hear, and thanks for sharing! Regards, Dian > On Jun 8, 2021, at 4:44 PM, Wouter Zorgdrager wrote: > > Hi Dian, all, > > The way I resolved it right now is to write my own custom serializer which only > maps from bytes to bytes. See the code below: > public class KafkaBytesSerializer

Re: SQL client fails to submit a Flink job

2021-06-08 Thread Shengkai Fang
Take a look at these earlier issues and see whether they solve it. Best, Shengkai [1] http://apache-flink.147419.n8.nabble.com/Flink-td7866.html [2] https://issues.apache.org/jira/browse/FLINK-20780 Fei Han wrote on Tue, Jun 8, 2021 at 8:03 PM: > > @all: > Flink environment: Flink 1.13.1 > HADOOP environment: CDH 5.15.2 > The test command is: ./bin/sql-client.sh embedded -i

Re: Unsubscribe

2021-06-08 Thread Zhiwen Sun
To unsubscribe, send an email to user-zh-unsubscr...@flink.apache.org Zhiwen Sun On Tue, Jun 8, 2021 at 10:21 PM happiless wrote: > Hi, please unsubscribe me > > Sent from my iPhone

Reply: FlinkSQL OVER PARTITION BY window gives different results under different parallelism

2021-06-08 Thread 18814118038
Could anyone give some pointers? Thanks. Original message | From | Num<18814118...@163.com> | | Date | 2021-06-08 21:08 | | To | user-zh | | Subject | FlinkSQL OVER PARTITION BY window gives different results under different parallelism | Hi all, I have a batch of data in Kafka. When computing a 30-minute count for each element, I found the result changes when I set a different parallelism. What's a good way to troubleshoot this? SELECT user, price, count(id) OVER (

Records Are Never Emitted in a Tumbling Event Window When Each Key Only Has One Record

2021-06-08 Thread Joseph Lorenzini
Hi all, I have observed behavior when joining two keyed streams together where events are never emitted. The source of each stream is a different Kafka topic. I am curious to know if this is expected and if there's a way to work around it. I am using a tumbling event window. All
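The message is truncated, but a classic reason an event-time window never fires is that the watermark never passes the window end: with only one record per key and nothing newer arriving, the watermark cannot advance. A hedged PyFlink sketch of an explicit bounded-out-of-orderness watermark (the record layout, class name, and 5-second bound are assumptions, not from the thread):

```python
from pyflink.common import Duration
from pyflink.common.watermark_strategy import TimestampAssigner, WatermarkStrategy


class EventTimeAssigner(TimestampAssigner):
    # Assumption: the second field of each record is an epoch-millis event timestamp.
    def extract_timestamp(self, value, record_timestamp):
        return value[1]


# A tumbling event-time window [start, end) only fires once the watermark passes
# `end`, and the watermark only advances as newer records arrive. With a single
# record per key and nothing after it, the window may never be triggered.
watermark_strategy = (WatermarkStrategy
                      .for_bounded_out_of_orderness(Duration.of_seconds(5))
                      .with_timestamp_assigner(EventTimeAssigner()))

# applied to a stream, e.g.:
# stream = stream.assign_timestamps_and_watermarks(watermark_strategy)
```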

Persisting state in RocksDB

2021-06-08 Thread Paul K Moore
Hi all, First post here, so please be kind :) Firstly some context; I have the following high-level job topology: (1) FlinkPulsarSource -> (2) RichAsyncFunction -> (3) SinkFunction 1. The FlinkPulsarSource reads event notifications about article updates from a Pulsar topic 2. The
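The message is cut off, but on the topic of persisting state, here is a minimal PyFlink sketch of enabling RocksDB with durable checkpoints (the URI and interval are placeholders; the topology above is Java, so this is purely illustrative). One documented caveat relevant to this topology: RichAsyncFunction does not support keyed state access.

```python
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.state_backend import RocksDBStateBackend

env = StreamExecutionEnvironment.get_execution_environment()

# Keep state in RocksDB and snapshot it to a durable location (placeholder URI);
# the second argument enables incremental checkpoints.
env.set_state_backend(RocksDBStateBackend("file:///tmp/flink-checkpoints", True))
env.enable_checkpointing(60000)  # take a checkpoint every 60s
```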

Re: Questions about implementing a flink source

2021-06-08 Thread Evan Palmer
Hello again, Thank you for all of your help so far, I have a few more questions if you have the time :) 1. Deserialization Schema There's been some debate within my team about whether we should offer a DeserializationSchema and SerializationSchema in our source and sink. If we include don't

2 weeks left to submit your talks for Flink Forward Global 2021!

2021-06-08 Thread Caito Scherr
Hi there, The Call for Presentations [1] for Flink Forward Global 2021 closes in just 2 weeks on Monday, June 21! Are you working on an inspiring Flink story, real-world application, or use case? Now is a good time to finalize and submit your talk ideas to get the chance to present them to the

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-08 Thread Kezhu Wang
Could it be same as FLINK-21028[1] (titled as “Streaming application didn’t stop properly”, fixed in 1.11.4, 1.12.2, 1.13.0) ? [1]: https://issues.apache.org/jira/browse/FLINK-21028 Best, Kezhu Wang On June 8, 2021 at 22:54:10, Yun Gao (yungao...@aliyun.com) wrote: Hi Thomas, I tried but do

Re: Using s3 bucket for high availability

2021-06-08 Thread Kurtis Walker
Sorry, fat-fingered send before I finished writing… Hello, I'm trying to set up my Flink native Kubernetes cluster with high availability. Here's the relevant config:
kubernetes.service-account: flink-service-account
high-availability:
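For context, a hedged flink-conf.yaml sketch of S3-backed high availability on native Kubernetes (bucket, paths, and credentials are placeholders; the factory class is the one documented for Flink 1.12/1.13):

```
# flink-conf.yaml sketch -- bucket, paths and credentials are placeholders:
kubernetes.service-account: flink-service-account
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://my-bucket/flink-ha
# S3 access additionally needs one of the S3 filesystem plugins (e.g.
# flink-s3-fs-presto) in the image's plugins/ directory, plus credentials:
# s3.access-key: ...
# s3.secret-key: ...
```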

Re: Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-08 Thread Yun Gao
Hi Chirag, As far as I know, if you are running a single job, having all the pods share the same state.checkpoints.dir configuration should be as expected, and it is not necessary to configure the RocksDB local dir since Flink will choose a default dir. Regarding the latest exception, I

Using s3 bucket for high availability

2021-06-08 Thread Kurtis Walker
Hello, I’m trying to set up my flink native Kubernetes cluster with High availability. Here’s the relevant config:

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-08 Thread Yun Gao
Hi Thomas, I tried but have not reproduced the exception yet. I have filed an issue for the exception first [1]. [1] https://issues.apache.org/jira/browse/FLINK-22928 --Original Mail -- Sender: Thomas Wang Send Date: Tue Jun 8 07:45:52 2021 Recipients: Yun Gao

Re: State migration for sql job

2021-06-08 Thread aitozi
Thanks for JING's and Kurt's replies. I think we prefer option (a), which does not take the history data into account. IMO, if we want to process all the historical data, we have to store the original data, which may be a big overhead for the backend. But if we just aggregate after the new

Re: Query related to Minimum scrape interval for Prometheus and fetching metrics of all vertices in a job through Flink Rest API

2021-06-08 Thread Chesnay Schepler
There is no recommended scrape interval because it is largely dependent on your requirements. For example, if you're fine with reacting to problems within an hour, then a 5s scrape interval doesn't make sense. The lower the interval the more resources must of course be spent on serving the
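As an illustration of the trade-off described above, a minimal prometheus.yml sketch (hosts and the 15s interval are placeholder choices, not recommendations):

```
# prometheus.yml sketch -- hosts and interval are placeholders. Flink's
# PrometheusReporter listens on port 9249 by default (metrics.reporter.prom.port).
scrape_configs:
  - job_name: 'flink'
    scrape_interval: 15s   # lower = finer resolution, but more load on scraping/storage
    static_configs:
      - targets: ['jobmanager:9249', 'taskmanager:9249']
```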

Unsubscribe

2021-06-08 Thread happiless
Hi, please unsubscribe me. Sent from my iPhone

Re: Allow setting job name when using StatementSet

2021-06-08 Thread Yuval Itzchakov
Yup, that worked. Thank you guys for pointing it out! On Tue, Jun 8, 2021, 09:33 JING ZHANG wrote: > I agree with Nico; I just added the link for pipeline.name > > here. > > Nicolaus Weidner wrote on Mon, Jun 7, 2021 >

FlinkSQL OVER PARTITION BY window gives different results under different parallelism

2021-06-08 Thread Num
Hi all, I have a batch of data in Kafka. When computing a 30-minute count for each element, I found the result changes when I set a different parallelism. What's a good way to troubleshoot this? SELECT user, price, count(id) OVER ( PARTITION BY user ORDER BY actionTime RANGE BETWEEN INTERVAL '30' MINUTE PRECEDING AND CURRENT ROW) AS c FROM kafkaTable;
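Results that vary with parallelism usually point at event time rather than at the aggregation itself: with more source subtasks, per-partition watermarks advance differently, and rows behind the watermark are dropped. A hedged sketch that makes the watermark explicit (field names follow the query; the connector options and the 5-second bound are assumptions):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(
    EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build())

# With an explicit watermark, rows whose actionTime is already behind the
# watermark are dropped by the OVER aggregation; with more source subtasks the
# per-partition watermarks advance differently, which can change the result.
t_env.execute_sql("""
    CREATE TABLE kafkaTable (
        `user` STRING,
        price DOUBLE,
        id STRING,
        actionTime TIMESTAMP(3),
        WATERMARK FOR actionTime AS actionTime - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka'
        -- topic, bootstrap servers and format options omitted
    )
""")
```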

Re: Query related to Minimum scrape interval for Prometheus and fetching metrics of all vertices in a job through Flink Rest API

2021-06-08 Thread Ashutosh Uttam
Thanks Matthias. We are using Prometheus for fetching metrics. Is there any recommended scrape interval? Also, is there any impact if lower scrape intervals are used? Regards, Ashutosh On Fri, May 28, 2021 at 7:17 PM Matthias Pohl wrote: > Hi Ashutosh, > you can set the metrics update

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-08 Thread Till Rohrmann
Great :-) On Tue, Jun 8, 2021 at 1:11 PM Yingjie Cao wrote: > Hi Till, > > Thanks for the suggestion. The blog post is already on the way. > > Best, > Yingjie > > Till Rohrmann wrote on Tue, Jun 8, 2021 at 5:30 PM: > >> Thanks for the update Yingjie. Would it make sense to write a short blog >> post about

Jupyter PyFlink Web UI

2021-06-08 Thread maverick
Hi, I've got a question. I'm running PyFlink code from Jupyter Notebook, starting the TableEnvironment with the following code: env_settings = EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build() table_env = TableEnvironment.create(env_settings) How can I enable the Web UI in

Re: flink sql cdc data synchronization to mysql

2021-06-08 Thread Leonard Xu
Let me try to answer these two questions. > Flink 1.12's jdbc connector doesn't support the sink.parallelism option; does that mean the job can only keep the same parallelism of 1 as the upstream cdc > connector? If so, the downstream sink becomes a bottleneck: the table holds a large amount of historical data, so syncing the existing data to the target table takes quite some time, while new business data keeps streaming in, and the job can't catch up. How can this be solved? Yes. The crux is that the cdc connector can only read with a single parallelism in order to guarantee data consistency, so the job can only run with a single parallelism as well. This needs the cdc connector to support parallel reads; the downstream sink then follows naturally. > Flink 1.13's jdbc connector adds
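Leonard's point about keys and parallel sinks can be sketched as follows (the table name, connection options, and parallelism value are placeholders; in Flink 1.13, when 'sink.parallelism' differs from the input parallelism and a primary key is declared, records are hash-distributed by key before the sink):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(
    EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build())

# With a declared PRIMARY KEY and 'sink.parallelism' different from the input
# parallelism, Flink 1.13 hash-distributes the changelog by key before the sink,
# so updates for the same key always land on the same sink subtask.
t_env.execute_sql("""
    CREATE TABLE mysql_sink (
        id BIGINT,
        name STRING,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://localhost:3306/mydb',
        'table-name' = 'target_table',
        'sink.parallelism' = '4'
    )
""")
```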

Re: How to unsubscribe?

2021-06-08 Thread Leonard Xu
Hi, Morgan. Just send an email with any content to user-unsubscr...@flink.apache.org and you will be unsubscribed from the Flink user mailing list. Likewise, to leave the dev list, send an email with any content to dev-unsubscr...@flink.apache.org

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-08 Thread Yingjie Cao
Hi Till, Thanks for the suggestion. The blog post is already on the way. Best, Yingjie Till Rohrmann wrote on Tue, Jun 8, 2021 at 5:30 PM: > Thanks for the update Yingjie. Would it make sense to write a short blog > post about this feature including some performance improvement numbers? I > think this could

????????????????????socket??????????????

2021-06-08 Thread zdj
RocketMQ with Flink 1.11.2, using a RocketMQ source: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator at

Error submitting a job with Flink on yarn-cluster mode

2021-06-08 Thread maker_d...@foxmail.com
I submitted a job to a CDH cluster in Flink on yarn-cluster mode and got an error: deployment failed because a jar could not be found. The jar is one I don't use myself, but it does exist in Flink's lib directory, and I have already added the lib directory to the environment variables: export HADOOP_CLASSPATH=/opt/cloudera/parcels/FLINK/lib/flink/lib The program finished with the following exception: org.apache.flink.client.program.ProgramInvocationException: The main

How to unsubscribe?

2021-06-08 Thread Geldenhuys , Morgan Karl
How can I unsubscribe from this mailing list? The volume is just getting too much at the moment. Following the steps described on the website (https://flink.apache.org/community.html) did not appear to do anything. Sorry for the spam and thanks in advance.

Re: FlinkSQL cannot update pk column UID to expr

2021-06-08 Thread WeiXubin
I moved the table to a plain MySQL instance and it runs fine. After investigating, the cause should be the DISTRIBUTE BY HASH(`uid`) clause in the ADB CREATE TABLE statement, a syntax used to handle data skew. It looks like the join requires a primary key to be defined, but once a primary key is defined the result is converted to an upsert stream, and the DISTRIBUTE BY defined in ADB conflicts with upsert. Not sure whether this understanding is correct.

Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-08 Thread Chirag Dewan
Hi, Although this looks like a problem to me, I still can't conclude it. I tried reducing my TM replicas from 2 to 1, with 4 slots and 4 cores each. I was hoping that with a single TM there would be no file write conflicts. But that doesn't seem to be the case, as I still get the: Caused by:

Re: State migration for sql job

2021-06-08 Thread Kurt Young
What kind of expectation do you have after you add the "max(a)" aggregation: a. Keep summing a and start to calculate max(a) after you added. In other words, max(a) won't take the history data into account. b. First process all the historical data to get a result of max(a), and then start to

Re: [DISCUSS] FLIP-148: Introduce Sort-Merge Based Blocking Shuffle to Flink

2021-06-08 Thread Till Rohrmann
Thanks for the update Yingjie. Would it make sense to write a short blog post about this feature including some performance improvement numbers? I think this could be interesting to our users. Cheers, Till On Mon, Jun 7, 2021 at 4:49 AM Jingsong Li wrote: > Thanks Yingjie for the great effort!

Re: FlinkSQL cannot update pk column UID to expr

2021-06-08 Thread WeiXubin
The detailed exception output is as follows: java.sql.BatchUpdateException: [3, 2021060816420017201616500303151172306] cannot update pk column UID to expr at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at

FlinkSQL cannot update pk column UID to expr

2021-06-08 Thread WeiXubin
Base scenario: data is read from a KafkaSource and written to sinktable, with a LEFT JOIN against the DimTable dimension table in between. Flink version 1.12.2. Scenario 1: with sinktable set to 'connector' = 'print' and no primary key defined, the join output is produced normally. Scenario 2: with sinktable set to 'connector' = 'mysql', a primary key is required. Scenario 3: after adding PRIMARY KEY (uid, platform, root_game_id) NOT ENFORCED to sinktable, it fails; the main error is:

Re: Re: [Problem analysis] Flink job under endless backpressure

2021-06-08 Thread yidan zhao
I'm currently hitting this error; can anyone make sense of it? org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: readAddress(..) failed: Connection timed out (connection to '10.35.213.143/10.35.213.143:2008') at

Re: ByteSerializationSchema in PyFlink

2021-06-08 Thread Wouter Zorgdrager
Hi Dian, all, The way I resolved it for now is to write my own custom serializer which only maps from bytes to bytes. See the code below: public class KafkaBytesSerializer implements SerializationSchema<byte[]>, DeserializationSchema<byte[]> { @Override public byte[] deserialize(byte[] bytes) throws

Re: Re: Add control mode for flink

2021-06-08 Thread 刘建刚
Thanks for the reply. It is a good question. There are multiple choices, as follows: 1. We can persist control signals in HighAvailabilityServices and replay them after failover. 2. Only tell the users that the control signals take effect after they are checkpointed. Steven Wu [via

Re: Re: [Problem analysis] Flink job under endless backpressure

2021-06-08 Thread 13631283359
1. Not sure whether you have monitoring of the Flink data pipeline here. 2. Flink backpressure can have several causes: data skew; TaskManager memory configured too small, causing full GC; slow checkpoints; state too large. On 2021-06-08 10:36:49, "LakeShen" wrote: > You can start from your job logic together with the backpressure monitoring in the Flink Web UI to see where the backpressure actually comes from. > When the source is backpressured, it is usually a downstream operator backpressuring all the way up to the source. Check which operator causes it. > > Best, > LakeShen > > yidan zhao wrote on Tue, Jun 8, 2021

Flink WebUI: redirect the display paths of the taskmanager log and stdout

2021-06-08 Thread jqwang
By default the Flink WebUI shows taskmanager.log and taskmanager.out under the log and stdout tabs, but I would like the WebUI to display the contents of files under other paths. For example, I have a test.log file under /home/user/ and want the WebUI's taskmanager.log tab to show the content of test.log. Is there any way to achieve this? Alternatively, is there a way to have taskmanager.log or .out generated under a fixed path?

flinksql TTL not taking effect

2021-06-08 Thread chenchencc
Version: 1.12.2 SQL: SELECT id, name, message, ts FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY ts DESC) AS rowNum FROM persons_message_table_kafka ) WHERE rowNum = 1 TTL setting: tableEnv.getConfig().setIdleStateRetention(Duration.ofHours(3));

Re: Corrupted unaligned checkpoints in Flink 1.11.1

2021-06-08 Thread Piotr Nowojski
Re-adding user mailing list Hey Alex, In that case I can see two scenarios that could lead to missing files. Keep in mind that incremental checkpoints are referencing previous checkpoints in order to minimise the size of the checkpoint (roughly speaking only changes since the previous checkpoint

Re: Allow setting job name when using StatementSet

2021-06-08 Thread JING ZHANG
I agree with Nico; I just added the link for pipeline.name here. Nicolaus Weidner wrote on Mon, Jun 7, 2021 at 11:46 PM: > Hi Yuval, > > I am not familiar with the Table API, but in the fragment you posted, the > generated

Re: State migration for sql job

2021-06-08 Thread JING ZHANG
Hi aitozi, This is a popular request that many users have mentioned; it has appeared on the user mailing list several times. Unfortunately, it is not supported by Flink SQL yet; it may be solved in the future. BTW, a few companies try to solve the problem for some specific use cases on their internal

Re: Re: Add control mode for flink

2021-06-08 Thread Steven Wu
I can see the benefits of a control flow. E.g., it might help the old (and inactive) FLIP-17 side input. I would suggest that we add more details on some of the potential use cases. Here is one mismatch with using the control flow for dynamic config: dynamic config is typically targeted/loaded by one

flink sql cdc data synchronization to mysql

2021-06-08 Thread casel.chen
Flink 1.12's jdbc connector doesn't support the sink.parallelism option; does that mean the job can only keep the same parallelism of 1 as the upstream cdc connector? If so, the downstream sink becomes a bottleneck: the table holds a large amount of historical data, so syncing the existing data to the target table takes quite some time, while new business data keeps streaming in, and the job can't catch up. How can this be solved? Flink 1.13's jdbc connector adds a sink.parallelism option; the question is, if the parallelism is increased, can records with the same primary key be guaranteed to go to the same sink task? What is the correct way to write the SQL?

Re: Flink SQL: checkpoint size unchanged after state expires

2021-06-08 Thread chenchencc
Hi, I've run into this problem too, with Flink 1.12.2 SQL. I'd like to ask: 1. Is there any way to physically delete the TTL-expired data locally? 2. Is there any way to delete TTL-expired data at checkpoint time, so that the checkpoint size stops growing?

Re: Re: Add control mode for flink

2021-06-08 Thread Xintong Song
+1 on separating the effort into two steps: 1. Introduce a common control flow framework, with flexible interfaces for generating / reacting to control messages for various purposes. 2. Features that leverage the control flow can be worked on concurrently. Meantime, keeping