Re: DataStream API: Parquet File Format with Scala Case Classes

2022-02-24 Thread Fabian Paul
Hi Ryan, I guess the ticket you are looking for is the following [1]. AFAIK the work on it hasn't started yet, so initial designs or ideas are still appreciated. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-25416 On Tue, Feb 22, 2022 at 11:54 PM Ryan van Huuksloot <

Re: java.lang.Exception: Job leader for job id 0efd8681eda64b072b72baef58722bc0 lost leadership.

2022-02-24 Thread Nicolaus Weidner
Hi Jai, > Do writes to ValueStates/MapStates have a direct impact on the Flink State, > or is the data buffered in between? Writes to keyed state go directly to RocksDB, so there shouldn't be any memory issues with buffers overflowing or similar. In general, more memory should increase

Re: Flink metrics via Prometheus or OpenTelemetry

2022-02-24 Thread Nicolaus Weidner
Hi Sigalit, first of all, have you read the docs page on metrics [1], and in particular the Prometheus section on metrics reporters [2]? Apart from that, there is also a (somewhat older) blog post about integrating Flink with Prometheus, including a link to a repo with example code [3]. Hope
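To make the reporter setup concrete, here is a minimal flink-conf.yaml sketch for the built-in PrometheusReporter; the reporter name `prom` and the port range are example choices, not required values:

```yaml
# Sketch, assuming Flink 1.14-era configuration keys: expose JobManager/TaskManager
# metrics on an HTTP endpoint that Prometheus can scrape.
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
# Each process binds the first free port in this range (example range).
metrics.reporter.prom.port: 9250-9260
```

Prometheus then needs a matching scrape_config pointing at those ports on each Flink host.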

Re: Flink job recovery after task manager failure

2022-02-24 Thread Afek, Ifat (Nokia - IL/Kfar Sava)
Thanks Zhilong. The first launch of our job is fast, I don’t think that’s the issue. I see in flink job manager log that there were several exceptions during the restart, and the task manager was restarted a few times until it was stabilized. You can find the log here:

Re: Flink Statefun and Feature computation

2022-02-24 Thread Igal Shilman
Hello, For (1), I welcome you to visit our documentation and the many talks online to understand more about the motivation and the value of StateFun. In a nutshell, StateFun provides a few building blocks that make building distributed stateful applications easier. For (2), check out our

Re: Flink job recovery after task manager failure

2022-02-24 Thread Zhilong Hong
Hi Afek, I've read the log you provided. Since you've set restart-strategy to exponential-delay and restart-strategy.exponential-delay.initial-backoff to 10s, every time a failover is triggered the JobManager has to wait 10 seconds before it restarts the
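As an illustration of that pacing, a small Python sketch of exponential-delay backoff; the multiplier and cap below are assumed values (Flink's defaults are backoff-multiplier 2.0 and a 5-minute max-backoff), not taken from the configuration in this thread:

```python
def exponential_delays(initial_s, multiplier, max_s, restarts):
    """Delay (seconds) before each of `restarts` consecutive restarts."""
    delays, d = [], initial_s
    for _ in range(restarts):
        delays.append(min(d, max_s))  # the delay is capped at max-backoff
        d *= multiplier
    return delays

# With initial-backoff = 10s (as in the log above), multiplier 2, cap 300s:
print(exponential_delays(10, 2, 300, 6))  # [10, 20, 40, 80, 160, 300]
```

So repeated failovers stretch out quickly, which is why a few unstable TaskManager restarts can take minutes to settle.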

Possible BUG in 1.15 SQL JSON_OBJECT()

2022-02-24 Thread Jonathan Weaver
Using the latest SNAPSHOT BUILD. If I have a column definition as .column( "events", DataTypes.ARRAY( DataTypes.ROW( DataTypes.FIELD("status", DataTypes.STRING().notNull()),

Re: [Flink-1.14.3] Restart of pod due to duplicatejob submission

2022-02-24 Thread Yang Wang
This might be related to FLINK-21928, which seems to be fixed in 1.14.0. But it has some limitations, and users need to manually clean up the HA entries. Best, Yang Parag Somani wrote on Thu, Feb 24, 2022, 13:42: > Hello, > > Recently, due to the log4j vulnerabilities, we have upgraded to Apache Flink >

pyflink object to java object

2022-02-24 Thread Francis Conroy
Hi all, we're using pyflink for most of our Flink work and sometimes call into a Java process function. Our new Java process function takes an argument in the constructor which is a Row containing default values. I've declared my Row in pyflink like this: default_row = Row(ep_uuid="",
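Since pyflink's Row accepts keyword fields, the default-plus-override pattern can be sketched in plain Python without Flink; `make_row` and the field names below are illustrative stand-ins, not pyflink APIs:

```python
def make_row(defaults, **overrides):
    """Illustrative stand-in: build a dict-backed row, filling unset fields from defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # reject fields the schema doesn't know about
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return {**defaults, **overrides}

# Mirrors the snippet above: one default row, then per-record overrides.
default_row = {"ep_uuid": "", "value": 0.0}
row = make_row(default_row, ep_uuid="abc-123")
print(row)  # {'ep_uuid': 'abc-123', 'value': 0.0}
```

The real interop question (handing such a Row across the Python/Java boundary) still goes through pyflink's type serialization, which this sketch deliberately leaves out.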

streaming mode with both finite and infinite input sources

2022-02-24 Thread Jin Yi
We have a streaming job where the main work is processing infinite Kafka sources. Recently, I added a fromCollection (finite) source to simply write some state once on startup. This all seems to work fine: the finite source operators all finish, while all the infinite source

reserved keyword percentile_cont

2022-02-24 Thread ZHANG YU
Hi, I'm new to Flink. Will the reserved keywords percentile_cont and percentile_disc be implemented in a foreseeable release? Until then, how does everyone handle this kind of calculation?
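Until PERCENTILE_CONT is available in Flink SQL, a common workaround is a user-defined aggregate that interpolates over the collected values; the interpolation itself is only a few lines. A Python sketch of the standard PERCENTILE_CONT semantics (not Flink code):

```python
import math

def percentile_cont(values, p):
    """Linear-interpolation percentile, matching PERCENTILE_CONT semantics (0 <= p <= 1)."""
    s = sorted(values)
    k = (len(s) - 1) * p          # fractional rank
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:                  # rank falls exactly on an element
        return float(s[int(k)])
    # interpolate between the two neighboring elements
    return s[lo] * (hi - k) + s[hi] * (k - lo)

print(percentile_cont([1, 2, 3, 4], 0.5))  # 2.5
```

PERCENTILE_DISC differs only in that it returns an actual element (no interpolation), i.e. the smallest value whose cumulative distribution reaches p.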

Re: Writing Flink data to HDFS

2022-02-24 Thread wenjie li
1. A simple approach is to expose the HDFS files as a Hive table and then run SQL like the following: set hive.merge.mapredfiles = true // merge small files when the Map-Reduce job finishes; set hive.merge.size.per.task = 256*1000*1000 // size of the merged files; set hive.merge.smallfiles.avgsize=1600; insert overwrite table_name select * from table_name1 2. Alternatively, use Spark's coalesce() and repartition() methods directly

flink1.14: error registering mysql connector

2022-02-24 Thread xiaoyue
Error registering a MySQL sink connector on Flink 1.14; I've checked several times and found no syntax error. Help appreciated! Code: env = StreamExecutionEnvironment.getExecutionEnvironment(); EnvironmentSettings Settings = EnvironmentSettings.newInstance().inBatchMode().build(); tEnv = StreamTableEnvironment.create(env, Settings);

Re: Re: flink1.14: error registering mysql connector

2022-02-24 Thread xiaoyue
Great, the data was written successfully. Thank you very much! xiao...@ysstech.com From: Tony Wei Sent: 2022-02-25 14:57 To: user-zh Subject: Re: Re: flink1.14: error registering mysql connector Hi xiaoyue, it looks like this line is the cause: tEnv.getConfig().setSqlDialect(SqlDialect.HIVE); You may need to switch the SqlDialect back to SqlDialect.DEFAULT before performing the sink operation. best regards, xiaoyue wrote on Friday, February 25, 2022

Re: Re: flink1.14: error registering mysql connector

2022-02-24 Thread xiaoyue
Hi Tony, here is the complete code: it reads data from Hive, applies flatmap and aggregate operations, and then sinks to MySQL. For brevity, the UDF definitions in the middle are not fully pasted; you can use it as a reference. Thanks a lot for your help. Code: # execution environment env = StreamExecutionEnvironment.getExecutionEnvironment(); EnvironmentSettings Settings =

Re: flink1.14: error registering mysql connector

2022-02-24 Thread Tony Wei
Hi xiaoyue, does your test code include any additional code or configuration? Could you share the complete code and configuration files? I tried running the following code in my IDE and it runs successfully. public static void main(String[] args) { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); EnvironmentSettings Settings =

Re: Re: Files got larger after merging data with Hive overwrite?

2022-02-24 Thread RS
Thanks, I confirmed it was a compression-format issue: the original Hive files were SNAPPY-compressed, but after merging the small files with Flink SQL the output is uncompressed by default, so the files got larger. By default, Flink does not inherit the original files' compression codec... On 2022-02-22 12:08:39, "junjie.m...@goupwith.com" wrote: Check whether the data format and compression format differ from the original. Original message From: RS Date: Tue, Feb 22, 2022, 09:35 To: user-zh@flink.apache.org Subject: Files got larger after merging data with Hive overwrite?
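One way to keep the codec through a rewrite is to pin it on the table itself rather than rely on session defaults; a hedged sketch (the table name is hypothetical, and the property key assumes Parquet storage, with the ORC equivalent shown commented out):

```sql
-- Hypothetical table; assumes a Parquet-backed Hive table.
ALTER TABLE my_table SET TBLPROPERTIES ('parquet.compression' = 'SNAPPY');
-- For an ORC-backed table the property key differs:
-- ALTER TABLE my_table SET TBLPROPERTIES ('orc.compress' = 'SNAPPY');
```

Whether the writing engine honors table-level compression properties still depends on the format and engine version, so verify on a test table first.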

Re: Re: flink1.14: error registering mysql connector

2022-02-24 Thread Tony Wei
Hi xiaoyue, it looks like this line is the cause: tEnv.getConfig().setSqlDialect(SqlDialect.HIVE); You may need to switch the SqlDialect back to SqlDialect.DEFAULT before performing the sink operation. best regards, xiaoyue wrote on Friday, February 25, 2022, 2:36 PM: > Hi Tony, here is the complete code: it reads data from Hive, applies flatmap and aggregate operations, and then sinks to MySQL. For brevity, the UDF > definitions in the middle are not fully pasted; you can use it as a reference. Thanks a lot for your help. > >