Re: Issue with Flink SQL using RocksDB backend

2021-07-26 Thread Yuval Itzchakov
Hi Jing, Yes, FIRST is a UDAF. I've been trying to reproduce this locally without success so far. The query itself has more fields and aggregates. Once I can reproduce this locally I'll try to narrow down the problematic field and share more information. On Tue, Jul 27, 2021, 05:17 JING ZHANG

Re: as-variable configuration for state ac

2021-07-26 Thread Yun Tang
Hi Mason, In RocksDB, one state corresponds to a column family, and we could aggregate all RocksDB native metrics per column family. If my understanding is right, are you hoping that all state latency metrics for a particular state could be aggregated at the state level? Best Yun Tang

Re: Re: filesystem table parquet rolling issue

2021-07-26 Thread Jingsong Li
Yes, it's similar. On Tue, Jul 27, 2021 at 10:42 AM lixin58...@163.com wrote: > Hi, > > Thanks for the reply. I read that doc; it says that for columnar formats like parquet only the OnCheckpointRollPolicy is used, i.e. files only roll when a checkpoint is taken. Does parquet columnar file writing in the flink > filesystem table work the same way? > >

Re: Re: filesystem table parquet rolling issue

2021-07-26 Thread lixin58...@163.com
Hi, thanks for the reply. I read that doc; it says that for columnar formats like parquet only the OnCheckpointRollPolicy is used, i.e. files only roll when a checkpoint is taken. Does parquet columnar file writing in the flink filesystem table work the same way? https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/connectors/streamfile_sink.html lixin58...@163.com From: Jingsong Li Sent: 2021-07-27 10:30

Re: filesystem table parquet rolling issue

2021-07-26 Thread Jingsong Li
Because parquet buffers data in memory, the file-size rolling threshold cannot be very precise; it only takes effect after a flush. On Sun, Jul 25, 2021 at 9:47 AM lixin58...@163.com wrote: > Hi all, > > The checkpoint interval is 120s, the rollover interval 800s, the rollover size 1kb, and the parallelism is 2. > > While running, I noticed that no matter how fast data is written, there is only one in-progress file at a time, and the final files are generated strictly every 120s. This is strange; it seems only checkpoint-based rolling takes effect, unlike with the json format. Is that really the case? The official docs don't say so. > > Hoping someone can clarify!
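For reference, the filesystem connector's rolling behaviour is configured in the sink table DDL; a minimal sketch (table and column names are assumed, not from the thread):

```sql
CREATE TABLE fs_sink (
  user_id STRING,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///tmp/output',
  'format' = 'parquet',
  -- For bulk formats such as parquet, files additionally roll on every
  -- checkpoint; these options only act as extra upper bounds.
  'sink.rolling-policy.file-size' = '128MB',
  'sink.rolling-policy.rollover-interval' = '30 min'
);
```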

Re: Custom Flink Kafka metrics fail to parse in InfluxDB

2021-07-26 Thread Jimmy Zhang
Hi Caizhi, thanks a lot for your reply! At the beginning of the createDynamicTableSink(Context context) method in KafkaDynamicTableFactory.java, I obtain sinkTableName via context.getObjectIdentifier().getObjectName(). The ObjectIdentifier class uniquely identifies a table: it contains catalogName, databaseName, and objectName, i.e. the catalog, database, and table names. I then pass it into FlinkKafkaProducer through the constructor, after which it can be used.

Re: TaskManager crash after cancelling a job

2021-07-26 Thread Yangze Guo
Hi, Ivan My gut feeling is that it is related to FLINK-22535. Could @Yun Gao take another look? If that is the case, you can upgrade to 1.13.1. Best, Yangze Guo On Tue, Jul 27, 2021 at 9:41 AM Ivan Yang wrote: > > Dear Flink experts, > > We recently ran into an issue during a job cancellation

Re: Issue with Flink SQL using RocksDB backend

2021-07-26 Thread JING ZHANG
Hi Yuval, I ran a similar SQL query (without the `FIRST` aggregate function), and nothing went wrong. Is `FIRST` a custom aggregate function? Could you please check whether there is a problem in `FIRST`, and whether the query runs without `FIRST`? Best, JING ZHANG Yuval Itzchakov 于2021年7月27日周二 上午12:29写道: >

TaskManager crash after cancelling a job

2021-07-26 Thread Ivan Yang
Dear Flink experts, We recently ran into an issue during a job cancellation after upgraded to 1.13. After we issue a cancel (from Flink console or flink cancel {jobid}), a few subtasks stuck in cancelling state. Once it gets to that situation, the behavior is consistent. Those “cancelling

as-variable configuration for state ac

2021-07-26 Thread Mason Chen
We have been using the state backend latency tracking metrics from Flink 1.13. To make metrics aggregation easier, could there be a config to expose something like the `state.backend.rocksdb.metrics.column-family-as-variable` option that RocksDB provides, to do aggregation across column families? In this
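For context, the existing RocksDB native-metrics option Mason refers to is set in flink-conf.yaml; a sketch (the second option is just an example of a native metric to expose):

```yaml
# Report the column family (i.e. the state name) as a metric variable
# instead of embedding it in the metric name, which makes per-state
# aggregation in external systems easier.
state.backend.rocksdb.metrics.column-family-as-variable: true
state.backend.rocksdb.metrics.estimate-num-keys: true
```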

Issue with Flink SQL using RocksDB backend

2021-07-26 Thread Yuval Itzchakov
Hi, *Setup:* 1 JM, 1 TM, Flink 1.13.1, RocksDBStateBackend. I have a query with the rough sketch of the following: SELECT CAST(TUMBLE_START(event_time, INTERVAL '2' MINUTE) AS TIMESTAMP) START_TIME, CAST(TUMBLE_END(event_time, INTERVAL '2' MINUTE) AS TIMESTAMP) END_TIME
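A complete minimal form of such a tumbling-window query (the source table name and the extra aggregate are assumed; the real query reportedly has more fields) would be:

```sql
SELECT
  CAST(TUMBLE_START(event_time, INTERVAL '2' MINUTE) AS TIMESTAMP) AS START_TIME,
  CAST(TUMBLE_END(event_time, INTERVAL '2' MINUTE) AS TIMESTAMP) AS END_TIME,
  COUNT(*) AS CNT
FROM events
GROUP BY TUMBLE(event_time, INTERVAL '2' MINUTE);
```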

Re: ClassNotFoundException: org.apache.kafka.clients.consumer.ConsumerRecord

2021-07-26 Thread Roman Khachatryan
Hi, It is recommended to package your application with all of its dependencies into a single file [1]. And according to the kafka-connector documentation [2]: if you are using the Kafka source, flink-connector-base is also required as a dependency: org.apache.flink flink-connector-base
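The dependency stanza, flattened by the archive above, would look like the following in a pom.xml (the version and Scala suffix are assumptions for a Flink 1.13 setup):

```xml
<!-- Kafka connector plus the base connector classes the Kafka source relies on -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.12</artifactId>
  <version>1.13.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-base</artifactId>
  <version>1.13.1</version>
</dependency>
```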

Re: Queryable State Lookup Failure

2021-07-26 Thread Roman Khachatryan
Hello, Could you check that the TMs didn't fail (which would unregister the KV states) and are still running at the time of the query? Probably, after changing the memory settings, there is another error that is only reported after the state has been unregistered. Regards, Roman On Sat, Jul 24, 2021 at 12:50

Re: flink 1.13.1 sql hive is_generic = false, table created successfully but no column info

2021-07-26 Thread Rui Li
1.13 no longer uses is_generic to mark whether a table is a hive table (it changed to 'connector'='hive'); the docs need to be updated. Still, we recommend using the hive dialect when manipulating hive metadata via DDL. On Mon, Jul 26, 2021 at 5:00 PM Asahi Lee <978466...@qq.com.invalid> wrote: > It works fine with flink 1.12.0 > > > > > ---- Original message ---- > From: >

Re: Re: Re: Question about Flink reading HDFS files

2021-07-26 Thread wanggaoliang
To be precise, I've already hit this problem: a pending or in-progress file was read in, and then a FileNotFoundException was thrown; it happens intermittently. In that case, besides recompiling FileInputFormat.java to filter out files in those two states, does Flink provide any other solution? At 2021-07-26 16:34:19, "Michael Ran" wrote: Try renaming or moving a file while reading it with java; you'll get an exception. At 2021-07-26 15:19:39, "wanggaoliang" <18838915...@163.com> wrote:
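The filtering that FileInputFormat#acceptFile performs, plus the extra in-progress/pending filtering the poster wants, boils down to a file-name predicate; a Python sketch (the suffix conventions are assumed from StreamingFileSink's default part-file naming, not stated in the thread):

```python
def accept_file(name: str) -> bool:
    """Mimic Flink's default filter (skip files starting with '_' or '.')
    and additionally skip in-progress/pending part files."""
    # Flink's default behaviour: hidden and underscore-prefixed files are skipped.
    if name.startswith(("_", ".")):
        return False
    # Extra filtering for sink part files that are not yet finalized.
    if name.endswith((".inprogress", ".pending")) or ".inprogress." in name:
        return False
    return True
```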

Re: flink 1.13.1 sql hive is_generic = false, table created successfully but no column info

2021-07-26 Thread Asahi Lee
It works fine with flink 1.12.0. ---- Original message ---- From: "user-zh"

Re: flink 1.13.1 sql hive is_generic = false, table created successfully but no column info

2021-07-26 Thread Rui Li
Hi, could you try creating the table with the hive dialect? On Mon, Jul 26, 2021 at 2:44 PM Asahi Lee <978466...@qq.com.invalid> wrote: > hi! > I'm using flink 1.13.1 and create a hive table via SQL; the program runs normally and the table is created successfully, but it has no column info. My program is as follows: > I'm connecting to hive 2.3.6, using the flink-sql-connector-hive-2.3.6 dependency. > > > package com.meritdata.cloud.flink.test; > > > import

Re: Re: Re: flink sql dependency isolation

2021-07-26 Thread Michael Ran
Well, when mixing multiple UDFs there's no way around it; the dependencies themselves conflict. We generally unify the common UDFs; if a private one conflicts with a common one, we discuss it with the owners separately. At 2021-07-26 11:43:49, "silence" wrote: >They are referenced individually, but when the job logic is complex it inevitably mixes multiple udfs at the same time > > >-- >From: Michael Ran >Sent: Friday, July 23, 2021 17:42 >To: user-zh ; silence >Subject: Re: Re: flink

Re: Re: Question about Flink reading HDFS files

2021-07-26 Thread Michael Ran
Try renaming or moving a file while reading it with java; you'll get an exception. At 2021-07-26 15:19:39, "wanggaoliang" <18838915...@163.com> wrote: At 2021-07-26 14:56:00, "wanggaoliang" <18838915...@163.com> wrote: When Flink reads HDFS files, the acceptFile() method in FileInputFormat.java filters out files starting with "_" and "." by default. But if an in-progress or .pending file is read in,

Question about submitting jobs via sql-client

2021-07-26 Thread Peihui He
Hi all, when submitting SQL jobs via zeppelin, I noticed that the flink jobmanager first prints the following log: 2021-07-26 15:54:36,929 [Thread-6575] INFO org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient

Re: FlinkKinesis consumer

2021-07-26 Thread Danny Cranmer
Hello, If you are consuming from a single stream, you can use the shard ID to achieve a better distribution. Since the shard IDs are assigned incrementally, like so: shardId-0000, shardId-0001, shardId-0002, etc., you can substring the prefix and
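Danny's suggestion, stripping the constant prefix and using the numeric part of the shard ID for subtask assignment, can be sketched as follows (the function name and modulo scheme are illustrative assumptions, not the FlinkKinesisConsumer implementation):

```python
def assign_shard(shard_id: str, parallelism: int) -> int:
    """Map a Kinesis shard ID such as 'shardId-000000000002' to a subtask
    index. Because shard IDs are assigned incrementally, using the numeric
    suffix modulo the parallelism spreads consecutive shards evenly."""
    index = int(shard_id[len("shardId-"):])
    return index % parallelism
```

With 3 subtasks, consecutive shards then land on subtasks 0, 1, 2, 0, 1, ... instead of potentially hashing many shards onto the same subtask.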

Question about Flink reading HDFS files

2021-07-26 Thread wanggaoliang
When Flink reads HDFS files, the acceptFile() method in FileInputFormat.java filters out files starting with "_" and "." by default. But if an in-progress or .pending file is read in, and during open() the in-progress or .pending file's state suddenly changes so that the original file path disappears, will that cause a problem? Is this situation even possible?

flink 1.13.1 sql hive is_generic = false, table created successfully but no column info

2021-07-26 Thread Asahi Lee
hi! I'm using flink 1.13.1 and create a hive table via SQL; the program runs normally and the table is created successfully, but it has no column info. I'm connecting to hive 2.3.6, using the flink-sql-connector-hive-2.3.6 dependency. package com.meritdata.cloud.flink.test; import org.apache.flink.table.api.EnvironmentSettings; import