Re: Flink SQL receives field values in the format 2020-09-23T20:58:24+08:00 — how to convert them to TIMESTAMP

2020-09-23 Thread Joker
Sorry to jump in with a question. ts AS TO_TIMESTAMP(FROM_UNIXTIME(create_time / 1000, 'yyyy-MM-dd HH:mm:ss')) — I generate the event-time column this way and find the watermark is always 8 hours ahead of Beijing time. For example, with create_time = 1600926591666, ts computes to 2020/9/24 13:49:51, which is correct, but the watermark extracted in the Web UI shows 2020/9/24 21:49:51. Joker | gaojintao...@163.com, on 2020-09-24
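The 8-hour drift reported above can be reproduced with plain timezone arithmetic. A minimal Python sketch (not Flink code; it only illustrates why a Beijing wall-clock string later re-read as if it were UTC shifts by the zone offset):

```python
from datetime import datetime, timezone, timedelta

ms = 1600926591666  # the create_time value from the report
beijing = timezone(timedelta(hours=8))

# The millisecond epoch rendered in Beijing time vs. UTC.
local = datetime.fromtimestamp(ms // 1000, tz=beijing)
utc = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
print(local)  # 2020-09-24 13:49:51+08:00  (the ts value the user computed)
print(utc)    # 2020-09-24 05:49:51+00:00

# If the 13:49:51 wall-clock string is then treated as if it were UTC,
# rendering it in UTC+8 again adds another 8 hours -> the 21:49:51 watermark.
shifted = local.replace(tzinfo=timezone.utc).astimezone(beijing)
print(shifted)  # 2020-09-24 21:49:51+08:00
```

This matches the report: the computed ts looks right, while the watermark display applies the offset a second time.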

Re: Flink SQL receives field values in the format 2020-09-23T20:58:24+08:00 — how to convert them to TIMESTAMP

2020-09-23 Thread Jark Wu
Flink's TO_TIMESTAMP function uses Java SimpleDateFormat to parse time formats, so take a look at the SimpleDateFormat javadoc. You can try to_timestamp('2020-09-23T20:58:24+08:00', 'yyyy-MM-dd''T''HH:mm:ssXXX') to parse your data. Best, Jark On Wed, 23 Sep 2020 at 21:08, chenxuying wrote: > flinksql version is 1.11.2 > the source receives a string-typed time field > CREATE
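The suggested Java pattern yyyy-MM-dd'T'HH:mm:ssXXX (where XXX matches an ISO-8601 offset like +08:00) can be sanity-checked with the equivalent strptime pattern — a sketch in Python 3.7+, where %z also accepts a colon-separated offset; this is not the Flink/Java code itself:

```python
from datetime import datetime, timezone

# Java SimpleDateFormat: yyyy-MM-dd'T'HH:mm:ssXXX
# Python equivalent:     %Y-%m-%dT%H:%M:%S%z   (%z accepts +08:00 on 3.7+)
s = "2020-09-23T20:58:24+08:00"
dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S%z")
print(dt.isoformat())                           # 2020-09-23T20:58:24+08:00
print(dt.astimezone(timezone.utc).isoformat())  # 2020-09-23T12:58:24+00:00
```

The key point carries over: the literal T must be quoted in the Java pattern (''T''), and the trailing offset needs its own pattern letter rather than being left to the default format.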

Re: How to disconnect taskmanager via rest api?

2020-09-23 Thread Yang Wang
I think this is an interesting feature, especially when deploying Flink standalone clusters on K8s. The TaskManager pods are started/stopped externally via kubectl or other tools. When we need to stop a TaskManager pod, even though the pod is deleted quickly, we have to wait for a timeout so that

Re: [DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-23 Thread nashcen
+1 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Xintong Song
How many slots do you have on each task manager? Flink uses ChildFirstClassLoader for loading user code, to avoid dependency conflicts between user code and Flink's framework. Ideally, after a slot is freed and reassigned to a new job, the user class loaders of the previous job should be

Re: Error on deploying Flink docker image with Kubernetes (minikube) and automatically launch a stream WordCount job.

2020-09-23 Thread Yang Wang
Hi Felipe, Currently, if you want to deploy a standalone job/application Flink cluster on K8s via yamls, you should use your own image with the user jar baked in at /opt/flink/usrlib. It cannot be specified via a config option. Usually, you could add a new layer on the official docker image to
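Adding that layer on top of the official image might look like the following sketch (the base image tag and jar name are placeholders, not from the thread):

```dockerfile
FROM flink:1.11.2-scala_2.11
# Standalone job/application mode looks for user jars in this directory.
RUN mkdir -p /opt/flink/usrlib
COPY target/my-job.jar /opt/flink/usrlib/my-job.jar
```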

Re: [DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-23 Thread Jark Wu
+1 to move it there. On Thu, 24 Sep 2020 at 12:16, Jingsong Li wrote: > Hi devs and users: > > After the 1.11 release, I heard some voices recently: How can't Hive's > documents be found in the "Table & SQL Connectors". > > Actually, Hive's documents are in the "Table API & SQL". Since the

Re: Adaptive load balancing

2020-09-23 Thread Zhijiang
Hi Krishnan, Thanks for discussing this interesting scenario! It makes me remind of a previous pending improvement of adaptive load balance for rebalance partitioner. Since the rebalance mode can emit the data to any nodes without precision consideration, then the data can be emitted based

[DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-23 Thread Jingsong Li
Hi devs and users: After the 1.11 release, I heard some feedback recently: why can't Hive's documents be found in the "Table & SQL Connectors"? Actually, Hive's documents are in the "Table API & SQL". Since the "Table & SQL Connectors" document was extracted separately, Hive is a little out of

Re: Flink 1.11 sql-client YAML configuration issue

2020-09-23 Thread Rui Li
Hi, this looks like a missing hive connector dependency — which jars have you added under lib? On Thu, Sep 24, 2020 at 11:00 AM nashcen <2415370...@qq.com> wrote: > I plan to use the command-line tool $FLINK_HOME/bin/sql-client.sh embedded > to start the Flink SQL client and connect to Hive. > > > In the Flink SQL configuration file sql-client-defaults.yaml, > I added the following parameters > catalogs: > - name: myhive >

Re: A field was added in Kafka — how to handle the Hive table

2020-09-23 Thread Rui Li
Hi, what exact problem do you hit when adding the column to the Hive table directly? Please paste the stacktrace. On Wed, Sep 23, 2020 at 6:50 PM china_tao wrote: > flink 1.11.1, Flink SQL; I already have Flink SQL > reading from Kafka and writing to Hive. The problem is that a field was added to the Kafka source — how do I update the Hive table in Flink > SQL? If I add the column in Hive directly, every startup reports that the Hive table already exists; with drop table if > exists, the historical data is lost. How do you all handle this? Thanks. > > > > -- > Sent

Flink on YARN: NodeManager JVM memory

2020-09-23 Thread superainbower
Hi all, I have a Flink job running on YARN with RocksDB as the state backend. Since this is a test, I repeatedly started and stopped the job over a period of time. Later, the NodeManager on the YARN cluster started raising GC-time-over-threshold alarms (no other error logs); the NodeManager's JVM heap on that node was almost full (1.5G). The chart shows the heap growing steadily (roughly matching the period I was testing the Flink job), with GC pauses reaching 30+ seconds. After stopping the Flink job, the JVM heap never came down; I could only restart the YARN cluster. My question: Flink on YARN allocates the taskmanager memory

Re: Flink SQL late data

2020-09-23 Thread ang
Thanks Benchao, I will try that config in my SQL job. ---- From: "user-zh"

Flink 1.11 sql-client YAML configuration issue

2020-09-23 Thread nashcen
I plan to use the command-line tool $FLINK_HOME/bin/sql-client.sh embedded to start the Flink SQL client and connect to Hive. In the Flink SQL configuration file sql-client-defaults.yaml, I added the following parameters: catalogs: - name: myhive type: hive hive-conf-dir: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive/conf default-database: dc_stg
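Re-indented as YAML, the flattened snippet above corresponds to this catalogs section of sql-client-defaults.yaml:

```yaml
catalogs:
  - name: myhive
    type: hive
    hive-conf-dir: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive/conf
    default-database: dc_stg
```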

Re: FlinkKafkaConsumer on YARN: increasing parallelism does not improve Kafka consumption speed, but submitting two applications does

2020-09-23 Thread 范超
Thanks Lei — this was indeed the cause. The effective parallelism of the Source node depends on the topic's partition count. -----Original----- From: 吴磊 [mailto:wuleifl...@foxmail.com] Sent: Friday, September 18, 2020 16:29 To: user-zh Subject: Re: FlinkKafkaConsumer on YARN: increasing parallelism does not improve Kafka consumption speed, but submitting two applications does — hello, the effective parallelism of the Source node depends on the number of partitions of the topic. For example, if you only have 6 partitions, the consumption speed with 12 parallelism is the same as with 6.

Re: FlinkKafkaConsumer on YARN: increasing parallelism does not improve Kafka consumption speed, but submitting two applications does

2020-09-23 Thread 范超
Thanks Benchao. I spent the last few days load-testing this (filling Kafka with data first, then consuming). It is indeed the third case you described: the Kafka topic has only one partition, so FlinkKafkaConsumer maps a single taskmanager to that single partition. Even with a high job parallelism, the other parallel taskmanagers cannot get any partitions to consume, so raising the parallelism does not increase the job's consumption capacity. After creating a topic with 2 partitions, consumption throughput doubled.

Re: Support for gRPC in Flink StateFun 2.x

2020-09-23 Thread Dalmo Cirne
Thank you so much for creating the ticket, Igal. We are looking forward to being able to use it! And thank you for giving a little more context about how StateFun keeps a connection pool and tries to optimize for performance and throughput. With that said, gRPC is an architectural choice we

Reusing Flink SQL Client's Environment for Flink pipelines

2020-09-23 Thread Dan Hill
Has anyone tried to reuse the Flink SQL Client's yaml Environment configuration for their production setups? It seems

Re: RichFunctions in Flink's Table / SQL API

2020-09-23 Thread Piyush Narang
Hi Timo, Thanks for getting back and filing the jira. I'll try to see if there's a way we can rework things to take advantage of the aggregate functions. -- Piyush On 9/23/20, 3:55 AM, "Timo Walther" wrote: Hi Piyush, unfortunately, UDFs have no direct access to Flink's

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-23 Thread Lian Jiang
Dawid, Thanks for the fix. I may wait for Flink 1.12 coming out at the end of Oct this year. Meanwhile, I may want to better understand the current solution at the beginning of this thread. My observations: 1. ProcessFunction with streamEnv.getConfig().enableObjectReuse() --> working 2.

Re: How to stop multiple Flink jobs of the same name from being created?

2020-09-23 Thread Yang Wang
Hi Dan, If you are using a K8s job to deploy the "INSERT INTO" SQL jobs into the existing Flink cluster, then you have to manage the lifecycle of these jobs by yourself. I think you could use the flink command line or the rest API to check the job status first. Best, Yang Dan Hill wrote on Wed, Sep 23, 2020

Poor performance with large keys using RocksDB and MapState

2020-09-23 Thread ירון שני
Hello, I have a poor throughput issue, and I think I managed to reproduce it using the following code: val conf = new Configuration() conf.set(TaskManagerOptions.MANAGED_MEMORY_SIZE, MemorySize.ofMebiBytes(6 * 1000)) conf.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.ofMebiBytes(8 *

[DISCUSS] ReplayableSourceStateBackend

2020-09-23 Thread Theo Diefenthal
Hi there, I just had the idea of a "ReplayableSourceStateBackend". I opened up a JIRA issue where I described the idea about it [1]. I would love to hear your feedback: Do you think it is possible to implement (I am not sure if a pipeline can be fully reconstructed from the source elements

Re: Stateful Functions + ML model prediction

2020-09-23 Thread John Morrow
Thanks very much Igal - that sounds like a good solution! I'm new to StateFun so I'll have to dig into it a bit more, but this sounds like a good direction. Thanks again, John. From: Igal Shilman Sent: Wednesday 23 September 2020 09:06 To: John Morrow Cc:

Fault tolerance: StickyAllocationAndLocalRecoveryTestJob

2020-09-23 Thread cokle
Hello members, I am new to the Apache Flink world and in the last month I have been exploring the testing scenarios offered by the Flink team and different books to learn Flink. Today I was trying to better understand this test that you can find here:

Re: Flink Statefun Byte Ingress

2020-09-23 Thread Igal Shilman
Hi, For ingress, we don't look at the content at all, we put the bytes "as-is" into the Any's value field, and we set the typeUrl field with whatever was specified in the module.yaml. See here for example:

Re: Support for gRPC in Flink StateFun 2.x

2020-09-23 Thread Igal Shilman
Hi Dalmo, Thanks a lot for sharing this use case! If I understand the requirement correctly, you are mostly concerned with performance. In that case I've created an issue [1] to add a gRPC transport for StateFun, and I believe we would be able to implement it in the upcoming weeks. Just a side

Re: [flink-1.10.2] Blink SQL severely exceeding its memory budget

2020-09-23 Thread Tianwang Li
I am using `RocksDBStateBackend`. What is exceeding the memory? I configured "taskmanager.memory.process.size: 4g" and reserved 1G for jvm-overhead, yet usage is 2.8G over — I would like to understand what is consuming it. If it cannot be controlled, the container is easily killed by the resource manager (YARN, K8s, etc.). Does anyone have experience with this? Benchao Li wrote on Wed, Sep 23, 2020 at 13:12: > Exceeding the YARN memory limit is not expected. If the state uses heap, it is on-heap memory and will not exceed the configured JVM max heap; > only the jvm
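For reference, the memory settings discussed in this thread live in flink-conf.yaml. A sketch of the relevant knobs for Flink 1.10 — the 4g/1g values are the ones quoted in the thread; the last two lines are the knobs replies suggest widening when RocksDB overshoots, with illustrative (not recommended) values:

```yaml
taskmanager.memory.process.size: 4g
taskmanager.memory.jvm-overhead.min: 1g
# Suggested by replies in this thread when RocksDB exceeds its budget:
taskmanager.memory.jvm-overhead.fraction: 0.2
taskmanager.memory.task.off-heap.size: 512m
```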

Re: Flink Statefun Byte Ingress

2020-09-23 Thread Timothy Bess
Hi Igal, Ah that definitely helps to know for Function -> Function invocations, but when doing Ingress via statefun how would that work? Is there a config I can set in the "module.yaml" to have it just pack arbitrary bytes into the Any? Thanks, Tim On Wed, Sep 23, 2020 at 7:01 AM Igal Shilman

Flink SQL receives field values in the format 2020-09-23T20:58:24+08:00 — how to convert them to TIMESTAMP

2020-09-23 Thread chenxuying
flinksql version is 1.11.2. The source receives a string-typed time field: CREATE TABLE sourceTable ( `time` STRING ) WITH( ... ); the sink is: CREATE TABLE sinktable ( `time1` STRING, `time` TIMESTAMP(3) ) WITH ( 'connector' = 'print' ); in the insert statement, I don't know how to correctly override TO_TIMESTAMP's default format: insert into sinktable select
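Combining this question with Jark Wu's reply elsewhere in this digest, the insert would pass an explicit format string to TO_TIMESTAMP — a sketch using the column names from the DDL above:

```sql
INSERT INTO sinktable
SELECT
  `time`,
  TO_TIMESTAMP(`time`, 'yyyy-MM-dd''T''HH:mm:ssXXX')
FROM sourceTable;
```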

Re: Stateful Functions + ML model prediction

2020-09-23 Thread Igal Shilman
Hi John, Thank you for sharing your interesting use case! Let me start from your second question: > Are stateful functions available to all Flink jobs within a cluster? Yes, the remote functions are some logic exposed behind an HTTP endpoint, and Flink would forward any message addressed to

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Claude M
It was mentioned that this issue may be fixed in 1.10.3 but there is no 1.10.3 docker image here: https://hub.docker.com/_/flink On Wed, Sep 23, 2020 at 7:14 AM Claude M wrote: > In regards to the metaspace memory issue, I was able to get a heap dump > and the following is the output: > >

Re: scala-maven-plugin:3.1.4 not found when building Flink

2020-09-23 Thread zilong xiao
Hi Natasha, try adding these two flags to your mvn command: -Dscala-2.12 -Pscala-2.12. Natasha <13631230...@163.com> wrote on Wed, Sep 23, 2020 at 16:00: > Hi All, > Glad to join the Flink family! But one problem has bothered me for a long time! > When I import Flink into IDEA and build it with "mvn clean install -Drat.skip=true > -Dmaven.test.skip=true -Dmaven.javadoc.skip=true

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Claude M
In regards to the metaspace memory issue, I was able to get a heap dump and the following is the output: Problem Suspect 1 One instance of *"java.lang.ref.Finalizer"* loaded by *""* occupies *4,112,624 (11.67%)* bytes. The instance is referenced by *sun.misc.Cleaner @ 0xb5d6b520* , loaded by

Ignoring invalid values in KafkaSerializationSchema

2020-09-23 Thread Yuval Itzchakov
Hi, I'm using a custom KafkaSerializationSchema to write records to Kafka using FlinkKafkaProducer. The objects written are Rows coming from Flink's SQL API. In some cases, when trying to convert the Row object to a byte[], serialization will fail due to malformed values. In such cases, I would

Re: Querying an HBase sink result table sometimes returns data and sometimes does not

2020-09-23 Thread izual
HBase writes go through a buffer [1] that flushes by time or by data volume [2] — check whether those settings have been changed. 1. https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hbase/src/main/java/org/apache/flink/connector/hbase/sink/HBaseSinkFunction.java 2.
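The buffering izual points at is configurable on the SQL connector. In Flink 1.11's HBase connector these WITH options control the flush size, row count, and interval — a sketch; the table schema, names, and values here are illustrative, not from the thread:

```sql
CREATE TABLE hbase_sink (
  rowkey STRING,
  cf ROW<orderN BIGINT>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-1.4',
  'table-name' = 'day_order_index',
  'zookeeper.quorum' = 'zk-host:2181',
  'sink.buffer-flush.max-size' = '2mb',
  'sink.buffer-flush.max-rows' = '1000',
  'sink.buffer-flush.interval' = '1s'
);
```

Shrinking the interval reduces the window in which a just-written row is invisible to readers, at the cost of more, smaller writes.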

Flink Statefun Byte Ingress

2020-09-23 Thread Igal Shilman
Hi Tim, You are correct, currently the argument to a remote function must be a Protobuf Any, however StateFun doesn't interpret the contents of that Any, and it would be passed as-is to the remote function. As you mentioned in your email you can interpret the bytes as the bytes of a JSON string.

A field was added in Kafka — how to handle the Hive table

2020-09-23 Thread china_tao
flink 1.11.1, Flink SQL. I already have Flink SQL reading from Kafka and writing to Hive. The problem is that a field was added to the Kafka source — how do I update the Hive table in Flink SQL? If I add the column in Hive directly, every startup reports that the Hive table already exists; with drop table if exists, the historical data is lost. How do you all handle this? Thanks. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Efficiently processing sparse events in a time windows

2020-09-23 Thread Steven Murdoch
Hello, I am trying to do something that seems like it should be quite simple but I haven’t found an efficient way to do this with Flink and I expect I’m missing something obvious here. The task is that I would like to process a sequence of events when a certain number appear within a keyed

Querying an HBase sink result table sometimes returns data and sometimes does not

2020-09-23 Thread kandy.wang
insert into hive.temp_dw.day_order_index select rowkey, ROW(orderN,) from ( select order_date as rowkey, count(distinct parent_sn) as orderN, group by order_date ) — when querying HBase via SQL, sometimes the data is found and sometimes it is not. Is it because the group-by operation makes a downstream operator send retract messages, which delete that rowkey's row in HBase, so the client momentarily finds no data? My understanding is that the HBase sink

Re: Better way to share large data across task managers

2020-09-23 Thread Dongwon Kim
Hi Kostas, Thanks for the input! BTW, I guess you assume that the broadcasting occurs just once for bootstrapping, huh? My job needs not only bootstrapping but also periodically fetching a new version of data from some external storage. Thanks, Dongwon > Sep 23, 2020, 4:59 AM, Kostas Kloudas

Flink Kerberos authentication issue

2020-09-23 Thread zhangjunj
Hello: For business needs, Flink must connect to CDK (Kafka topics in a kerberized environment). On the same cluster, in Flink on YARN mode, requesting yarn-session resources under Kerberos succeeds via: yarn-session.sh -n 2 -d -jm 2048 -tm 4096 -qu root.__ -D security.kerberos.login.keytab=AAA.keytab -D security.kerberos.login.principal=AAA,

Flink SQL: NOT NULL not taking effect in grouping sets semantics

2020-09-23 Thread kandy.wang
The SQL is as follows: select (case when act_name is not null then act_name else 'default_value' end) as act_name, (case when fst_plat is not null then fst_plat else 'default_value' end) as fst_plat, sum(amount) as saleN from

Re: Flink SQL late data

2020-09-23 Thread Benchao Li
Are you using the Blink planner's TUMBLE window? If so, you can handle late data by setting the state retention time [1]. The effective allowed lateness is the min retention time you configure. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/query_configuration.html#idle-state-retention-time ang <806040...@qq.com> wrote on Wed, Sep 23, 2020

Re: Flink protobuf-to-JSON conversion performance issue

2020-09-23 Thread Benchao Li
Hi kandy, regarding question 1, the community plans a built-in pb format [1]; it most likely won't make 1.12, but 1.13 should be about right. [1] https://issues.apache.org/jira/browse/FLINK-18202 kandy.wang wrote on Wed, Sep 23, 2020 at 16:55: > Since Flink currently does not support a pb format, I called protobuf-java-util >

Flink protobuf-to-JSON conversion performance issue

2020-09-23 Thread kandy.wang
Since Flink currently does not support a pb format, I call protobuf-java-util com.google.protobuf.util.JsonFormat.printer().preservingProtoFieldNames().print(message) to first convert the pb to JSON, and then feed the JSON to JsonRowDataDeserializationSchema. The resulting throughput only reaches about 200k TPS, whereas processing native JSON data reaches 500-600k TPS. My questions: 1. What is a good way for Flink to process pb-formatted data? 2. Does the community plan a pb format

Re: scala-maven-plugin:3.1.4 not found when building Flink

2020-09-23 Thread tison
From the log, your Scala is version 2.10; recent Flink versions only support 2.11 and 2.12. Best, tison. Natasha <13631230...@163.com> wrote on Wed, Sep 23, 2020 at 16:00: > Hi All, > Glad to join the Flink family! But one problem has bothered me for a long time! > When I import Flink into IDEA and build it with "mvn clean install -Drat.skip=true > -Dmaven.test.skip=true -Dmaven.javadoc.skip=true

Re: How to disconnect taskmanager via rest api?

2020-09-23 Thread Luan Cooper
thanks I'll create a new issue for this feature on github On Mon, Sep 21, 2020 at 11:51 PM Timo Walther wrote: > Hi Luan, > > this sound more of a new feature request to me. Maybe you can already > open an issue for it. > > I will loop in Chesnay in CC if there is some possibility to achieve >

Flink SQL late data

2020-09-23 Thread ang
Hi, I am using Flink SQL to consume Kafka with event time, a 5s window, allowing 5s of lateness, with the watermark defined as: WATERMARK FOR ts AS ts - INTERVAL '5' SECOND

Re: Calcite optimizer OOM on deeply nested SQL statements with where conditions

2020-09-23 Thread jun su
hi danny & godfrey — the debug log shows 99% CalcMergeRule. I see Blink uses FlinkCalcMergeRule, whose matches method adds a filter for non-deterministic expressions, so I replaced CalcMergeRule with FlinkCalcMergeRule and updated FlinkRuleSets accordingly. After rerunning, 99% of the debug log is the updated FlinkCalcMergeRule. Danny Chan wrote on Wed, Sep 23, 2020 at 12:32: > It is probably a node cycle reference, causing the optimizer rule

scala-maven-plugin:3.1.4 not found when building Flink

2020-09-23 Thread Natasha
Hi All, glad to join the Flink family! But one problem has bothered me for a long time! When I import Flink into IDEA and build it with "mvn clean install -Drat.skip=true -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -Dcheckstyle.skip=true", it fails with "Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.4:testCompile (scala-test-compile) on project

Re: RichFunctions in Flink's Table / SQL API

2020-09-23 Thread Timo Walther
Hi Piyush, unfortunately, UDFs have no direct access to Flink's state. Aggregate functions are the only type of functions that can be stateful at the moment. Aggregate functions store their state in an accumulator that is serialized/deserialized on access, but an accumulator field can be

Re: [flink-1.10.2] Blink SQL severely exceeding its memory budget

2020-09-23 Thread 郑斌斌
Thanks Peidian, I'll give it a try. ------------------------------------------------------------------ From: Peidian Li Sent: Wednesday, September 23, 2020 14:02 To: user-zh; 郑斌斌 Subject: Re: [flink-1.10.2] Blink SQL severely exceeding its memory budget — I have run into a similar problem, also with rocksdb on flink 1.10; my understanding is that the block

Flink stateful functions and Event Driven microservices

2020-09-23 Thread Mazen Ezzeddine
Hello, What are the differences between Flink stateful functions and event-driven microservices — are they almost the same concept? I am aware that Flink stateful functions provide out-of-the-box functionality like exactly-once processing guarantees on failure and recovery, stateful middle

Re: Back pressure with multiple joins

2020-09-23 Thread Dan Hill
When I use DataStream and implement the join myself, I can get 50x the throughput. I assume I'm doing something wrong with Flink's Table API and SQL interface. On Tue, Sep 22, 2020 at 11:21 PM Dan Hill wrote: > Hi! > > My goal is to better understand how my code impacts streaming throughput. >

Back pressure with multiple joins

2020-09-23 Thread Dan Hill
Hi! My goal is to better understand how my code impacts streaming throughput. I have a streaming job where I join multiple tables (A, B, C, D) using interval joins. Case 1) If I have 3 joins in the same query, I don't hit back pressure. SELECT ... FROM A LEFT JOIN B ON... LEFT JOIN C ON...

Re: [flink-1.10.2] Blink SQL severely exceeding its memory budget

2020-09-23 Thread Peidian Li
I have run into a similar problem, also with rocksdb on flink 1.10. My understanding is that the block cache overuses memory; my fix was to increase taskmanager.memory.jvm-overhead.fraction. If the memory overuse still occurs, you can try increasing taskmanager.memory.task.off-heap.size