Re: [flink-1.10.2] Blink SQL severe memory overuse

2020-09-22 Thread 郑斌斌
Thanks for the answer, Benchao. My log is below. I'm using RocksDB with incremental state snapshots; when the checkpoint data volume is large it reaches several GB. RocksDB uses off-heap memory, which seems related, but I don't know how to diagnose it. java.lang.Exception: [2020-09-14 09:27:20.431]Container [pid=10644,containerID=container_e91_1597199405327_9343_01_000298] is running 36864B beyond the 'PHYSICAL' memory limit. Current

Re: [SQL] parse table name from sql statement

2020-09-22 Thread Harold.Miao
thx silence wrote on Tue, Sep 22, 2020 at 11:54 AM: > I wrote something similar, you can use it as a reference > > private static List lookupSelectTable(SqlNode sqlNode) { > List list = new ArrayList<>(); > if (sqlNode instanceof SqlSelect) { > SqlNode from = ((SqlSelect) sqlNode).getFrom(); >

Re: [flink-1.10.2] Blink SQL severe memory overuse

2020-09-22 Thread Benchao Li
Exceeding the Yarn memory limit can't be explained by heap state: if state lives on the heap, it is on-heap memory and cannot grow past the configured JVM max heap; you would only see a JVM OOM. Exceeding the Yarn limit means the JVM's other overhead, added on top of the heap, pushes the total over. 郑斌斌 wrote on Wed, Sep 23, 2020 at 12:29 PM: > I'm hitting the same problem here: a simple dual-stream interval join SQL runs for 4-5 days and then overuses its memory, and the container is killed by YARN. > A single stream runs normally. > The job memory is 4G. Version 1.11.1 >
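Benchao's point, that the container budget also has to cover JVM overhead beyond the heap, maps onto the Flink 1.10+ TaskManager memory model. A flink-conf.yaml fragment along these lines (the sizes are placeholders for illustration, not recommendations) gives that overhead more headroom:

```yaml
# Total process (container) memory; every other component is carved out of this.
taskmanager.memory.process.size: 4g
# JVM overhead (thread stacks, code cache, GC structures, ...) sits outside the
# heap; raising its share leaves less room for heap/managed memory but keeps
# the container under the Yarn limit.
taskmanager.memory.jvm-overhead.fraction: 0.15
taskmanager.memory.jvm-overhead.min: 256m
taskmanager.memory.jvm-overhead.max: 1g
```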

Re: [flink-1.10.2] Blink SQL severe memory overuse

2020-09-22 Thread Tianwang Li
Hi Benchao, thanks for your reply. I'm using RocksDB and the memory overhead is far too high; I can't tell where all that memory is going, it seems out of control. I also tested hop windows, and they overuse memory as well. I'm not using incremental checkpoints. Finally, my interval join is an inner join using table b's rowtime as the time attribute, and I haven't observed any late data. [image: image.png] Benchao Li wrote on Wed, Sep 23, 2020 at 10:50 AM: > Hi Tianwang, > >

Re: Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread Danny Chan
It's probably a cyclic node reference causing an optimizer rule to fire over and over. Turn on debug logging to see which rule keeps firing. I fixed a similar issue before [1], which may be a useful reference. [1] https://issues.apache.org/jira/browse/CALCITE-3121 Best, Danny Chan On Sep 23, 2020 at 10:23 AM +0800, jun su wrote: > hi godfrey, > could you share which rules fixed this? I'm curious and would like to see what caused it > > godfrey he wrote on Wed, Sep 23, 2020

Re: [flink-1.10.2] Blink SQL severe memory overuse

2020-09-22 Thread 郑斌斌
I'm hitting the same problem here: a simple dual-stream interval join SQL runs for 4-5 days and then overuses its memory, and the container is killed by YARN. A single stream runs normally. The job memory is 4G. Version 1.11.1 -- From: Benchao Li Sent: Wednesday, September 23, 2020 10:50 To: user-zh Subject: Re: [flink-1.10.2] Blink SQL severe memory overuse Hi Tianwang,

Re: Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread godfrey he
For example the calc merge rule, plus calc, agg and other related rules; the changes are scattered, so you'd have to look at the specifics. jun su wrote on Wed, Sep 23, 2020 at 10:22 AM: > hi godfrey, > could you share which rules fixed this? I'm curious and would like to see what caused it > > godfrey he wrote on Wed, Sep 23, 2020 at 10:09 AM: > > > Hi Jun, > > > > It may be that the old planner is missing some rules and hit a corner case; > > similar cases have been fixed in the blink planner before. > > > > jun su

Re: Debugging "Container is running beyond physical memory limits" on YARN for a long running streaming job

2020-09-22 Thread Xintong Song
Hi Shubham, Concerning FLINK-18712, thanks for the pointer. I was not aware of this issue before. Running on Kubernetes or Yarn should not affect this issue. I cannot tell whether this issue is the cause of your problem. The simplest way to confirm this is probably just try the solution to see if

Flink Statefun Byte Ingress

2020-09-22 Thread Timothy Bess
Hi, So most of the examples of "module.yaml" files I've seen focus on protobuf ingress, but is there a way to just get bytes from Kafka? I want to integrate this with the rest of my codebase which uses JSON, but don't want to migrate to protobuf just yet. I'm not totally sure how it would work

Re: [flink-1.10.2] Blink SQL severe memory overuse

2020-09-22 Thread Benchao Li
Hi Tianwang, I don't know how your DataStream version is implemented; looking at it purely from the SQL side, two things will grow the state: 1. A time interval join delays the watermark before emitting it, so the downstream hop window's state grows beyond the normal case because the upstream join operator holds the watermark back. The current watermark delay is `Math.max(leftRelativeSize, rightRelativeSize) + allowedLateness`; given your SQL this value should be 6h. 2. time interval
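Benchao's delay expression above can be sketched as a pure-Java helper. The concrete numbers below are assumptions for illustration (a 6h interval bound on one side, no bound on the other, zero allowed lateness), since the original SQL is not shown in full:

```java
// Sketch of the watermark-delay rule described for a time interval join: the
// emitted watermark is held back by the larger of the two relative interval
// sizes plus any allowed lateness.
public class IntervalJoinDelay {

    // Mirrors the quoted expression:
    // Math.max(leftRelativeSize, rightRelativeSize) + allowedLateness
    static long watermarkDelayMs(long leftRelativeSizeMs,
                                 long rightRelativeSizeMs,
                                 long allowedLatenessMs) {
        return Math.max(leftRelativeSizeMs, rightRelativeSizeMs) + allowedLatenessMs;
    }

    public static void main(String[] args) {
        long sixHoursMs = 6L * 60 * 60 * 1000;
        // Assuming a 6h bound on one side, none on the other, and no allowed
        // lateness, the watermark is delayed by the full 6h.
        System.out.println(watermarkDelayMs(sixHoursMs, 0, 0)); // 21600000
    }
}
```

This is why the downstream hop window keeps state longer than the window size alone would suggest: its cleanup only advances with the delayed watermark.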

Re: Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread jun su
hi godfrey, could you share which rules fixed this? I'm curious and would like to see what caused it. godfrey he wrote on Wed, Sep 23, 2020 at 10:09 AM: > Hi Jun, > > It may be that the old planner is missing some rules and hit a corner case; > similar cases have been fixed in the blink planner before. > > jun su wrote on Wed, Sep 23, 2020 at 9:53 AM: > > > hi godfrey, > > > > I just checked; blink should also use HEP, I was wrong above > > > > jun su

Reading data from MongoDB with Flink

2020-09-22 Thread GeGe
Hello! I'm new to Flink and want to read data from MongoDB with it. A new record arrives in MongoDB every second; I implemented a RichSourceFunction to connect and read the data. But unlike with Kafka, Flink does not keep showing new data continuously. Can MongoDB not be read continuously, or is some extra setting needed? Many thanks! Best wishes, Gege

Re: Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread godfrey he
Hi Jun, It may be that the old planner is missing some rules and hit a corner case; similar cases have been fixed in the blink planner before. jun su wrote on Wed, Sep 23, 2020 at 9:53 AM: > hi godfrey, > > I just checked; blink should also use HEP, I was wrong above > > jun su wrote on Wed, Sep 23, 2020 at 9:19 AM: > > > hi godfrey, > > With the latest blink code I don't see this problem. Reading the code, flink runs HEP first and then goes into Volcano, while blink seemingly doesn't use HEP, > >

Re: Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread jun su
hi godfrey, I just checked; blink should also use HEP, I was wrong above. jun su wrote on Wed, Sep 23, 2020 at 9:19 AM: > hi godfrey, > With the latest blink code I don't see this problem. Reading the code, flink runs HEP first and then goes into Volcano, while blink seemingly doesn't use HEP. > After commenting out the HEP code, the number of Volcano iterations drops sharply; with 10 levels of nesting it finds the best plan in roughly 4000 iterations. I'll debug further for the cause > > godfrey he wrote on Tue, Sep 22, 2020 at 8:58 PM: > >> blink planner

Re: Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread jun su
hi godfrey, With the latest blink code I don't see this problem. Reading the code, flink runs HEP first and then goes into Volcano, while blink seemingly doesn't use HEP. After commenting out the HEP code, the number of Volcano iterations drops sharply; with 10 levels of nesting it finds the best plan in roughly 4000 iterations. I'll debug further for the cause. godfrey he wrote on Tue, Sep 22, 2020 at 8:58 PM: > Does the blink planner have this problem? > > jun su wrote on Tue, Sep 22, 2020 at 3:27 PM: > > > hi all, > > > > Environment: flink-1.9.2 flink table

Re: Flink Table SQL and writing nested Avro files

2020-09-22 Thread Dan Hill
Nice! I'll try that. Thanks, Dawid! On Mon, Sep 21, 2020 at 2:37 AM Dawid Wysakowicz wrote: > Hi Dan, > > I think the best what I can suggest is this: > > SELECT > > ROW(left.field0, left.field1, left.field2, ...), > > ROW(right.field0, right.field1, right.field2, ...) > > FROM ... >

Re: How to stop multiple Flink jobs of the same name from being created?

2020-09-22 Thread Dan Hill
Hi Yang! The multiple "INSERT INTO" jobs all go to the same Flink cluster. I'm using this Helm chart (which looks like the standalone option). I deploy the job using a simple k8 Job. Sounds like I should do this myself.

Stateful Functions + ML model prediction

2020-09-22 Thread John Morrow
Hi Flink Users, I'm using Flink to process a stream of records containing a text field. The records are sourced from a message queue, enriched as they flow through the pipeline based on business rules and finally written to a database. We're using the Ververica platform so it's running on

Re: Better way to share large data across task managers

2020-09-22 Thread Kostas Kloudas
Hi Dongwon, If you know the data in advance, you can always use the Yarn options in [1] (e.g. the "yarn.ship-directories") to ship the directories with the data you want only once to each Yarn container (i.e. TM) and then write a udf which reads them in the open() method. This will allow the data
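The pattern Kostas describes, ship the data once per container and load it in the function's open() method, can be sketched without Flink on the classpath. RichFunction's open() hook and the "yarn.ship-directories" option are real Flink features, but everything else below (the Lifecycle stand-in, the LookupEnricher name, the CSV format) is made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Pure-Java sketch of "load shipped data once in open()". In a real job this
// would be a RichMapFunction whose open() runs once per task before any
// records arrive; a minimal lifecycle interface stands in for it here.
public class ShippedDataUdf {

    interface Lifecycle { void open() throws IOException; }

    // Hypothetical enricher: open() reads a "key,value" file shipped into the
    // container working directory; map() then does in-memory lookups only.
    static class LookupEnricher implements Lifecycle {
        private final Path dataFile;
        private final Map<String, String> lookup = new HashMap<>();

        LookupEnricher(Path dataFile) { this.dataFile = dataFile; }

        @Override
        public void open() throws IOException {
            for (String line : Files.readAllLines(dataFile)) {
                String[] kv = line.split(",", 2);
                lookup.put(kv[0], kv[1]);
            }
        }

        String map(String key) {
            return lookup.getOrDefault(key, "unknown");
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("shipped", ".csv");
        Files.write(tmp, List.of("user1,premium", "user2,basic"));
        LookupEnricher fn = new LookupEnricher(tmp);
        fn.open(); // in Flink this happens once per task, not once per record
        System.out.println(fn.map("user1")); // premium
        Files.delete(tmp);
    }
}
```

The point of loading in open() rather than in the map call is that each TM reads the shipped files exactly once, instead of every record (or every job restart via broadcast) paying the cost.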

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-22 Thread Claude M
Thanks for your responses. 1. There were no job re-starts prior to the metaspace OOM. 2. I tried increasing the CPU request and still encountered the problem. Any configuration change I make to the job manager, whether it's in the flink-conf.yaml or increasing the pod's CPU/memory request,

RichFunctions in Flink's Table / SQL API

2020-09-22 Thread Piyush Narang
Hi folks, We were looking to cache some data using Flink’s MapState in one of our UDFs that are called by Flink SQL queries. I was trying to see if there’s a way to set up these state objects via the basic FunctionContext [1] we’re provided in the Table / SQL UserDefinedFunction class [2] but

Adaptive load balancing

2020-09-22 Thread Navneeth Krishnan
Hi All, We are currently using flink in production and use keyBy for performing a CPU intensive computation. There is a cache lookup for a set of keys and since keyBy cannot guarantee the data is sent to a single node we are basically replicating the cache on all nodes. This is causing more

Re: Debugging "Container is running beyond physical memory limits" on YARN for a long running streaming job

2020-09-22 Thread Shubham Kumar
Hi Xintong, Thanks for your insights, they are really helpful. I understand now that it most certainly is a native memory issue rather than a heap memory issue and about not trusting Flink's Non-Heap metrics. I do believe that our structure of job is so simple that I couldn't find any use of

Re: Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread godfrey he
Does the blink planner have this problem? jun su wrote on Tue, Sep 22, 2020 at 3:27 PM: > hi all, > > Environment: flink-1.9.2 flink table planner > Symptom: the code never returns from VolcanoPlanner.findBestExp(), until it OOMs > > With 4 levels of nesting, the while(true) loop in findBestExp exits successfully after 30,000+ iterations; with 5 levels it reaches the hundreds of thousands, and the process OOMs > --- > Code: >

How is keyed state TTL cleanup triggered?

2020-09-22 Thread smq
Hi all, a quick question: if the TTL is set to 1 min, is the state cleared automatically once that time is up?
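As far as I know, Flink does not fire a timer to delete the entry at the exact minute mark: by default an expired value is no longer returned on read (NeverReturnExpired) and is removed lazily when that state is next accessed, with configurable background cleanup strategies handling entries that are never read again. A toy model of that lazy behavior, with made-up names and an explicit clock parameter for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of lazy TTL cleanup: an entry older than the TTL is not deleted
// the moment it expires; it is filtered out (and dropped) on the next access.
public class LazyTtlStateSketch {
    private final long ttlMs;
    private final Map<String, Long> writeTime = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    LazyTtlStateSketch(long ttlMs) { this.ttlMs = ttlMs; }

    void put(String key, String value, long nowMs) {
        values.put(key, value);
        writeTime.put(key, nowMs);
    }

    // Expiration is detected (and the entry dropped) only here, on access.
    String get(String key, long nowMs) {
        Long ts = writeTime.get(key);
        if (ts != null && nowMs - ts > ttlMs) {
            values.remove(key);
            writeTime.remove(key);
            return null; // behaves like NeverReturnExpired
        }
        return values.get(key);
    }

    public static void main(String[] args) {
        LazyTtlStateSketch state = new LazyTtlStateSketch(60_000); // 1 min TTL
        state.put("k", "v", 0);
        System.out.println(state.get("k", 30_000)); // v (still fresh)
        System.out.println(state.get("k", 61_000)); // null (expired on read)
    }
}
```

In real Flink this is configured through StateTtlConfig on the state descriptor; the sketch only illustrates why "set to 1 min" does not mean "physically deleted at minute one".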

Re: How to stop multiple Flink jobs of the same name from being created?

2020-09-22 Thread Yang Wang
Hi Dan, First, I want to get more information about your submission so that we could make the question clear. Are you using TableEnvironment to execute multiple "INSERT INTO" sentences and find that each one will be executed in a separated Flink cluster? It is really strange, and I want to know

[flink-1.10.2] Blink SQL severe memory overuse

2020-09-22 Thread Tianwang Li
Implementing interval join + hop window with Blink SQL overuses memory badly. [join] > SELECT `b`.`rowtime`, > `a`.`c_id`, > `b`.`openid` > FROM `test_table_a` AS `a` > INNER JOIN `test_table_b` AS `b` ON `a`.`RoomID` = `b`.`RoomID` > AND `a`.`openid` = `b`.`openid` > AND `b`.`rowtime` BETWEEN ASYMMETRIC `a`.`rowtime`

Re: [DKIM Failure] Re: [DKIM Failure] Re: flink-CDC client one-to-many problem

2020-09-22 Thread Jark Wu
1. Which version are you on? I tried on the latest version and saw no error; could you try the latest? 2. {123123,ff=1, 123123,kk=2} is just how a MULTISET prints, i.e. {key1=cnt, key2=cnt}, where 123123,ff is row#toString. Best, Jark On Tue, 22 Sep 2020 at 19:28, Li,Qian(DXM,PB) wrote: > Hello: I tested following the method above, and I keep hitting this error when inserting data into es. Where is the problem? > > create view uu AS select

Re: App gets stuck in Created State

2020-09-22 Thread Arpith P
All the job manager logs have been deleted from the cluster. I'll have to work with the infra team to get it back, once I have it i'll post it here. Arpith On Mon, Sep 21, 2020 at 5:50 PM Zhu Zhu wrote: > Hi Arpith, > > All tasks in CREATED state indicates no task is scheduled yet. It is >

Re: [DKIM Failure] Re: [DKIM Failure] Re: flink-CDC client one-to-many problem

2020-09-22 Thread Li,Qian(DXM,PB)
Hello: I tested following the method above, and I keep hitting this error when inserting data into es. Where is the problem? create view uu AS select collect(userBankTime) as userBank ,name from ( select name, row(name,description) as userBankTime from tt_haha) group by name; // create the view; describes the userBank type as MULTITYPE Flink SQL> desc uu; root |-- userBank: MULTISET NOT NULL> NOT NULL

Containers exiting abnormally with flink on yarn

2020-09-22 Thread Dream-底限
hi, I'm running jobs with flink 1.11.1 on yarn in detached mode, but at submission time, right after the application id is allocated, the container often exits abnormally. I first assumed it was a yarn environment problem, but I've hit this on two clusters. Is this a known bug?

Re: Zookeeper connection loss causing checkpoint corruption

2020-09-22 Thread Arpith P
I created a ticket with all my findings. https://issues.apache.org/jira/browse/FLINK-19359. Thanks, Arpith On Tue, Sep 22, 2020 at 12:16 PM Timo Walther wrote: > Hi Arpith, > > is there a JIRA ticket for this issue already? If not, it would be great > if you can report it. This sounds like a

Re: Flink-1.11.1 Kafka Table API BigInt problem

2020-09-22 Thread 刘首维
Hi, try Java's BigInteger. From: nashcen <2415370...@qq.com> Sent: September 22, 2020 16:29:41 To: user-zh@flink.apache.org Subject: Flink-1.11.1 Kafka Table API BigInt problem *My code is below.* The updateTime field is a timestamp typed as BigInt. Querying only the other String-typed fields works fine, but adding this field raises an error. package

Re: Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-22 Thread Rui Li
Hi Timo, I believe the blocker for this feature is that we don't support dynamically adding user jars/resources at the moment. We're able to read the path to the function jar from Hive metastore, but we cannot load the jar after the user session is started. On Tue, Sep 22, 2020 at 3:43 PM Timo

Re: Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-22 Thread Husky Zeng
Hi Timo, Thanks for your attention. As I said in that comment, this feature would certainly solve our problem, but its workload looks much larger than what our scenario needs. Our project urgently needs to reuse the Hive UDFs in the Hive metastore, so we are more

Re: Does Flink support such a feature currently?

2020-09-22 Thread Marta Paes Moreira
Hi, Roc. *Note:* in the future, please send this type of questions to the user mailing list instead (user@flink.apache.org)! If I understand your question correctly, this is possible using the LIKE clause and a registered catalog. There is currently no implementation for the MySQL JDBC catalog,

Flink-1.11.1 Kafka Table API BigInt problem

2020-09-22 Thread nashcen
*My code is below.* The updateTime field is a timestamp typed as BigInt. Querying only the other String-typed fields works fine, but adding this field raises an error. package com.athub.dcpoints.scala.connector.table.hive import com.athub.dcpoints.java.model.KafkaDcpoints import org.apache.flink.streaming.api.scala._ import org.apache.flink.table.api._ import

Re: Re: How to create a JSON-like, infinitely extensible table schema in Flink SQL

2020-09-22 Thread chuyuan
Great, that approach worked. Thanks a lot. -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Flink DynamoDB stream connector losing records

2020-09-22 Thread Jiawei Wu
Hi Ying and Danny, Sorry for the late reply, I just got back from vacation. Yes I'm running Flink in Kinesis Data Analytics with Flink 1.8, and checkpoint is enabled. This fully managed solution limits my access to Flink logs, so far I didn't get any logs related to throttle or fail over. The

Re: Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-22 Thread Timo Walther
Hi Husky, I guess https://issues.apache.org/jira/browse/FLINK-14055 is what is needed to make this feature possible. @Rui: Do you know more about this issue and its current limitations? Regards, Timo On 18.09.20 09:11, Husky Zeng wrote: When we submit a job which uses a udf of hive, the job

[ANNOUNCE] Weekly Community Update 2020/38

2020-09-22 Thread Konstantin Knauf
Dear community, happy to share a brief community update for the past week. A lot of FLIP votes are currently ongoing on the dev@ mailing list. I've covered this FLIP previously, so skipping those this time. Besides that, a couple of release related updates and again multiple new Committers.

Optimizer OOM in Calcite on SQL statements with deeply nested where conditions

2020-09-22 Thread jun su
hi all, Environment: flink-1.9.2 flink table planner Symptom: the code never returns from VolcanoPlanner.findBestExp(), until it OOMs. With 4 levels of nesting, the while(true) loop in findBestExp exits successfully after 30,000+ iterations; with 5 levels it reaches the hundreds of thousands, and the process OOMs --- Code: fbTableEnv.registerTableSource("source",orcTableSource) val select =

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-22 Thread Dawid Wysakowicz
Hi Lian, Thank you for sending the full code for the pojo. It clarified a lot! I learnt that Avro introduced yet another mechanism for retrieving conversions for logical types in Avro 1.9.x. I was not aware they create a static SpecificData field with registered logical conversions if a logical

Re: hourly counter

2020-09-22 Thread Timo Walther
Hi Lian, you are right that timers are not available in a ProcessWindowFunction but the state store can be accessed. So given that your window width is 1 min, you could maintain an additional state value for counting the minutes and updating your counter once this value reached 60.
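Timo's bookkeeping can be sketched independently of the windowing API. In a real ProcessWindowFunction both fields below would live in keyed state (e.g. ValueState); the class and method names here are made up for the sketch, which only models the count-to-60-then-emit logic:

```java
// Sketch of an hourly counter built from 1-minute windows: with no timers
// available, keep an extra counter of window firings and emit/reset the
// running total once 60 minutes have been accumulated.
public class HourlyCounterSketch {
    private int minutesSeen = 0;   // stands in for ValueState<Integer>
    private long hourlyTotal = 0;  // stands in for ValueState<Long>

    // Called once per 1-minute window firing; returns the hourly total when a
    // full hour has been accumulated, otherwise -1 (nothing to emit yet).
    long onMinuteWindow(long minuteCount) {
        hourlyTotal += minuteCount;
        minutesSeen++;
        if (minutesSeen == 60) {
            long emitted = hourlyTotal;
            minutesSeen = 0;
            hourlyTotal = 0;
            return emitted;
        }
        return -1;
    }

    public static void main(String[] args) {
        HourlyCounterSketch fn = new HourlyCounterSketch();
        long emitted = -1;
        for (int i = 0; i < 60; i++) {
            emitted = fn.onMinuteWindow(2); // 2 events in every minute window
        }
        System.out.println(emitted); // 120 after the 60th window
    }
}
```

One caveat worth noting: counting firings assumes every minute produces a window; if minutes with no data fire no window, the hour boundary drifts, which is where comparing window timestamps in state would be more robust.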

Re: Zookeeper connection loss causing checkpoint corruption

2020-09-22 Thread Timo Walther
Hi Arpith, is there a JIRA ticket for this issue already? If not, it would be great if you can report it. This sounds like a critical priority issue to me. Thanks, Timo On 22.09.20 06:25, Arpith P wrote: Hi Peter, I have recently had a similar issue where I could not load from the

Re: FlinkKafkaConsumer问题

2020-09-22 Thread 赵一旦
This has little to do with Flink; it's how Kafka itself works. With a group specified, if you subscribe, you get what you'd expect: messages shared across the group's consumers. But if you assign specific partitions to consume, the group's message-sharing constraint no longer applies. SmileSmile wrote on Sat, Sep 5, 2020 at 4:51 PM: > hi, does this only happen with checkpointing enabled, or also without? > > Best > > a511955993 > Email: a511955...@163.com > On Sep 4, 2020 at 14:11, op

Re: Flink-1.11 Table API '&' symbol syntax question

2020-09-22 Thread nashcen
Thanks, importing the following package solved my problem: import org.apache.flink.table.api._ -- Sent from: http://apache-flink.147419.n8.nabble.com/