Re: Re: flink1.9 blink planner table ddl usage question

2019-08-25 Thread hb
I used your DDL statement and still got the same error. I am running it inside IDEA, with the dependencies configured through Maven. On 2019-08-26 11:22:20, "Jark Wu" wrote: >Hi, > >At first glance, these parts of your DDL definition look wrong. > >1. The format properties are missing >2. connector.version is missing >3. bootstrap.servers is configured the wrong way... > > >You can refer to the following as an example: > > >CREATE TABLE kafka_json_source ( >rowtime TIMESTAMP, >

Re: flink 1.9.0 StreamTableEnvironment compilation error

2019-08-25 Thread ddwcg
After applying the fixes you two suggested, it works now. Thanks to you both. > On Aug 26, 2019, at 12:28, Jark Wu wrote: > > Hi, > > For the Expression problem, you additionally need to add the import org.apache.flink.table.api._ import. > > See release note for more details: >

Re: type error with generics ..

2019-08-25 Thread Debasish Ghosh
oh .. and I am using Flink 1.8 .. On Mon, Aug 26, 2019 at 12:09 AM Debasish Ghosh wrote: > Thanks for the feedback .. here are the details .. > > Just to give you some background, the original API is a Scala API as follows > .. > > final def readStream[In: TypeInformation:

Kafka consumption skew issue

2019-08-25 Thread zq wang
Hi all, a question: my program reads from Kafka, deduplicates and cleans the data, and writes to a Kafka sink. It uses the DataStream API, runs on YARN, Flink version 1.8. The program consumes three topics, passed in as a List as follows: > public FlinkKafkaConsumer010(List topics, > KafkaDeserializationSchema deserializer, Properties props) > > For historical reasons, data is distributed unevenly across the partitions of the three topics. I used > DataStream> dataStream = >
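The quoted message is cut off at the DataStream assignment. For context, a common mitigation for uneven partition sizes is to break the partition-to-subtask mapping with rebalance(), which redistributes records round-robin across downstream subtasks. A minimal sketch for Flink 1.8, assuming a plain String schema; the topic names, broker address, and group id are placeholders:

    import java.util.Properties
    import org.apache.flink.api.common.serialization.SimpleStringSchema
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

    import scala.collection.JavaConverters._

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "broker1:9092") // placeholder
    props.setProperty("group.id", "dedup-job")             // placeholder

    // One consumer subscribed to all three topics, as in the quoted constructor.
    val consumer = new FlinkKafkaConsumer010[String](
      List("topic-a", "topic-b", "topic-c").asJava, // placeholder topic names
      new SimpleStringSchema(),
      props)

    // rebalance() spreads records evenly over downstream subtasks, so
    // skewed partitions no longer map one-to-one onto overloaded workers.
    val dataStream = env.addSource(consumer).rebalance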

Re: flink 1.9.0 StreamTableEnvironment compilation error

2019-08-25 Thread Jark Wu
Hi, For the Expression problem, you additionally need to add the import org.apache.flink.table.api._ import. See release note for more details: https://ci.apache.org/projects/flink/flink-docs-release-1.9/release-notes/flink-1.9.html#scala-expression-dsl-for-table-api-moved-to-flink-table-api-scala
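In Flink 1.9 the Scala expression DSL moved as described in the linked release note, so a Scala Table API program typically needs both of the following imports. A minimal sketch:

    // Expression DSL (moved here in 1.9, see the release note above)
    import org.apache.flink.table.api._
    // Scala implicit conversions plus the Scala StreamTableEnvironment
    import org.apache.flink.table.api.scala._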

Re: flink1.9 blink planner table ddl usage question

2019-08-25 Thread Jark Wu
Hi, At first glance, these parts of your DDL definition look wrong. 1. The format properties are missing 2. connector.version is missing 3. bootstrap.servers is configured the wrong way... You can refer to the following as an example: CREATE TABLE kafka_json_source ( rowtime TIMESTAMP, user_name VARCHAR, event ROW ) WITH ( 'connector.type' = 'kafka', 'connector.version' =
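The example in the archive is truncated. Below is a hedged reconstruction of a complete Kafka/JSON source DDL in the Flink 1.9 property style, run through the blink planner from code. The topic, broker address, and row schema are placeholders, and flink-json plus the Kafka connector jars are assumed to be on the classpath:

    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
    import org.apache.flink.table.api.EnvironmentSettings
    import org.apache.flink.table.api.scala.StreamTableEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val settings = EnvironmentSettings.newInstance()
      .useBlinkPlanner().inStreamingMode().build()
    val tEnv = StreamTableEnvironment.create(env, settings)

    // Addresses all three points above: format.* properties present,
    // connector.version set, bootstrap.servers under connector.properties.
    tEnv.sqlUpdate(
      """CREATE TABLE kafka_json_source (
        |  rowtime TIMESTAMP,
        |  user_name VARCHAR,
        |  event ROW<message_type VARCHAR, message VARCHAR>
        |) WITH (
        |  'connector.type' = 'kafka',
        |  'connector.version' = 'universal',
        |  'connector.topic' = 'test-json',
        |  'connector.startup-mode' = 'latest-offset',
        |  'connector.properties.0.key' = 'bootstrap.servers',
        |  'connector.properties.0.value' = 'localhost:9092',
        |  'format.type' = 'json',
        |  'format.derive-schema' = 'true'
        |)""".stripMargin)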

Re: FLINK TABLE API custom aggregate function (UDAF) reports a state-serializer incompatibility when restoring a job from a checkpoint (For heap backends, the new state serializer must not be incompatible)

2019-08-25 Thread Jark Wu
Hi Qi, Did you change the ACC structure of the UDAF before restoring the job from the checkpoint? Also, if possible, it would help to include the UDAF implementation and the SQL. Best, Jark > On Aug 23, 2019, at 11:08, orlando qi wrote: > > > at

Re: flink 1.9.0 StreamTableEnvironment compilation error

2019-08-25 Thread Zili Chen
That should not happen; I can still see the method def registerDataStream[T](name: String, dataStream: DataStream[T], fields: Expression*): Unit. Could you provide more complete context and the error message? Best, tison. ddwcg <3149768...@qq.com> wrote on Mon, Aug 26, 2019 at 11:38: > Thanks for your reply, that was indeed the cause. But afterwards I cannot use Expression when registering the table > It feels like the Java API and the Scala API are getting mixed up > > > On

Re: flink 1.9.0 StreamTableEnvironment compilation error

2019-08-25 Thread ddwcg
Thanks for your reply, that was indeed the cause. But afterwards I cannot use Expression when registering the table. It feels like the Java API and the Scala API are getting mixed up. > On Aug 26, 2019, at 11:22, Zili Chen wrote: > > Try replacing > > import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment > > with > > import org.apache.flink.table.api.scala.StreamExecutionEnvironment > > It is probably an accidental import

Re: flink 1.9.0 StreamTableEnvironment compilation error

2019-08-25 Thread Zili Chen
Try replacing import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment with import org.apache.flink.table.api.scala.StreamExecutionEnvironment. It is probably caused by accidentally importing same-named classes from different packages. Best, tison. ddwcg <3149768...@qq.com> wrote on Mon, Aug 26, 2019 at 11:12: > Hi all, > I upgraded to 1.9.0 over the weekend, but when initializing the table
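A minimal compile-clean setup for the Scala Table API on Flink 1.9 following this advice; the key point is not to mix the Java StreamExecutionEnvironment with the Scala StreamTableEnvironment:

    // Scala variants of both environments; pulling in
    // org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
    // (the Java class) alongside these is what triggers the compile error.
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
    import org.apache.flink.table.api.scala.StreamTableEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = StreamTableEnvironment.create(env)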

Re: Task memory growth

2019-08-25 Thread Yun Tang
Hi Zhang Kun, Are you using the RocksDBStateBackend? When a container is killed by YARN's node manager for exceeding its memory, the cause is usually native memory overuse. You can add -XX:NativeMemoryTracking=detail [1] to the Flink parameter env.java.opts.taskmanager to observe whether native memory keeps growing. Also, what memory configuration are you using, and what do the YARN logs show at the time of the kill? You could request more JVM heap, which indirectly raises the total memory requested from YARN and can to some extent reduce the chance of being killed. [1]
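A sketch of the corresponding flink-conf.yaml entry; once the TaskManager restarts with the flag, the JVM's native memory accounting can be inspected with jcmd:

    # flink-conf.yaml: enable JVM Native Memory Tracking on TaskManagers
    env.java.opts.taskmanager: -XX:NativeMemoryTracking=detail

    # then, on the TaskManager host (pid is the TaskManager JVM):
    #   jcmd <pid> VM.native_memory summary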

FLINK WEEKLY 2019/34

2019-08-25 Thread Zili Chen
Happy to share the FLINK community's progress from last week. FLINK 1.9.0[1] was officially released; the major updates in this release include fine-grained recovery (FLIP-1), the State Processor API (FLIP-43), stop-with-savepoint with strong consistency guarantees (FLIP-34), and the FLINK WebUI rewritten on Angular 7.x. In addition, the release ships a series of still-in-development preview features, such as Blink's SQL Query Processor, Hive integration, and the new Python Table API (FLIP-38). You are welcome to download FLINK 1.9.0 and try the new features!

Task memory growth

2019-08-25 Thread 张坤
Hi: Recently I have been submitting Flink (1.7.2) jobs to YARN (per-job mode). A job gets killed after running on YARN for a few hours, and I can see its memory growing the whole time. Memory parameters are set at submission. The job logic: Kafka data is lightly processed, registered as a table, and aggregated with windows. Has anyone hit a similar problem? What is the cause, and how can it be fixed or optimized? Thanks!

flink1.9 blink planner table ddl usage question

2019-08-25 Thread hb
flink1.9 blink planner table: creating a table with a DDL statement fails, and I do not know whether a property definition is missing, whether I need to implement a TableSourceFactory, or something else. The error: Exception in thread "main" org.apache.flink.table.api.ValidationException: SQL validation failed. findAndCreateTableSource failed. Caused by:

Re: OVER operator filtering out records

2019-08-25 Thread Vinod Mehra
When there are new events, the old events just get stuck for many hours (more than a day). So if there is buffering going on, it seems it is not time based but size based (?). It looks like unless the buffered events exceed a certain threshold they don't get flushed out (?). Is

Re: type error with generics ..

2019-08-25 Thread Debasish Ghosh
Thanks for the feedback .. here are the details .. Just to give you some background, the original API is a Scala API as follows .. final def readStream[In: TypeInformation: DeserializationSchema](inlet: CodecInlet[In]): DataStream[In] = context.readStream(inlet) and the *Scala version of the

Re: type error with generics ..

2019-08-25 Thread Rong Rong
I am not sure how the function `readStream` is implemented (also, which version of Flink are you using?). Can you share more information on your code blocks and exception logs? Also, to answer your question: a DataStream's return type is determined by its underlying transformation, so you cannot set it
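A sketch of how such a generic helper resolves its output type in the Scala API. The helper below merely mirrors the signature quoted in the thread and is otherwise hypothetical; the point is that the element type of the returned DataStream comes from the implicit TypeInformation[In] supplied by the caller through the context bound, not from a setter applied to the stream afterwards:

    import org.apache.flink.api.common.typeinfo.TypeInformation
    import org.apache.flink.streaming.api.functions.source.SourceFunction
    import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}

    // Hypothetical helper: [In: TypeInformation] means the caller must have
    // an implicit TypeInformation[In] in scope; addSource picks it up and it
    // fixes the DataStream's element type.
    def readStream[In: TypeInformation](env: StreamExecutionEnvironment,
                                        source: SourceFunction[In]): DataStream[In] =
      env.addSource(source)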

tumbling event time window, parallel

2019-08-25 Thread Hanan Yehudai
I have an issue with tumbling windows running in parallel. I run a job on a set of CSV files. When the parallelism is set to 1, I get the proper results; when it runs in parallel, I get no output. Is it due to the fact that the parallel streams take the MAX(watermark) from all the parallel
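For background on the watermark question: a downstream operator's event-time clock advances on the minimum watermark over all of its parallel inputs, so a parallel source subtask that receives no data (for example, fewer CSV files than subtasks) can hold the windows back indefinitely. A minimal event-time sketch for Flink 1.8, assuming an epoch-millisecond timestamp in the first CSV column and a placeholder input path:

    import org.apache.flink.streaming.api.TimeCharacteristic
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val lines = env.readTextFile("/data/csv-dir") // placeholder path

    // Watermarks are generated per parallel subtask; downstream operators
    // track the MINIMUM across subtasks, so an idle subtask stalls windows.
    val withTimestamps = lines.assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[String](Time.seconds(5)) {
        override def extractTimestamp(line: String): Long =
          line.split(",")(0).toLong // assumes epoch millis in column 0
      })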

Re: Using shell environment variables

2019-08-25 Thread Vishwas Siravara
You can also link at runtime by providing the path to the dylib: add -Djava.library.path= to the JVM options of the task manager. On Sat, Aug 24, 2019 at 9:11 PM Zhu Zhu wrote: > Hi Abhishek, > > You need to export the environment variables on all the worker > machines (not the machine used to submit
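A sketch of where that JVM option goes in flink-conf.yaml; the directory is a placeholder:

    # flink-conf.yaml: make native libraries (.so/.dylib) visible to TaskManager JVMs
    env.java.opts.taskmanager: -Djava.library.path=/opt/native-libs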

Error while using catalog in .yaml file

2019-08-25 Thread Yebgenya Lazarkhosrouabadi
Hello, I'm trying to use the HiveCatalog in flink1.9. I modified the yaml file like this:

catalogs:
  - name: mynewhive
    type: hive
    hive-conf-dir: /home/user/Downloads/apache-hive-1.2.2-bin/conf
    default-database: myhive

But when I try to run ./sql-client.sh embedded I get this error:

Re: Use logback instead of log4j

2019-08-25 Thread Vishwas Siravara
Any idea on how I can use logback instead? On Fri, Aug 23, 2019 at 1:22 PM Vishwas Siravara wrote: > Hi , > From the flink doc, in order to use logback instead of log4j: "Users > willing to use logback instead of log4j can just exclude log4j (or delete > it from the lib/ folder)." >
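On the Maven side, the documented advice to exclude log4j translates into exclusions on the Flink dependencies plus a logback dependency. A sketch; the artifact id and versions are placeholders for whatever the job already uses:

    <!-- pom.xml sketch: exclude the log4j binding, add logback -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.11</artifactId>
      <version>${flink.version}</version>
      <exclusions>
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
      <version>1.2.3</version>
    </dependency>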

Re: Externalized checkpoints

2019-08-25 Thread Vishwas Siravara
Got it. Thank you. On Thu, Aug 22, 2019 at 8:54 PM Congxian Qiu wrote: > Hi, Vishwas > > As Zhu Zhu said, you can set "state.checkpoints.num-retained"[1] to > specify the maximum number of completed checkpoints to retain. > maybe you can also ref the external checkpoint cleanup type[2] config for
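For reference, the setting mentioned above is a flink-conf.yaml entry; the value below is just an example:

    # retain the three most recent completed checkpoints
    state.checkpoints.num-retained: 3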

RE: timeout error while connecting to Kafka

2019-08-25 Thread Eyal Pe'er
Nope, I submitted it through the Flink job master itself by running flink run -c sandbox.jar. Best regards, Eyal Peer / Data Platform Developer. From: miki haiat Sent: Sunday, August 25, 2019 4:21 PM To: Eyal Pe'er Cc: user Subject: Re: timeout error while

Re: timeout error while connecting to Kafka

2019-08-25 Thread miki haiat
I'm trying to understand: did you submit your jar through the Flink web UI, and then you got the timeout error? On Sun, Aug 25, 2019, 16:10 Eyal Pe'er wrote: > What do you mean by "remote cluster"? > > I tried to run a dockerized Flink version ( >

RE: timeout error while connecting to Kafka

2019-08-25 Thread Eyal Pe'er
What do you mean by "remote cluster"? I tried to run a dockerized Flink version (https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html) on a remote machine and to submit a job that is supposed to communicate with Kafka, but I still cannot access the topic. Best regards

Re: timeout error while connecting to Kafka

2019-08-25 Thread miki haiat
Did you try to submit it to a remote cluster? On Sun, Aug 25, 2019 at 2:55 PM Eyal Pe'er wrote: > BTW, the exception that I see in the log is: ERROR > org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception > occurred in REST handler… > > Best regards > > Eyal Peer / Data

RE: timeout error while connecting to Kafka

2019-08-25 Thread Eyal Pe'er
Replication factor is 1; that is the case for most of my topics. Is it a problem to consume events from non-replicated topics? Best regards, Eyal Peer / Data Platform Developer. From: Yitzchak Lieberman Sent: Sunday, August 25, 2019 3:13 PM To: Eyal Pe'er Cc:

Re: timeout error while connecting to Kafka

2019-08-25 Thread Yitzchak Lieberman
What is the topic replication factor? How many Kafka brokers do you have? I was facing the same exception when one of my brokers was down and the topic had no replica (replication_factor=1). On Sun, Aug 25, 2019 at 2:55 PM Eyal Pe'er wrote: > BTW, the exception that I see in the log is: ERROR >

RE: timeout error while connecting to Kafka

2019-08-25 Thread Eyal Pe'er
BTW, the exception that I see in the log is: ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler… Best regards, Eyal Peer / Data Platform Developer. From: Eyal Pe'er Sent: Sunday, August 25, 2019 2:20 PM

RE: timeout error while connecting to Kafka

2019-08-25 Thread Eyal Pe'er
Hi, I removed that dependency, but it still fails. The reason I used Kafka 1.5.0 is that I followed a training that used it (https://www.baeldung.com/kafka-flink-data-pipeline). If needed, I can change it. I'm not sure, but maybe in order to consume events from Kafka 0.9 I need to