Re: How to make two SQLs use the same KafkaTableSource?

2019-08-08 Thread Zhenghua Gao
The Blink planner supports lazy translation for multiple SQL statements, and the common nodes will be reused within a single job. The only thing you need to note here is that the unified TableEnvironmentImpl does not support conversions between Table(s) and Stream(s). You must use the pure SQL API (DDL/DML by sqlUpdate, DQL by
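The pure-SQL style referred to in the reply would look roughly like the sketch below. The table names, schemas, and connector properties are hypothetical illustrations, not taken from the thread:

```sql
-- DDL (submitted via TableEnvironment#sqlUpdate): register a Kafka source
-- and a filesystem sink. Connector property keys are illustrative only.
CREATE TABLE orders (
  order_id BIGINT,
  amount DOUBLE
) WITH (
  'connector.type' = 'kafka',
  'connector.topic' = 'orders'
);

CREATE TABLE order_totals (
  order_id BIGINT,
  total DOUBLE
) WITH (
  'connector.type' = 'filesystem',
  'connector.path' = '/tmp/order_totals',
  'format.type' = 'csv'
);

-- DML (also via sqlUpdate): because nothing is ever converted to a
-- DataStream, the Blink planner can translate all statements lazily at
-- execution time and reuse the common Kafka source node in one job.
INSERT INTO order_totals
SELECT order_id, SUM(amount) FROM orders GROUP BY order_id;
```

Since both DDL and DML go through sqlUpdate, the job stays entirely inside the SQL API and the shared-source reuse described above can kick in.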

Re: How to make two SQLs use the same KafkaTableSource?

2019-08-08 Thread Tony Wei
Forgot to send to the user mailing list. Tony Wei wrote on Fri, Aug 9, 2019 at 12:36 PM: > Hi Zhenghua, > > I didn't get your point. It seems that `isEagerOperationTranslation` > always returns false. Does that > mean that even if I use the Blink planner, the SQL translation is still done in a lazy > manner? > > Or do you

Re: How to get the Flink Table API/SQL generated code

2019-08-08 Thread Zhenghua Gao
Currently Flink does NOT provide a direct way to get the generated code, but there are indirect ways to try. 1) Debug in the IDE: Flink uses Janino to compile all generated code, and there is a single entry point [1]

Re: Re: CsvTableSink directory has no actual data written

2019-08-08 Thread wangl...@geekplus.com.cn
Sorry, my mistake. The data is actually being written. I was testing on Windows: refreshing always showed the file size as 0, and the displayed size only changed after I opened the file in an editor. wangl...@geekplus.com.cn Sender: Alec Chen Send Time: 2019-08-09 10:17 Receiver: user-zh Subject: Re: Re: CsvTableSink directory has no actual data written There is no data because execution was never triggered; see the sample code from the doc(

Re: Flink SQL join question

2019-08-08 Thread Zhenghua Gao
You can try the firstRow/lastRow deduplication optimization [1] in the latest Flink 1.9 Blink planner and see whether it meets your needs. The current limitation is that deduplication can only be based on procTime. E.g. (from the planner's javadoc):

    SELECT a, b, c FROM (
      SELECT a, b, c, proctime,
        ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime ASC) AS row_num
      FROM MyTable
    ) WHERE row_num <= 1

will be converted
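The `ORDER BY proctime ASC` in the quoted query keeps the first row per key (firstRow). A lastRow counterpart, sketched here under the assumption of the same `MyTable` schema, simply flips the ordering:

```sql
-- lastRow deduplication sketch: keep the most recent row per key a
SELECT a, b, c FROM (
  SELECT a, b, c,
    ROW_NUMBER() OVER (PARTITION BY a ORDER BY proctime DESC) AS row_num
  FROM MyTable
) WHERE row_num <= 1
```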

How to make two SQLs use the same KafkaTableSource

2019-08-08 Thread Tony Wei
Hi, in my Flink job I registered a Kafka table via `flinkTableEnv.connect(new Kafka()...).registerTableSource(...)`. But from the documentation I learned that SQL is only actually translated into a DataStream under certain conditions, for example when `Table#toRetractStream` is called. Because of this, I found that when I use different SQLs in the same Flink job, they each produce their own Kafka source operator. From Flink's point of view this may not be a big problem; each independent operator manages its own

How to make two SQLs use the same KafkaTableSource?

2019-08-08 Thread Tony Wei
Hi, I used `flinkTableEnv.connect(new Kafka()...).registerTableSource(...)` to register my Kafka table. However, I found that because SQL is a lazy operation, it only converts to a DataStream under certain criteria, for example `Table#toRetractStream`. So, when I used two SQLs in one application job,

Re: Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-08 Thread Yun Gao
Congratulations Hequn! Best, Yun -- From: Congxian Qiu Send Time: 2019 Aug. 8 (Thu.) 21:34 To: Yu Li Cc: Haibo Sun; dev; Rong Rong; user Subject: Re: Re: [ANNOUNCE] Hequn becomes a Flink committer Congratulations Hequn! Best,

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-08-08 Thread Zhu Zhu
Hi Subramanyam, could you share more information, including: 1. the URL pattern; 2. the detailed exception and the log around it; 3. the cluster the job is running on, e.g. standalone, YARN, k8s; 4. whether it is session mode or per-job mode. This information would be helpful in identifying the failure cause.

Re: Re: CsvTableSink directory has no actual data written

2019-08-08 Thread Alec Chen
There is no data because execution was never triggered; see the sample code from the doc ( https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/common.html ): // get a StreamTableEnvironment, works for BatchTableEnvironment equivalently StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); // create a

Flink fs s3 shaded hadoop: KerberosAuthException when using StreamingFileSink to S3

2019-08-08 Thread Achyuth Narayan Samudrala
Hi, we are trying to use StreamingFileSink to write to an S3 bucket. It's a simple job which reads from Kafka and sinks to S3. The credentials for S3 are configured in the Flink cluster. We are using Flink 1.7.2 without pre-bundled Hadoop. As suggested in the documentation, we have added the

Re: Capping RocksDb memory usage

2019-08-08 Thread Congxian Qiu
Hi, maybe FLIP-49 [1], "Unified Memory Configuration for TaskExecutors", can give some information here. [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-49-Unified-Memory-Configuration-for-TaskExecutors-td31436.html Best, Congxian Cam Mach wrote on Fri, Aug 9, 2019 at 4:59 AM: >

Re: Using YARN per-job mode with flink-1.8.1

2019-08-08 Thread Zili Chen
I just noticed that user-zh has an archive [1]; the thread [2] covering a problem similar to yours, which was mentioned earlier, is here. Best, tison. [1] https://lists.apache.org/list.html?user-zh@flink.apache.org [2] https://lists.apache.org/thread.html/061d8e48b091b27e797975880c193838f2c37894c2a90aa6a6e83d36@%3Cuser-zh.flink.apache.org%3E Yuhuan Li wrote on Wed, Aug 7, 2019 at 7:57 PM:

Re: Re: CsvTableSink directory has no actual data written

2019-08-08 Thread wangl...@geekplus.com.cn
I added a RocketMQ stream as input. DataStream> ds = env.addSource(new RocketMQSource( System.out.println(res); return res; } }); tableEnv.registerDataStream("t_pick_task", ds, "pick_task_id,

Re: Capping RocksDb memory usage

2019-08-08 Thread Cam Mach
Hi Biao, Yun and Ning, thanks for your responses and pointers; those are very helpful! So far we have tried some of those parameters (WriterBufferManager, write_buffer_size, write_buffer_count, ...), but we are still continuously having memory issues. Here are our cluster configurations:

NoClassDefFoundError in failing-restarting job that uses url classloader

2019-08-08 Thread Subramanyam Ramanathan
Hello, I'm currently using flink 1.7.2. I'm trying to run a job that's submitted programmatically using the ClusterClient API. public JobSubmissionResult run(PackagedProgram prog, int parallelism) The job makes use of some jars which I add to the packaged program through the

Re: Can Flink help us solve the following use case

2019-08-08 Thread Yoandy Rodríguez
Hello Biao, there's a legacy component that expects these "time slices" and tags to be set on our operational data store. Right now I would like to just have the tags set properly on each record. After some reading I came up with the idea of setting multiple sliding windows, but there's still an

Re: Capping RocksDb memory usage

2019-08-08 Thread Yun Tang
Hi Cam, I think FLINK-7289 [1] might offer you some insights into controlling RocksDB memory, especially the idea of using the write buffer manager [2] to control the total write buffer memory. If you do not have too many SST files, write buffer memory usage would consume much more space than index and

Re: Capping RocksDb memory usage

2019-08-08 Thread Ning Shi
Hi Cam, This blog post has some pointers in tuning RocksDB memory usage that might be of help. https://klaviyo.tech/flinkperf-c7bd28acc67 Ning On Wed, Aug 7, 2019 at 1:28 PM Cam Mach wrote: > > Hello everyone, > > What is the most easy and efficiently way to cap RocksDb's memory usage? > >

Re: Capping RocksDb memory usage

2019-08-08 Thread Cam Mach
Thanks for your response, Biao. On Wed, Aug 7, 2019 at 11:41 PM Biao Liu wrote: > Hi Cam, > > AFAIK, that's not an easy thing. Actually it's more like a Rocksdb issue. > There is a document explaining the memory usage of Rocksdb [1]. It might be > helpful. > > You could define your own option

Re: Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-08 Thread Congxian Qiu
Congratulations Hequn! Best, Congxian Yu Li wrote on Thu, Aug 8, 2019 at 2:02 PM: > Congratulations Hequn! Well deserved! > > Best Regards, > Yu > > > On Thu, 8 Aug 2019 at 03:53, Haibo Sun wrote: > >> Congratulations! >> >> Best, >> Haibo >> >> At 2019-08-08 02:08:21, "Yun Tang" wrote: >>

Re: CsvTableSink directory has no actual data written

2019-08-08 Thread Alec Chen
Please post the complete code. wangl...@geekplus.com.cn wrote on Thu, Aug 8, 2019 at 7:37 PM: > > I wrote the code following the official example at > https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sql.html#specifying-a-query > but after running it, the directory specified by CsvTableSink only contains empty files with no actual content. Why is that? > > > > wangl...@geekplus.com.cn >

Re: need help

2019-08-08 Thread Biao Liu
Hello, the exception shows an AskTimeoutException. You can adjust two parameters, akka.ask.timeout and web.timeout, and try again. The defaults are: akka.ask.timeout: 10 s web.timeout: 1 PS: searching for "AskTimeoutException Flink" turns up many related answers. Thanks, Biao /'bɪ.aʊ/ On Thu, Aug 8, 2019 at 7:33 PM 陈某 wrote: > > > -- Forwarded message - From: 陈某
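The two timeouts mentioned above are set in flink-conf.yaml. A sketch with illustrative values (assumptions, not recommendations; pick values that match your cluster's load):

```yaml
# flink-conf.yaml -- raise both timeouts; the values below are only examples
akka.ask.timeout: 60 s
web.timeout: 120000
```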

CsvTableSink directory has no actual data written

2019-08-08 Thread wangl...@geekplus.com.cn
I wrote the code following the official example at https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sql.html#specifying-a-query but after running it, the directory specified by CsvTableSink only contains empty files with no actual content. Why is that? wangl...@geekplus.com.cn

Fwd: need help

2019-08-08 Thread 陈某
-- Forwarded message - From: 陈某 Date: Thu, Aug 8, 2019 at 7:25 PM Subject: need help To: Hello, I am new to Flink. After setting up a Flink on YARN cluster, I started zookeeper, hadoop, and yarn in turn, then the Flink cluster, and hit a problem when submitting a job to YARN. I searched online for related issues but have not found a solution yet; I hope I can get some help, thanks. The command used was: [root@flink01 logs]# flink run -m yarn-cluster

Re: Operator state

2019-08-08 Thread Yun Tang
Hi When talking about sharing state, broadcast state [1][2] might be a choice. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html#provided-apis [2] https://flink.apache.org/2019/06/26/broadcast-state.html Best Yun Tang

Re: Computing order GMV with Flink and canal

2019-08-08 Thread Alec Chen
Hi, the screenshot cannot be displayed, and I don't know whether you are using Flink SQL or the DataStream API. For the former you can look at UDTFs; for the latter, FlatMap: "Takes one element and produces zero, one, or more elements. A flatmap function that splits sentences to words" https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/ 王飞 wrote on Thu, Aug 8, 2019 at 4:53 PM: > Hi,
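For the Flink SQL route, exploding one row that carries a list of orders into one row per order would look roughly like this. The function `split_orders`, the table `binlog_events`, and all column names are hypothetical; a real UDTF must be implemented and registered first:

```sql
-- Assumption: split_orders is a registered user-defined table function (UDTF)
-- that emits one (product_id, amount) row per order in the binlog payload.
SELECT t.product_id, SUM(t.amount) AS gmv
FROM binlog_events,
LATERAL TABLE(split_orders(orders_payload)) AS t(product_id, amount)
GROUP BY t.product_id
```

The same fan-out in the DataStream API would be a FlatMapFunction that emits one record per order in the incoming list.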

Re: On the definition of event time and where its timestamp is produced

2019-08-08 Thread Alec Chen
Hi, Q: at what point is the event-time timestamp attached to the data? A: event time is, literally, the time at which the event occurred. If the device producing the data provides a field recording that time, and the business logic also needs to use it, then that field can serve as the event time. For more on event time vs. processing time, see https://ci.apache.org/projects/flink/flink-docs-stable/dev/event_time.html zhaoheng.zhaoh...@qq.com wrote on Thu, Aug 8, 2019 at 4:36 PM:

Re: Consuming data from dynamoDB streams to flink

2019-08-08 Thread Vinay Patil
Hello, for anyone looking to set up alerts for a Flink application, here is a good blog post from the Flink team: https://www.ververica.com/blog/monitoring-apache-flink-applications-101 So, for DynamoDB streams we can set the alert on millisBehindLatest. Regards, Vinay Patil On Wed, Aug 7, 2019 at

Computing order GMV with Flink and canal

2019-08-08 Thread 王飞
Hi, I need to use Flink to parse MySQL binlogs and compute product-dimension GMV over the order table. The problem is that a single insert binlog event can cover several orders bought at once, so what arrives is a list of orders (shown in the attached image), while I want to compute GMV by the product id inside each order. Does Flink have an operator that can turn one list element in the stream into multiple stream records? Thanks.

Re: Capping RocksDb memory usage

2019-08-08 Thread Biao Liu
Hi Cam, AFAIK that's not an easy thing; actually it's more of a RocksDB issue. There is a document explaining the memory usage of RocksDB [1]. It might be helpful. You could define your own options to tune RocksDB through "state.backend.rocksdb.options-factory" [2]. However, I would suggest not
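Wiring up a custom options factory happens in flink-conf.yaml. A sketch, where the class name is a placeholder and must point at your own OptionsFactory implementation available on the cluster's classpath:

```yaml
# flink-conf.yaml -- com.example.MyRocksDbOptionsFactory is hypothetical; it
# stands in for your own implementation of Flink's RocksDB OptionsFactory
state.backend: rocksdb
state.backend.rocksdb.options-factory: com.example.MyRocksDbOptionsFactory
```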

Reply: Re: Re: submit jobGraph error on server side

2019-08-08 Thread 王智
Thanks a lot. It was my configuration giving the cluster too few resources that caused the slow responses and hence the akka timeouts. I have now switched to another k8s cluster with more resources and no longer hit the exception described in the earlier mail. Original message From: "Zili Chen" < wander4...@gmail.com >; Sent: 2019/8/7 15:32 To: "王智" < ben.wa...@foxmail.com >; Cc: "user-zh" < user-zh@flink.apache.org >; Subject: Re: Re: submit jobGraph error on server side

Re: Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-08 Thread Yu Li
Congratulations Hequn! Well deserved! Best Regards, Yu On Thu, 8 Aug 2019 at 03:53, Haibo Sun wrote: > Congratulations! > > Best, > Haibo > > At 2019-08-08 02:08:21, "Yun Tang" wrote: > >Congratulations Hequn. > > > >Best > >Yun Tang > > > >From: Rong Rong >