Re: Modelling time for complex events generated out of simple ones

2020-04-19 Thread Salva Alcántara
In my case, the relationship between input and output events is that output events are generated out of some rules based on input events. Essentially, output events correspond to specific patterns / sequences of input events. You can think of output events as detecting certain anomalies or

Re: Flink Conf "yarn.flink-dist-jar" Question

2020-04-19 Thread Yang Wang
Hi tison, I think i get your concerns and points. Take both FLINK-13938[1] and FLINK-14964[2] into account, i will do in the following steps. * Enrich "-yt/--yarnship" to support HDFS directory * Enrich "-yt/--yarnship" to specify local resource visibility. It is "APPLICATION" by default. It

Re:Re: multi-sql checkpoint fail

2020-04-19 Thread forideal
Hi Tison, Jark Wu: Thanks for your reply !!! What's the statebackend are you using? Is it Heap statebackend? rocksdb backend uses incremental checkpoint. Could you share the stack traces? I looked at the flame chart myself and found that it was stuck at the end of the

Re: 问题请教-flinksql的kafkasource方面

2020-04-19 Thread Benchao Li
应该是不会的。分配不到partition的source会标记为idle状态。 Sun.Zhu <17626017...@163.com> 于2020年4月20日周一 上午10:28写道: > Hi,benchao,source并发度大于partition数的话,会导致不做checkpoint的问题吧 > > > > > | | > Sun.Zhu > | > | > 邮箱:17626017...@163.com > | > > Signature is customized by Netease Mail Master > > 在2020年04月19日 22:43,人生若只如初见

Re: how to send back result via job manager to client

2020-04-19 Thread Eleanore Jin
Hi Kurt, 谢谢, 我了解过后如果有问题再请教 Best Eleanore On Sun, Apr 19, 2020 at 7:18 PM Kurt Young wrote: > 可以看下这个jira:https://issues.apache.org/jira/browse/FLINK-14807 > > Best, > Kurt > > > On Mon, Apr 20, 2020 at 7:07 AM Eleanore Jin > wrote: > > > Hi, > > 刚刚读到一篇关于Flink 在OLAP 上的使用案例 ( > > >

??????????????-flinksql??kafkasource????

2020-04-19 Thread Sun.Zhu
Hi,benchao??source??partition??checkpoint | | Sun.Zhu | | ??17626017...@163.com | Signature is customized by Netease Mail Master ??2020??04??19?? 22:43 ?? ?? ---- ??:"Benchao

Re: Job manager URI rpc address:port

2020-04-19 Thread Jeff Zhang
Glad to hear that. Som Lima 于2020年4月20日周一 上午8:08写道: > I will thanks. Once I had it set up and working. > I switched my computers around from client to server to server to client. > With your excellent instructions I was able to do it in 5 .minutes > > On Mon, 20 Apr 2020, 00:05 Jeff Zhang,

Re: how to send back result via job manager to client

2020-04-19 Thread Kurt Young
可以看下这个jira:https://issues.apache.org/jira/browse/FLINK-14807 Best, Kurt On Mon, Apr 20, 2020 at 7:07 AM Eleanore Jin wrote: > Hi, > 刚刚读到一篇关于Flink 在OLAP 上的使用案例 ( > https://ververica.cn/developers/olap-engine-performance-optimization-and-application-cases/), > 其中一点提到了: > [image: image.png] >

Re: Job manager URI rpc address:port

2020-04-19 Thread Som Lima
I will thanks. Once I had it set up and working. I switched my computers around from client to server to server to client. With your excellent instructions I was able to do it in 5 .minutes On Mon, 20 Apr 2020, 00:05 Jeff Zhang, wrote: > Som, Let us know when you have any problems > > Som

Re: Job manager URI rpc address:port

2020-04-19 Thread Jeff Zhang
Som, Let us know when you have any problems Som Lima 于2020年4月20日周一 上午2:31写道: > Thanks for the info and links. > > I had a lot of problems I am not sure what I was doing wrong. > > May be conflicts with setup from apache spark. I think I may need to > setup users for each development. > > >

how to send back result via job manager to client

2020-04-19 Thread Eleanore Jin
Hi, 刚刚读到一篇关于Flink 在OLAP 上的使用案例 ( https://ververica.cn/developers/olap-engine-performance-optimization-and-application-cases/), 其中一点提到了: [image: image.png] 这部分优化,源于 OLAP 的一个特性:OLAP 会将最终计算结果发给客户端,通过JobManager 转发给 Client。 想请问一下这一点是如何实现的: 通过JobManager 把结果转发给Client. 谢谢! Eleanore

Problem getting watermark right with event time

2020-04-19 Thread Sudan S
Hi, I am having a problem getting watermark right. The setup is - I have a Flink Job which reads from a Kafka topic, uses Protobuf Deserialization, uses Sliding Window of (120seconds, 30 seconds), sums up the value and finally returns the result. The code is pasted below. The problem here is,

Re: Job manager URI rpc address:port

2020-04-19 Thread Som Lima
Thanks for the info and links. I had a lot of problems I am not sure what I was doing wrong. May be conflicts with setup from apache spark. I think I may need to setup users for each development. Anyway I kept doing fresh installs about four altogether I think. Everything works fine now

Re: Modelling time for complex events generated out of simple ones

2020-04-19 Thread Yun Tang
Hi Salva I think this depends on what the relationship between you output and input events. If the output ones are just simple wrapper of input ones, e.g. adding some simple properties or just read from one place and write to another place, I think the output events could hold time which is

Modelling time for complex events generated out of simple ones

2020-04-19 Thread Salva Alcántara
My flink application generates output (complex) events based on the processing of (simple) input events. The generated output events are to be consumed by other external services. My application works using event-time semantics, so I am bit in doubt regarding what should I use as the output

Re: 关于StreamingFileSink

2020-04-19 Thread Yun Gao
Hello~ 想再确认一下预期的行为:现在是希望后面重新写之后,用新写过的part-xx来覆盖之前生成的文件么~? -- From:酷酷的浑蛋 Send Time:2020 Apr. 18 (Sat.) 20:32 To:user-zh Subject:关于StreamingFileSink 我在用StreamingFileSink

Re: 1.10任务执行过程--源码的一些疑问

2020-04-19 Thread 祝尚
hi,tison,jiacheng感谢解答,按照你说的又仔细看了一遍,确实如此,在实例化MailboxProcessor有把processInput当做参数传进去,在MailboxProcessor#runMailboxLoop中会去执行defaultAction方法 while (processMail(localMailbox)) { mailboxDefaultAction.runDefaultAction(defaultActionContext); // lock is acquired inside default action as needed } 再次感谢

?????? ????????-flinksql??kafkasource????

2020-04-19 Thread ??????????????
?? ---- ??:"Benchao Li"

Re: Job manager URI rpc address:port

2020-04-19 Thread Jeff Zhang
Hi Som, You can take a look at flink on zeppelin, in zeppelin you can connect to a remote flink cluster via a few configuration, and you don't need to worry about the jars. Flink interpreter will ship necessary jars for you. Here's a list of tutorials. 1) Get started

Re: 问题请教-flinksql的kafkasource方面

2020-04-19 Thread Benchao Li
如果是这种情况,可以让你的source的并发度大于等于kafka partition的数量来避免一下。 Jark Wu 于2020年4月19日周日 下午8:22写道: > Hi, > > 根据你描述的现象,以及提供的代码。我觉得原因应该是数据乱序导致的。 > 根据你的 Java 代码,数据的 event time 不是单调递增的,会有一定程度的乱序,这种乱序在作业正常运行时影响不大(watermark > 能容忍 5s 乱序). > 但是在追数据时,由于 flink 目前还没有做到event time 对齐,所以会导致追数据时某些 partition 进度比某些 partition

??????1.10????????????--??????????????

2020-04-19 Thread ??????(Jiacheng Jiang)
??MailboxProcessorstreamtask??processInputMailboxDefaultAction??MailboxProcessorInputStatus status = inputProcessor.processInput();inputProcessor??StreamOneInputProcessor??InputStatus status =

Re: 问题请教-flinksql的kafkasource方面

2020-04-19 Thread Jark Wu
Hi, 根据你描述的现象,以及提供的代码。我觉得原因应该是数据乱序导致的。 根据你的 Java 代码,数据的 event time 不是单调递增的,会有一定程度的乱序,这种乱序在作业正常运行时影响不大(watermark 能容忍 5s 乱序). 但是在追数据时,由于 flink 目前还没有做到event time 对齐,所以会导致追数据时某些 partition 进度比某些 partition 进度快很多的现象, 导致乱序程度拉大(如原先迟到最久的数据时4s,现在可能是10s),所以会导致丢弃的数据更多,也就造成了追数据时,统计值偏低的现象。 完美的解决方案还需要等 FLIP-27

Re: Job manager URI rpc address:port

2020-04-19 Thread Zahid Rahman
Hi Tison, I think I may have found what I want in example 22. https://www.programcreek.com/java-api-examples/?api=org.apache.flink.configuration.Configuration I need to create Configuration object first as shown . Also I think flink-conf.yaml file may contain configuration for client rather

Re: Job manager URI rpc address:port

2020-04-19 Thread Som Lima
Thanks. flink-conf.yaml does allow me to do what I need to do without making any changes to client source code. But RemoteStreamEnvironment constructor expects a jar file as the third parameter also. RemoteStreamEnvironment

Re: Job manager URI rpc address:port

2020-04-19 Thread tison
You can change flink-conf.yaml "jobmanager.address" or "jobmanager.port" options before run the program or take a look at RemoteStreamEnvironment which enables configuring host and port. Best, tison. Som Lima 于2020年4月19日周日 下午5:58写道: > Hi, > > After running > > $ ./bin/start-cluster.sh > > The

Job manager URI rpc address:port

2020-04-19 Thread Som Lima
Hi, After running $ ./bin/start-cluster.sh The following line of code defaults jobmanager to localhost:6123 final ExecutionEnvironment env = Environment.getExecutionEnvironment(); which is same on spark. val spark = SparkSession.builder.master(local[*]).appname("anapp").getOrCreate

Re: 1.10任务执行过程--源码的一些疑问

2020-04-19 Thread tison
invokable 一般是 StreamTask 或者它的子类 StreamSourceTask,具体的 UDF 在 StreamTask 里,有几层包装。 MailBox 那些其实是一个简单的 EventLoop 实现,或者你理解为 Actor Model 的实现也行,可以参考这些名词的解释文章一一对应。 Best, tison. 祝尚 <17626017...@163.com> 于2020年4月19日周日 下午5:43写道: > Hi,all > 在阅读1.10有关job执行过程相关源码时遇到一些疑问,我在看到Task#doRun()方法 >

1.10任务执行过程--源码的一些疑问

2020-04-19 Thread 祝尚
Hi,all 在阅读1.10有关job执行过程相关源码时遇到一些疑问,我在看到Task#doRun()方法 invokable.invoke();具体执行过程应该在这个方法里吧? 进一步看了StreamTask#invoke()->runMailboxLoop();继续往下深入也没发现最终调用udf的入口 问题1:MailboxProcessor、Mailbox、Mail这些概念什么意思,什么作用? 然而在另一处实例化AbstractInvokable时,比如StreamTask构造函数里会调用processInput方法,这个就类似1.9之前的实现方式了

Re: How to to in Flink to support below HIVE SQL

2020-04-19 Thread Rui Li
Hey Xiaohua & Jark, I'm sorry for overlooking the email. Adding to Jark's answers: DISTRIBUTE BY => the functionality and syntax are not supported. We can consider this as a candidate feature for 1.12. named_struct => you should be able to call this function with Hive module LATERAL VIEW => the

Re: Flink Serialization as stable (kafka) output format?

2020-04-19 Thread Robert Metzger
Hey Theo, we recently published a blog post that answers your request for a comparison between Kryo and Avro in Flink: https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html On Tue, Mar 10, 2020 at 9:27 AM Arvid Heise wrote: > Hi Theo, > > I strongly discourage the use