In my case, the relationship between input and output events is that output
events are generated by rules applied to input events. Essentially,
output events correspond to specific patterns/sequences of input events.
You can think of output events as detecting certain anomalies or
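Since the thread does not include the poster's code, here is a minimal, self-contained illustration (plain Java, no Flink dependencies; the class name and the detection rule are hypothetical) of deriving output events from a pattern over input events:

```java
import java.util.ArrayList;
import java.util.List;

public class PatternSketch {
    // Rule (hypothetical): emit one output event whenever three consecutive
    // "FAIL" input events are observed.
    static List<String> detect(List<String> inputs) {
        List<String> outputs = new ArrayList<>();
        int failStreak = 0;
        for (String ev : inputs) {
            failStreak = "FAIL".equals(ev) ? failStreak + 1 : 0;
            if (failStreak == 3) {
                outputs.add("ANOMALY:3xFAIL"); // pattern matched -> output event
                failStreak = 0;                // reset after emitting
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        System.out.println(detect(List.of("OK", "FAIL", "FAIL", "FAIL", "OK", "FAIL")));
        // prints: [ANOMALY:3xFAIL]
    }
}
```

In a real Flink job this kind of rule would typically live in a CEP pattern or a keyed process function, but the mapping from input sequences to output events is the same idea.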
Hi tison,
I think I get your concerns and points.
Taking both FLINK-13938 [1] and FLINK-14964 [2] into account, I will proceed in the
following steps.
* Enrich "-yt/--yarnship" to support HDFS directory
* Enrich "-yt/--yarnship" to specify local resource visibility. It is
"APPLICATION" by default. It
Hi Tison, Jark Wu,
Thanks for your reply!
Which state backend are you using? Is it the Heap state backend?
The RocksDB backend uses incremental checkpoints.
Could you share the stack traces?
I looked at the flame graph myself and found that it was stuck at the end
of the
It shouldn't. Sources that are not assigned a partition will be marked as idle.
Sun.Zhu <17626017...@163.com> wrote on Mon, Apr 20, 2020 at 10:28 AM:
> Hi Benchao, if the source parallelism is greater than the number of partitions, will that cause the job to stop making checkpoints?
> On Apr 19, 2020, at 22:43, 人生若只如初见 wrote:
Hi Kurt,
Thanks, I'll look into it and ask again if I run into problems.
Best
Eleanore
On Sun, Apr 19, 2020 at 7:18 PM Kurt Young wrote:
> You can take a look at this JIRA: https://issues.apache.org/jira/browse/FLINK-14807
>
> Best,
> Kurt
>
>
> On Mon, Apr 20, 2020 at 7:07 AM Eleanore Jin
> wrote:
>
> > Hi,
> > I just read an article about a use case of Flink for OLAP (
> >
>
Hi Benchao, if the source parallelism is greater than the number of partitions, will that cause the job to stop making checkpoints?
Sun.Zhu
17626017...@163.com
On Apr 19, 2020, at 22:43,
----
From: "Benchao
Glad to hear that.
Som Lima wrote on Mon, Apr 20, 2020 at 8:08 AM:
> I will, thanks. Once I had it set up and working,
> I switched my computers around from client to server and server to client.
> With your excellent instructions I was able to do it in 5 minutes.
>
> On Mon, 20 Apr 2020, 00:05 Jeff Zhang,
You can take a look at this JIRA: https://issues.apache.org/jira/browse/FLINK-14807
Best,
Kurt
On Mon, Apr 20, 2020 at 7:07 AM Eleanore Jin wrote:
> Hi,
> I just read an article about a use case of Flink for OLAP (
> https://ververica.cn/developers/olap-engine-performance-optimization-and-application-cases/),
> One point it mentioned was:
> [inline image omitted]
>
I will, thanks. Once I had it set up and working,
I switched my computers around from client to server and server to client.
With your excellent instructions I was able to do it in 5 minutes.
On Mon, 20 Apr 2020, 00:05 Jeff Zhang, wrote:
> Som, Let us know when you have any problems
>
> Som
Som, Let us know when you have any problems
Som Lima wrote on Mon, Apr 20, 2020 at 2:31 AM:
> Thanks for the info and links.
>
> I had a lot of problems; I am not sure what I was doing wrong.
>
> Maybe it conflicts with my Apache Spark setup. I think I may need to
> set up separate users for each development environment.
>
>
>
Hi,
I just read an article about a use case of Flink for OLAP (
https://ververica.cn/developers/olap-engine-performance-optimization-and-application-cases/),
One point it mentioned was:
[inline image omitted]
This optimization stems from a characteristic of OLAP: OLAP sends the final computed result to the client, forwarding it through the JobManager.
I'd like to ask how this is implemented: forwarding the result to the Client via the JobManager.
Thanks!
Eleanore
Hi,
I am having a problem getting watermarks right. The setup is:
- I have a Flink job which reads from a Kafka topic, uses Protobuf
deserialization, uses a sliding window of (120 seconds, 30 seconds), sums up
the values, and finally returns the result.
The code is pasted below.
The problem here is,
Thanks for the info and links.
I had a lot of problems; I am not sure what I was doing wrong.
Maybe it conflicts with my Apache Spark setup. I think I may need to set up
separate users for each development environment.
Anyway, I kept doing fresh installs, about four altogether I think.
Everything works fine now.
Hi Salva,
I think this depends on the relationship between your output and input
events. If the output ones are just simple wrappers of the input ones, e.g. adding
some simple properties, or just reading from one place and writing to another,
I think the output events could hold time which is
My Flink application generates (complex) output events based on the
processing of (simple) input events. The generated output events are to be
consumed by other external services. My application uses event-time
semantics, so I am a bit in doubt regarding what I should use as the output
Hello! I'd like to double-check the expected behavior: after rewriting, do you now want the newly written part-xx files to overwrite the previously generated ones?
--
From:酷酷的浑蛋
Send Time:2020 Apr. 18 (Sat.) 20:32
To:user-zh
Subject: About StreamingFileSink
I am using StreamingFileSink
Hi tison, jiacheng, thanks for the explanation. Following your advice I read it again carefully, and it is indeed so: when MailboxProcessor is instantiated, processInput is passed in as a parameter, and MailboxProcessor#runMailboxLoop then executes the default action:
while (processMail(localMailbox)) {
    mailboxDefaultAction.runDefaultAction(defaultActionContext); // lock is
    acquired inside default action as needed
}
Thanks again.
----
From: "Benchao Li"
Hi Som,
You can take a look at Flink on Zeppelin. In Zeppelin you can connect to a
remote Flink cluster with a few configuration settings, and you don't need to worry
about the jars; the Flink interpreter will ship the necessary jars for you. Here's
a list of tutorials.
1) Get started
If that is the case, you can work around it by making your source parallelism greater than or equal to the number of Kafka partitions.
Jark Wu wrote on Sun, Apr 19, 2020 at 8:22 PM:
> Hi,
>
> Based on the symptoms you described and the code you provided, I think the cause is out-of-order data.
> According to your Java code, the event time of the data is not monotonically increasing, so there is some disorder. It has little impact while the job runs normally (the watermark
> can tolerate 5s of disorder).
> But when catching up on data, since Flink does not yet do event-time alignment, some partitions will progress much faster than other partitions
In MailboxProcessor, the StreamTask passes processInput in as the MailboxDefaultAction; MailboxProcessor then runs
InputStatus status = inputProcessor.processInput();
where inputProcessor is a StreamOneInputProcessor, and
InputStatus status =
Hi,
Based on the symptoms you described and the code you provided, I think the cause is out-of-order data.
According to your Java code, the event time of the data is not monotonically increasing, so there is some disorder. It has little impact while the job runs normally (the watermark
can tolerate 5s of disorder).
But when catching up on data, since Flink does not yet do event-time alignment, some partitions will progress
much faster than others,
which amplifies the disorder (data that used to be at most 4s late may now be 10s late), so more data is dropped, which is why the aggregated values are lower while catching up.
A complete solution will have to wait for FLIP-27
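A self-contained sketch (plain Java, no Flink dependencies; the class name, timestamps, and the 5s bound are illustrative, matching the tolerance discussed above) of how a bounded-out-of-orderness watermark ends up dropping late elements:

```java
import java.util.ArrayList;
import java.util.List;

public class WatermarkSketch {
    static final long MAX_OUT_OF_ORDERNESS_MS = 5_000; // the 5s tolerance from the thread

    // Returns [acceptedTimestamps, droppedTimestamps] for a stream of event times (ms).
    static List<List<Long>> process(long[] eventTimes) {
        long maxSeenTs = Long.MIN_VALUE;
        List<Long> accepted = new ArrayList<>();
        List<Long> dropped = new ArrayList<>();
        for (long ts : eventTimes) {
            maxSeenTs = Math.max(maxSeenTs, ts);
            long watermark = maxSeenTs - MAX_OUT_OF_ORDERNESS_MS;
            if (ts < watermark) dropped.add(ts); // older than the watermark: too late
            else accepted.add(ts);
        }
        return List.of(accepted, dropped);
    }

    public static void main(String[] args) {
        // the second 1_000 arrives after 10_000, i.e. 9s late: beyond the 5s bound
        System.out.println(process(new long[]{1_000, 4_000, 10_000, 1_000, 7_000}));
        // prints: [[1000, 4000, 10000, 7000], [1000]]
    }
}
```

Catching up makes some partitions race ahead, which pushes maxSeenTs (and hence the watermark) forward faster, so more elements from the slow partitions fall behind the watermark and are dropped.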
Hi Tison,
I think I may have found what I want in example 22.
https://www.programcreek.com/java-api-examples/?api=org.apache.flink.configuration.Configuration
I need to create a Configuration object first, as shown.
Also, I think the flink-conf.yaml file may contain configuration for the client
rather
Thanks.
flink-conf.yaml does allow me to do what I need to do without making any
changes to client source code.
But the RemoteStreamEnvironment constructor also expects a jar file as the
third parameter.
RemoteStreamEnvironment
You can change the flink-conf.yaml "jobmanager.address" or "jobmanager.port"
options before running the program, or take a look at RemoteStreamEnvironment,
which lets you configure the host and port.
Best,
tison.
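For reference, the configuration keys actually used in flink-conf.yaml are jobmanager.rpc.address and jobmanager.rpc.port (the names quoted above are shorthand); a minimal override might look like:

```yaml
# flink-conf.yaml: point the client at a remote JobManager
# (the host below is an example; replace it with your JobManager's address)
jobmanager.rpc.address: jm.example.internal
jobmanager.rpc.port: 6123
```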
Som Lima wrote on Sun, Apr 19, 2020 at 5:58 PM:
> Hi,
>
> After running
>
> $ ./bin/start-cluster.sh
>
> The
Hi,
After running
$ ./bin/start-cluster.sh
The following line of code defaults the jobmanager to localhost:6123:
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
which is the same as in Spark:
val spark =
SparkSession.builder.master("local[*]").appName("anapp").getOrCreate()
The invokable is generally a StreamTask or one of its subclasses such as StreamSourceTask; the concrete UDF lives inside the StreamTask,
wrapped in several layers.
The MailBox classes are essentially a simple EventLoop implementation, or you can think of them as an implementation of the Actor Model; see explanatory articles on those terms for a one-to-one mapping.
Best,
tison.
祝尚 <17626017...@163.com> wrote on Sun, Apr 19, 2020 at 5:43 PM:
> Hi all,
> While reading the 1.10 source code related to job execution, I ran into some questions. I was looking at the Task#doRun() method
>
Hi all,
While reading the 1.10 source code related to job execution, I ran into some questions. Looking at the Task#doRun() method:
invokable.invoke(); — the concrete execution should happen inside this method, right?
I then looked at StreamTask#invoke() -> runMailboxLoop(), but going deeper I still couldn't find where the UDF is ultimately invoked.
Question 1: What do the concepts MailboxProcessor, Mailbox, and Mail mean, and what are they for?
However, in another place where AbstractInvokable is instantiated, e.g. in the StreamTask constructor, the processInput method is called, which is similar to the pre-1.9 implementation.
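The EventLoop description given earlier in the thread can be sketched very roughly in plain, self-contained Java (class and field names are illustrative, not Flink's actual API):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class MailboxLoopSketch {
    private final Queue<Runnable> mailbox = new ArrayDeque<>(); // "Mail" = a control message
    int mailsRun = 0;          // how many mails (control messages) were executed
    int recordsProcessed = 0;  // stand-in for work done by the default action

    void enqueue(Runnable mail) { mailbox.add(mail); }

    // Mirrors the shape of MailboxProcessor#runMailboxLoop: drain pending mails,
    // then run the default action once. In StreamTask the default action is
    // processInput, which is where the UDF is eventually invoked.
    void runLoop(int iterations) {
        for (int i = 0; i < iterations; i++) {
            Runnable mail;
            while ((mail = mailbox.poll()) != null) { // processMail(...)
                mail.run();
                mailsRun++;
            }
            recordsProcessed++; // mailboxDefaultAction.runDefaultAction(...)
        }
    }

    public static void main(String[] args) {
        MailboxLoopSketch loop = new MailboxLoopSketch();
        loop.enqueue(() -> System.out.println("mail: e.g. a checkpoint trigger"));
        loop.runLoop(3);
        System.out.println("mailsRun=" + loop.mailsRun
                + " recordsProcessed=" + loop.recordsProcessed);
    }
}
```

The design point is that control messages (checkpoints, timers) and record processing run on one thread, so no locking is needed between them.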
Hey Xiaohua & Jark,
I'm sorry for overlooking the email. Adding to Jark's answers:
DISTRIBUTE BY => the functionality and syntax are not supported. We can
consider this as a candidate feature for 1.12.
named_struct => you should be able to call this function with the Hive module
LATERAL VIEW => the
Hey Theo,
we recently published a blog post that answers your request for a
comparison between Kryo and Avro in Flink:
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
On Tue, Mar 10, 2020 at 9:27 AM Arvid Heise wrote:
> Hi Theo,
>
> I strongly discourage the use