Re: Flink ML feature

2019-12-11 Thread Till Rohrmann
Hi guys, it is true that we dropped Flink-ML with 1.9. The reason is that the community started working on a new ML library which you can find under flink-ml-parent [1]. This module contains the framework for building ML pipelines but not yet too many algorithms iirc. The plan is to extend this

Re: Flink 'Job Cluster' mode Ui Access

2019-12-11 Thread Chesnay Schepler
Would it be possible for you to provide us with the full debug log file? On 10/12/2019 18:07, Jatin Banger wrote: Yes, I did. On Tue, Dec 10, 2019 at 3:47 PM Arvid Heise wrote: Hi Jatin, just to be sure. Did you increase the log level to debug [1]

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-11 Thread Arvid Heise
Hi Devin, for event-time based windows, you need to give Flink two types of information: - timestamp of records, which I assume is in your case already embedded into the Pulsar records - and a watermark assigner. The watermarks help Flink to determine when windows can be closed in respect to
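The two pieces Arvid mentions (record timestamps plus a watermark assigner) can be illustrated outside Flink with a minimal plain-Java sketch of bounded-out-of-orderness watermarking; the bound of 5 and the window end of 10 are made up for the example:

```java
public class WatermarkSketch {
    // Minimal bounded-out-of-orderness watermark logic, analogous in spirit
    // to Flink's BoundedOutOfOrdernessTimestampExtractor: the watermark
    // trails the largest timestamp seen so far by a fixed bound.
    static final long BOUND = 5;
    static long maxTs = Long.MIN_VALUE;

    static long onEvent(long eventTs) {
        maxTs = Math.max(maxTs, eventTs);
        return maxTs - BOUND; // current watermark after this event
    }

    public static void main(String[] args) {
        // Out-of-order event timestamps; a window covering [0, 10) may be
        // closed once the watermark reaches its end.
        long[] timestamps = {3, 7, 5, 12, 9, 15};
        for (long ts : timestamps) {
            long wm = onEvent(ts);
            System.out.println("event=" + ts + " watermark=" + wm
                    + (wm >= 10 ? " -> window [0,10) can close" : ""));
        }
    }
}
```

In a real job this logic lives inside the watermark assigner passed to `assignTimestampsAndWatermarks`; the point here is only how the watermark decides when a window can close.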

Re: Thread access and broadcast state initialization in BroadcastProcessFunction

2019-12-11 Thread Timo Walther
1. Yes, methods will only be called by one thread. The Flink API aims to abstract all concurrency topics away when using the provided methods and state. 2. The open() method should always be the first method being called. If this is not the case, this is definitely a bug. Which Flink version

Re: Interval Join Late Record Metrics

2019-12-11 Thread Congxian Qiu
Hi Chris From the code[1], currently, IntervalJoin will ignore the late data silently, maybe you can create an issue to track this. [1]

Open Method is not being called in case of AggregateFunction UDFs

2019-12-11 Thread Arujit Pradhan
Hi all, So we are creating some User Defined Functions of type AggregateFunction. And we want to send some static metrics from the *open()* method of the UDFs as we can get *MetricGroup *by *FunctionContext *which is only exposed in the open method. Our code looks something like this(Which is an

Re: Open Method is not being called in case of AggregateFunction UDFs

2019-12-11 Thread Dawid Wysakowicz
Hi Arujit, Could you also share the query where you use this UDF? It would also help if you said which version of Flink you are using and which planner. Best, Dawid On 11/12/2019 10:21, Arujit Pradhan wrote: > Hi all, > > So we are creating some User Defined Functions of type >

Re: Thread access and broadcast state initialization in BroadcastProcessFunction

2019-12-11 Thread KristoffSC
Hi Vino, Thank you for your response and provided links. So just to clarify and small follow up. 1. Methods will be called only by one thread right? 2. The links you provided are tackling a case when we got a "fast stream" element before we received broadcast stream element. In my case we had

Re: Request for removal from subscription

2019-12-11 Thread Tom Blackwood
Please send a message to: user-unsubscr...@flink.apache.org for unsubscribing. On Wed, Dec 11, 2019 at 1:39 PM L Jainkeri, Suman (Nokia - IN/Bangalore) < suman.l_jaink...@nokia.com> wrote: > Unsubscribe >

Re: Processing Events by custom rules kept in Broadcast State

2019-12-11 Thread Timo Walther
Hi, I think when it comes to the question "What data type should I put in state?", this question should usually be answered with a well-defined data structure that allows for future state upgrades. Like defining a database schema. So I would not put "arbitrary" classes such as Jackson's

Re: Apache Flink - Clarifications about late side output

2019-12-11 Thread Timo Walther
Little mistake: The key must be any constant instead of `e`. On 11.12.19 11:42, Timo Walther wrote: Hi Mans, I would recommend creating a little prototype to answer most of your questions in action. You can simply do: stream = env.fromElements(1L, 2L, 3L, 4L)

Re: Open Method is not being called in case of AggregateFunction UDFs

2019-12-11 Thread Timo Walther
At least I hope it has been fixed. Which version and planner are you using? On 11.12.19 11:47, Arujit Pradhan wrote: Hi Timo, Thanks for the bug reference. You mentioned that this bug has been fixed. Is the fix available for flink 1.9+ and default query planner. Thanks and regards,

Re: Apache Flink - Clarifications about late side output

2019-12-11 Thread Timo Walther
Hi Mans, I would recommend creating a little prototype to answer most of your questions in action. You can simply do: stream = env.fromElements(1L, 2L, 3L, 4L) .assignTimestampsAndWatermarks( new AssignerWithPunctuatedWatermarks{ extractTimestamp(e) = e,

Re: Order events by field that does not represent time

2019-12-11 Thread David Anderson
Krzysztof, Note that if you want to have Flink treat these sequence numbers as event time timestamps, you probably can, so long as they are generally increasing, and there's some bound on how out-of-order they can be. The advantage to doing this is that you might be able to use Flink SQL's event

Re: Apache Flink - Clarifications about late side output

2019-12-11 Thread David Anderson
I'll attempt to answer your questions. If we have allowed lateness to be greater than 0 (say 5), then if an event > which arrives at window end + 3 (within allowed lateness), > (a) it is considered late and included in the window function as a > late firing ? > An event with a timestamp that
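The allowed-lateness semantics described in this answer can be sketched as a small decision rule in plain Java; this is a simplification for illustration (the real behavior lives in Flink's WindowOperator), with the window end of 100 and lateness of 5 made up to echo the question:

```java
public class LatenessSketch {
    enum Disposition { ON_TIME, LATE_FIRING, DROPPED }

    // For an event-time window ending at windowEnd with a configured allowed
    // lateness: an event whose timestamp falls in the window is on time while
    // the watermark has not passed the window end, triggers a late firing
    // while the watermark is within windowEnd + allowedLateness, and is
    // dropped (or side-output) after that.
    static Disposition classify(long windowEnd, long allowedLateness, long watermark) {
        if (watermark < windowEnd) return Disposition.ON_TIME;
        if (watermark < windowEnd + allowedLateness) return Disposition.LATE_FIRING;
        return Disposition.DROPPED;
    }

    public static void main(String[] args) {
        long end = 100, lateness = 5;
        // Arrives when the watermark is at window end + 3: within the allowed
        // lateness, so it causes a late firing that includes the event.
        System.out.println(classify(end, lateness, end + 3));
        System.out.println(classify(end, lateness, end - 1));
        System.out.println(classify(end, lateness, end + 6));
    }
}
```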

Re: Order events by field that does not represent time

2019-12-11 Thread Timo Walther
Hi Krzysztof, first of all Flink does not sort events based on timestamp. The concept of watermarks just postpones the triggering of a time operation until the watermark says all events until a time t have arrived. For your problem, you can simply use a ProcessFunction and buffer the events
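The buffering approach Timo suggests can be shown outside Flink as plain Java: hold out-of-order sequence numbers in a priority queue and release them once the next expected number is available. Inside a ProcessFunction this buffer would live in keyed state and be drained when elements arrive or timers fire; the starting sequence number 1 is an assumption for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class ReorderSketch {
    // Buffer of sequence numbers seen so far but not yet emittable in order.
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();
    private long nextExpected = 1;

    // Accept one element; return every element that can now be emitted
    // in sequence order.
    List<Long> process(long seq) {
        buffer.add(seq);
        List<Long> out = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() == nextExpected) {
            out.add(buffer.poll());
            nextExpected++;
        }
        return out;
    }

    public static void main(String[] args) {
        ReorderSketch r = new ReorderSketch();
        for (long seq : new long[]{2, 3, 1, 5, 4}) {
            System.out.println(seq + " -> " + r.process(seq));
        }
    }
}
```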

Re: Localenvironment jobcluster ha High availability

2019-12-11 Thread Gary Yao
Hi Eric, What you say should be possible because your job will be executed in a MiniCluster [1] which has HA support. I have not tried this out myself, and I am not aware that people are doing this in production. However, there are integration tests that use MiniCluster + ZooKeeper [2]. Best,

Re: Apache Flink - Retries for async processing

2019-12-11 Thread M Singh
Thanks Zhu for your advice.  Mans On Tuesday, December 10, 2019, 09:32:01 PM EST, Zhu Zhu wrote: Hi M Singh, I think you would be able to know the request failure cause and whether it is recoverable or not.You can handle the error as you like. For example, if you think the error is
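The error handling Zhu Zhu suggests (inspect the failure cause, retry only if it looks recoverable, give up otherwise) can be sketched in plain Java. The classification rule here, retrying only on exceptions whose message mentions a timeout, is made up for the example:

```java
import java.util.function.Supplier;

public class RetrySketch {
    // Hypothetical recoverability check: treat timeouts as transient.
    static boolean isRecoverable(RuntimeException e) {
        return e.getMessage() != null && e.getMessage().contains("timeout");
    }

    // Retry recoverable failures up to maxAttempts; rethrow unrecoverable
    // failures immediately, and the last failure once retries are exhausted.
    static <T> T callWithRetry(Supplier<T> request, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e;
                if (!isRecoverable(e)) throw e; // unrecoverable: fail fast
            }
        }
        throw last; // recoverable but retries exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice with a recoverable timeout, then succeeds.
        String result = callWithRetry(() -> {
            if (calls[0]++ < 2) throw new RuntimeException("request timeout");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

In an AsyncFunction the same classification would decide whether to re-issue the async request or complete the result future exceptionally.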

Re: Apache Flink - Clarifications about late side output

2019-12-11 Thread M Singh
Thanks Timo for your answer.  I will try the prototype but was wondering if I can find some theoretical documentation to give me a sound understanding. Mans On Wednesday, December 11, 2019, 05:44:15 AM EST, Timo Walther wrote: Little mistake: The key must be any constant instead of

Re: Open Method is not being called in case of AggregateFunction UDFs

2019-12-11 Thread Timo Walther
I remember that we fixed some bug around this topic recently. The legacy planner should not be affected. There is another user reporting this: https://issues.apache.org/jira/browse/FLINK-15040 Regards, Timo On 11.12.19 10:34, Dawid Wysakowicz wrote: Hi Arujit, Could you also share the query

Scala case class TypeInformation and Serializer

2019-12-11 Thread 杨光
Hi, I'm working on writing a Flink stream job with the Scala API; how can I find out which classes are serialized by Flink's type serializer and which fall back to the generic Kryo serializer? And if a class falls back to the Kryo serializer, how can I extend the TypeInfo classes of Flink or

Re: Apache Flink - Clarifications about late side output

2019-12-11 Thread David Anderson
> > If we have allowed lateness to be greater than 0 (say 5), then if an event > which arrives at window end + 3 (within allowed lateness), > > (a) it is considered late and included in the window function as a > late firing ? > An event with a timestamp that falls within the window's

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Jark Wu
Thanks Hequn for helping out with this release and being the release manager. Great work! Best, Jark On Thu, 12 Dec 2019 at 15:02, Jeff Zhang wrote: > Great work, Hequn > > Dian Fu wrote on Thu, Dec 12, 2019 at 2:32 PM: > >> Thanks Hequn for being the release manager and everyone who contributed >> to this

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Wei Zhong
Thanks Hequn for being the release manager. Great work! Best, Wei > On Dec 12, 2019, at 15:27, Jingsong Li wrote: > > Thanks Hequn for driving this; 1.8.3 fixed a lot of issues and is very > useful to users. > Great work! > > Best, > Jingsong Lee > > On Thu, Dec 12, 2019 at 3:25 PM jincheng sun

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Jeff Zhang
Great work, Hequn Dian Fu wrote on Thu, Dec 12, 2019 at 2:32 PM: > Thanks Hequn for being the release manager and everyone who contributed to > this release. > > Regards, > Dian > > On Dec 12, 2019, at 2:24 PM, Hequn Cheng wrote: > > Hi, > > The Apache Flink community is very happy to announce the release of Apache >

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-11 Thread ouywl
Hi Yang, Could you give more detail? The log4j.properties content, and the k8s yaml. Do you use the Dockerfile in flink-container? When I tested it with the default per-job yaml in flink-container, it only showed logs in the docker output, and no logs in /opt/flink/log.

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread jincheng sun
Thanks for being the release manager and the great work, Hequn :) Also thanks to the community for making this release possible! Best, Jincheng Jark Wu wrote on Thu, Dec 12, 2019 at 3:23 PM: > Thanks Hequn for helping out with this release and being the release manager. > Great work! > > Best, > Jark > > On Thu, 12

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Jingsong Li
Thanks Hequn for driving this; 1.8.3 fixed a lot of issues and is very useful to users. Great work! Best, Jingsong Lee On Thu, Dec 12, 2019 at 3:25 PM jincheng sun wrote: > Thanks for being the release manager and the great work, Hequn :) > Also thanks to the community for making this release

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Dian Fu
Thanks Hequn for being the release manager and everyone who contributed to this release. Regards, Dian > On Dec 12, 2019, at 2:24 PM, Hequn Cheng wrote: > > Hi, > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.8.3, which is the third bugfix release for the

Re: Scala case class TypeInformation and Serializer

2019-12-11 Thread Yun Tang
Hi Would you please share the related code? I think it might be due to an insufficient type information hint. Best Yun Tang From: 杨光 Date: Wednesday, December 11, 2019 at 7:20 PM To: user Subject: Scala case class TypeInformation and Serializer Hi, I'm working on writing a Flink stream job with

Re: Open Method is not being called in case of AggregateFunction UDFs

2019-12-11 Thread Jark Wu
Hi Arujit, Thanks for reporting this. Are you using this UDF in a window aggregation in the old planner? AFAIK, the open() method of a UDAF is not called only in window aggregations in the old planner, because the old planner uses the DataStream WindowOperator, which does not call open() on the AggregateFunction [1]. I also

Re: Apache Flink - Clarifications about late side output

2019-12-11 Thread M Singh
Thanks David for your detailed answers.   Mans On Wednesday, December 11, 2019, 08:12:51 AM EST, David Anderson wrote: If we have allowed lateness to be greater than 0 (say 5), then if an event which arrives at window end + 3 (within allowed lateness),      (a) it is considered late

Re: Interval Join Late Record Metrics

2019-12-11 Thread Chris Gillespie
Thanks Congxian, I made a JIRA to track this request. https://issues.apache.org/jira/browse/FLINK-15202 On Wed, Dec 11, 2019 at 12:56 AM Congxian Qiu wrote: > Hi Chris > > From the code[1], currently, IntervalJoin will ignore the late data > silently, maybe you can create an issue to track

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-11 Thread Li Peng
Hey Yang, here are the commands: "/opt/flink/bin/taskmanager.sh", "start-foreground", "-Djobmanager.rpc.address={{ .Chart.Name }}-job-manager", "-Dtaskmanager.numberOfTaskSlots=1" "/opt/flink/bin/standalone-job.sh", "start-foreground", "-Djobmanager.rpc.address={{ .Chart.Name }}-job-manager",

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-11 Thread Yang Wang
Hi Peng, What I mean is to use `docker exec` to get into the running pod and `ps` to get the real command that is running for the jobmanager. Have you checked that /opt/flink/conf/log4j.properties is right? I have tested standalone per-job on my Kubernetes cluster, and the logs show up as expected. Best,

Re: Reply: Window deduplication

2019-12-11 Thread yanggang_it_job
I think you can handle it like this: 1. First register your stream as a table (whether it is one stream or several). 2. Then express your business logic on that table with Flink SQL. 3. Finally, use the deduplication feature provided by Flink SQL to specify the fields you want to deduplicate on. Note: keep the state size under control. Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html#deduplication On 2019-12-11 15:53:00, "Jimmy Wong" wrote: > They belong to different windows; deduplication is within a window, not across windows

Re: Flink state TTL expired-state cleanup issue

2019-12-11 Thread 陈帅
We have run into a similar problem; it may be that the incoming data volume makes state grow faster than expired state is cleaned up. Also, I'd like to ask whether there are metrics that track the size and duration of each expired-state cleanup. Yun Tang wrote on Tue, Dec 10, 2019 at 8:30 PM: > Hi 王磊 > > The timestamps of the data in the savepoint directory are not updated to the current time on restore; they remain the earlier times. From the code, if you have configured cleanupFullSnapshot it will take effect; additionally, configuring > cleanupInRocksdbCompactFilter

Re: flink savepoint checkpoint

2019-12-11 Thread 陈帅
Flink 1.9 supports the cancel-job-with-savepoint feature: https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#cancel-job-with-savepoint Checkpoints may be incremental, but savepoints are always full. For the specific differences, see https://www.ververica.com/blog/differences-between-savepoints-and-checkpoints-in-flink lucas.wu

Flink continuous query: number of users who logged in to the site in the past 30 minutes

2019-12-11 Thread 陈帅
For example, users log in at the following times: none, 12:02, 12:13, 12:15, 12:31, 12:40, 12:45, none. Then at the following query times (the actual query may run at any point in time) I expect the result counts: 12:01 (0), 12:03 (1), 12:14 (2), 12:16 (3), 12:30 (4), 12:35 (4), 12:41 (5), 12:46 (4), 13:16 (0). That is, every element that comes in gets a 30-minute expiration, and the window state is the set of elements that have not yet expired. With the Flink stream API and Flink

Reply: Flink continuous query: number of users who logged in to the site in the past 30 minutes

2019-12-11 Thread Yuan,Youjun
First, use a user-defined table function to turn each input message into 3 messages. Assuming the input message's time is ts, the three produced messages carry the times and values (ts-1, 0), (ts+1, 1), (ts+31, 0). Then apply a 30-minute range OVER window that SUMs the numeric part (0 or 1). 袁尤军 -Original Message- From: 陈帅 Sent: Wednesday, December 11, 2019 9:31 PM To: user-zh@flink.apache.org Subject: Flink continuous query: number of users who logged in to the site in the past 30 minutes For example, users log in at the following times: none,
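The expansion trick in this reply can be checked with a plain-Java sketch, outside Flink, purely to illustrate the logic (a real implementation would use a table function plus a range OVER window in Flink SQL; timestamps are in minutes since midnight, and note the trick shifts visibility of each login by about one minute at the window boundaries):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LoginWindowSketch {
    // Expand each login timestamp ts into the three (time, value) rows the
    // reply describes: (ts-1, 0), (ts+1, 1), (ts+31, 0).
    static List<long[]> expand(List<Long> logins) {
        List<long[]> rows = new ArrayList<>();
        for (long ts : logins) {
            rows.add(new long[]{ts - 1, 0});
            rows.add(new long[]{ts + 1, 1});
            rows.add(new long[]{ts + 31, 0});
        }
        rows.sort(Comparator.comparingLong(r -> r[0]));
        return rows;
    }

    // A 30-minute RANGE OVER window SUM evaluated at time t: sum the values
    // of all rows whose time lies in [t - 30, t].
    static long countAt(List<long[]> rows, long t) {
        long sum = 0;
        for (long[] r : rows) {
            if (r[0] >= t - 30 && r[0] <= t) sum += r[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Logins at 12:02, 12:13, 12:15, 12:31, 12:40, 12:45.
        List<Long> logins = Arrays.asList(722L, 733L, 735L, 751L, 760L, 765L);
        List<long[]> rows = expand(logins);
        System.out.println(countAt(rows, 721)); // 12:01 -> 0
        System.out.println(countAt(rows, 736)); // 12:16 -> 3
        System.out.println(countAt(rows, 766)); // 12:46 -> 4
    }
}
```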