Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread vino yang
Congratulations! highfei2...@126.com 于2019年8月7日周三 下午7:09写道: > Congrats Hequn! > > Best, > Jeff Yang > > > Original Message > Subject: Re: [ANNOUNCE] Hequn becomes a Flink committer > From: Piotr Nowojski > To: JingsongLee > CC: Biao Liu ,Zhu Zhu ,Zili Chen ,Jeff Zhang ,Paul Lam

Re: [ANNOUNCE] Kinesis connector becomes part of Flink releases

2019-09-03 Thread vino yang
Good news! Thanks for your efforts, Bowen! Best, Vino Yu Li 于2019年9月2日周一 上午6:04写道: > Great to know, thanks for the efforts Bowen! > > And I believe it worth a release note in the original JIRA, wdyt? Thanks. > > Best Regards, > Yu > > > On Sat, 31 Aug 2019 at 11:01, Bowen Li wrote: > >> Hi all

Re: Comparing Storm and Flink resource requirements

2019-10-22 Thread vino yang
Hi Gyula, Based on our previous experience switching from Storm to Flink. For the same business, resources of the same size are completely sufficient, and the performance indicators are slightly better than Storm. As you said, this may be related to using some of Flink's special features like stat

Re: multi-tenancy without a kafka partition per tenant

2019-10-23 Thread vino yang
Hi Constantinos, I think your analysis is correct, if you have a multi-tenant scenario, but there is no distinction in Kafka. Then Flink can't treat different tenants differently. It is easy to form a data hotspot problem for the difference in the data volume of different tenants. A compromise is

Re: Docker flink 1.9.1

2019-10-23 Thread vino yang
Hi Farouk, Not long after Flink 1.9.1 was released, the community may not have time to provide the corresponding Dockerfiles. I can give you some information: Flink's official docker file is maintained in this repository. [1] I have seen many versions of docker files contributed by patricklucas[

Re: How many events can Flink process each second

2019-10-23 Thread vino yang
Hi A.V. Add a few more points to the previous two answers. There is no clear answer to this question. In addition to resource issues, it depends on the size of the messages you are dealing with and the complexity of the logic. If you don't consider a lot of extra factors, look at the performance o

Re: Docker flink 1.9.1

2019-10-24 Thread vino yang
OK, sounds good to me. Farouk 于2019年10月24日周四 下午3:23写道: > Hi > > I checked there is 2 PR on github for Flink 1.9.1. > > It should come soon enough :) > > Thanks > > Farouk > > Le mer. 23 oct. 2019 à 17:46, vino yang a écrit : > >> Hi Farouk, >>

Re: How to create an empty test stream

2019-10-24 Thread vino yang
Hi Dmitry, Perhaps an easy way is to customize a source function. Then in the run method, start an empty loop? But I don't understand the meaning of starting a stream pipeline without generating data. Best, Vino Dmitry Minaev 于2019年10月24日周四 上午6:16写道: > Hi everyone, > > I have a pipeline where

Re: How to create an empty test stream

2019-10-25 Thread vino yang
erything from the stream. > > -- Dmitry > > On Thu, Oct 24, 2019 at 2:29 AM vino yang wrote: > >> Hi Dmitry, >> >> Perhaps an easy way is to customize a source function. Then in the run >> method, start an empty loop? But I don't understand the meaning of starti

Re: Can flink 1.9.1 use flink-shaded 2.8.3-1.8.2

2019-10-25 Thread vino yang
Hi Jeff, Maybe @Chesnay Schepler could tell you the answer. Best, Vino Jeff Zhang 于2019年10月25日周五 下午3:54写道: > Hi all, > > There's no new flink shaded release for flink 1.9, so I'd like to confirm > with you can flink 1.9.1 use flink-shaded 2.8.3-1.8.2 ? Or the flink shaded > is not necessary f

Re: Cannot modify parallelism (rescale job) more than once

2019-10-28 Thread vino yang
Hi Pankaj, It seems it is a bug. You can report it by opening a Jira issue. Best, Vino Pankaj Chand 于2019年10月28日周一 上午10:51写道: > Hello, > > I am trying to modify the parallelism of a streaming Flink job (wiki-edits > example) multiple times on a standalone cluster (one local machine) having > t

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread vino yang
Hi Michael, You may need to know `KeyedOneInputStreamOperatorTestHarness` test class. You can consider `WindowTranslationTest#testAggregateWithWindowFunctionEventTime` or `WindowTranslationTest#testAggregateWithWindowFunctionProcessingTime`[1](both of them call `processElementAndEnsureOutput`) as

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread vino yang
erWithClientResource in order to use > StreamExecutionEnvironment? > > > > > > Best, > > Michael > > > > > > *From: *vino yang > *Date: *Monday, October 28, 2019 at 1:32 AM > *To: *Michael Nguyen > *Cc: *"user@flink.apache.org" > *Subjec

Re: Add custom fields into Json

2019-10-29 Thread vino yang
Hi, The exception shows your SQL statement has grammatical errors. Please check it again or provide the whole SQL statement here. Best, Vino srikanth flink 于2019年10月29日周二 下午2:51写道: > Hi there, > > I'm querying json data and is working fine. I would like to add custom > fields including the que

Re: Flink checkpointing behavior

2019-10-29 Thread vino yang
Hi Amran, See my inline answers. Best, Vino amran dean 于2019年10月30日周三 上午2:59写道: > Hello, > Exact semantics for checkpointing/task recovery are still a little > confusing to me after parsing docs: so a few questions. > > - What does Flink consider a task failure? Is it any exception that the >

Re: Flink batch app occasionally hang

2019-10-29 Thread vino yang
Hi Caio, Because it involves interaction with external systems. It would be better if you can provide the full logs. Best, Vino Caio Aoque 于2019年10月30日周三 上午8:31写道: > Hi, I've been running some flink scala applications on an AWS EMR cluster > (version 5.26.0 with flink 1.8.0 for scala 2.11) for

Re: flink run jar throw java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

2019-10-30 Thread vino yang
Hi Franke, >From the information provided by Alex: >> mvn build jar include com.mysql.jdbc.Driver. it seems he has packaged a fat jar? Best, Vino Jörn Franke 于2019年10月30日周三 下午2:47写道: > > > You can create a fat jar (also called Uber jar) that includes all > dependencies in your application ja

Re: Flink S3 error

2019-10-30 Thread vino yang
Hi Harrison, So did you check whether the file exists or not? And what's your question? Best, Vino Harrison Xu 于2019年10月31日周四 上午5:24写道: > I'm seeing this exception with the S3 uploader - it claims a previously > part file was not found. Full jobmanager logs attached. (Flink 1.8) > > java.io.Fi

Re: Sending custom statsd tags

2019-10-31 Thread vino yang
Hi Prakhar, You need to customize StatsDReporter[1] in the Flink source. If you want to flexibly get configurable tags from the configuration file[2], you can refer to the implementation of DatadogHttpReporter#open[3] (for reference only how to get the tag). Best, Vino [1]: https://github.com/a

Re: Stateful functions presentation code (UI part)

2019-10-31 Thread vino yang
Hi Flavio, Please see this link.[1] Best, Vino [1]: https://github.com/ververica/stateful-functions/tree/master/stateful-functions-examples/stateful-functions-ridesharing-example Flavio Pompermaier 于2019年10月31日周四 下午4:53写道: > Hi to all, > yould it be possible to provide also the source code of

Re: Async operator with a KeyedStream

2019-11-01 Thread vino yang
Hi Bastien, Your analysis of using KeyedStream in Async I/O is correct. It will not figure out the key. In your scene, the good practice about interacting with DB is async I/O + thread pool[1] + connection Pool. You can use a connection pool to reuse and limit the mysql connection. Best, Vino

Re: Checkpoint in FlinkSQL

2019-11-04 Thread vino yang
Hi Simon, Absolutely, yes. Before using Flink SQL, you need to initialize a StreamExecutionEnvirnoment instance[1], then call StreamExecutionEnvirnoment#setStateBackend or StreamExecutionEnvirnoment#enableCheckpointing to specify the information what you want. [1]: https://ci.apache.org/projects/

Re: Partitioning based on key flink kafka sink

2019-11-06 Thread vino yang
Hi Vishwas, You should pay attention to the other args. The constructor provided by you has a `KeyedSerializationSchema` arg, while the comments of the constructor which made you confused only has a `SerializationSchema` arg. That's their difference. Best, Vino Vishwas Siravara 于2019年11月6日周三 上

Re: Limit max cpu usage per TaskManager

2019-11-06 Thread vino yang
Hi Lu, When using Flink on YARN, it will rely on YARN's resource management capabilities, and Flink cannot currently limit CPU usage. Also, what version of Flink do you use? As far as I know, since Flink 1.8, the -yn parameter will not work. Best, Vino Lu Niu 于2019年11月6日周三 下午1:29写道: > Hi, > >

Re: RocksDB and local file system

2019-11-06 Thread vino yang
Hi Jaqie, For testing, you can use the local file system pattern (e.g. "file:///"). Technically speaking, it's OK to specify the string path provided by you. However, in the production environment, we do not recommend using the local file system. Because it does not provide high availability. Be

Re: What is the slot vs cpu ratio?

2019-11-06 Thread vino yang
Hi srikanth, Referred from the official document: "Each Flink TaskManager provides processing slots in the cluster. The number of slots is typically proportional to the number of available CPU cores of each TaskManager. As a general recommendation, the number of available CPU cores is a good defa

Re: When using udaf, the startup job has a “Cannot determine simple type name 'com' ” exception(Flink version 1.7.2)

2019-11-06 Thread vino yang
Hi mailtolrl, Can you share more context about your program and UDAF. Best, Vino mailtolrl 于2019年11月7日周四 下午3:05写道: > My flink streaming job use a udaf, set 60 parallelisms,submit job in yarn > cluster mode,and then happens every time I start. > > > >

Re: flink's hard dependency on zookeeper for HA

2019-11-07 Thread vino yang
Hi Vishwas, In the standalone cluster HA mode, Flink heavily depends on ZooKeeper. Not only for leader election, but also for: - Checkpoint metadata info; - JobGraph store; - So you should make sure your ZooKeeper Cluster works normally. More details please see[1][2]. Best, Vino

Re: Retrieving Flink job ID/YARN Id programmatically

2019-11-07 Thread vino yang
Hi Lei Nie, You can use `StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID` to get the job id. Best, Vino Lei Nie 于2019年11月8日周五 上午8:38写道: > Hello, > I am currently executing streaming jobs via StreamExecutionEnvironment. Is > it possible to retrieve the Flink job ID/YARN ID within

Re: static table in flink

2019-11-09 Thread vino yang
Hi Jaqie, If I understand your question correctly, it seems you are finding a solution about the Stream table and Dim table(you called static table) join. There were many users who asked this question. Linked some reply here[1][2] to let you consider. Best, Vino [1]: http://apache-flink-mailing

Re: Job Distribution Strategy On Cluster.

2019-11-11 Thread vino yang
Hi srikanth, What's your job's parallelism? In some scenes, many operators are chained with each other. if it's parallelism is 1, it would just use a single slot. Best, Vino srikanth flink 于2019年11月6日周三 下午10:03写道: > Hi there, > > I'm running Flink with 3 node cluster. > While running my jobs(

Re: Document an example pattern that makes sources and sinks pluggable in the production code for testing

2019-11-11 Thread vino yang
Hi Hung, Your suggestion is reasonable. Giving an example of a pluggable source and sink can make it more user-friendly, you can open a JIRA issue to see if there is anyone who wants to improve this. IMO, it's not very difficult to implement it. Because the source and sink in Flink has two unifie

Re: Flink (Local) Environment Thread Leaks?

2019-11-13 Thread vino yang
Hi Theo, If you think there is a thread leakage problem. You can create a JIRA issue and write a detailed description. Ping @Gary Yao and @Zhu Zhu to help to locate and analyze this problem? Best, Vino Theo Diefenthal 于2019年11月14日周四 上午3:16写道: > I included a Solr End2End test in my project,

Re: Initialization of broadcast state before processing main stream

2019-11-13 Thread vino yang
Hi Vasily, Currently, Flink did not do the coordination between a general stream and broadcast stream, they are both streams. Your scene of using the broadcast state is a special one. In a more general scene, the states need to be broadcasted is an unbounded stream, the state events may be broadca

Re: Flink on Yarn resource arrangement

2019-11-13 Thread vino yang
Hi Alex, Which Flink version are you using? AFAIK, since Flink 1.8+, the config option: "-yn" for Flink on YARN job cluster mode does not take effect(always 1 and would be overridden). So, the config option "-ys" and "-p" will decide the number of TM. The first example: -p(20)/-ys(3) should be

Re: slow checkpoints

2019-11-15 Thread vino yang
Hi Yubraj, So the frequent job failure is the root reason, you need to fix it. Yes, when too many messages are squashed into the message system. If the messages can not be consumed normally, there would exist catchup consuming which will cause your streaming system more pressure than usual. Best,

Re: how to setup a ha flink cluster on k8s?

2019-11-16 Thread vino yang
Hi Rock, I searched by Google and found a blog[1] talk about how to config JM HA for Flink on k8s. Do not know whether it suitable for you or not. Please feel free to refer to it. Best, Vino [1]: http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/ Rock 于2019年11月16日周六 上

Re: Flink configuration at runtime

2019-11-18 Thread vino yang
Hi Amran, Change the config option at runtime? No, Flink does not support this feature currently. However, for Flink on Yarn job cluster mode, you can specify different config options for different jobs via program or flink-conf.yaml(copy a new flink binary package then change config file). Best

Re: [ANNOUNCE] Launch of flink-packages.org: A website to foster the Flink Ecosystem

2019-11-19 Thread vino yang
Thanks Robert. Great job! The web site looks great. In the future, we can also add my Kylin Flink cube engine[1] to the ecosystem projects list. [1]: https://github.com/apache/kylin/tree/engine-flink Best, Vino Oytun Tez 于2019年11月19日周二 上午12:09写道: > Congratulations! This is exciting. > > > --

Re: [ANNOUNCE] Launch of flink-packages.org: A website to foster the Flink Ecosystem

2019-11-19 Thread vino yang
Hi Robert, Just added it under the "Tools" category[1]. [1]: https://flink-packages.org/packages/kylin-flink-cube-engine Best, Vino Robert Metzger 于2019年11月19日周二 下午4:33写道: > Thanks. > You can add Kylin whenever you think it is ready. > > On Tue, Nov 19, 2019 at 9:

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
Hi Piper, Can you share more reason and details of your requirements. Best, Vino Piper Piper 于2019年11月21日周四 上午5:48写道: > Hi, > > How can I make Flink's Resource Manager request YARN to spin up new (or > destroy/reclaim existing) TaskManagers in YARN containers? > > Preferably at runtime (i.e. d

Re: Completed job wasn't saved to archive

2019-11-20 Thread vino yang
If everything is OK(your config options about archive dir and history server is correct), Flink should archive the completed job. You said you did not find any exceptions in the log about failing to archive. But any other exceptions? Can you share the logs about your scene? Best, Vino Pavel Pots

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
i.e. Flink will > have persistent TMs/containers and request YARN for more TMs/containers > when needed (or release TMs/containers back to YARN). > > Thank you, > > Piper > > On Wed, Nov 20, 2019 at 9:39 PM vino yang wrote: > >> Hi Piper, >> >> C

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
er. > > Best, > Jingsong Lee > > > > On Thu, Nov 21, 2019 at 2:25 PM vino yang wrote: > >> Hi Piper, >> >> The understanding of two deploy modes For Flink on Yarn is right. >> >> AFAIK, The single job (job cluster) mode is more popular th

Re: Retrieving Flink job ID/YARN Id programmatically

2019-11-21 Thread vino yang
al job ID (from webUI): 1357d21be640b6a3b8a86a063f4bba8a > > > > > > > > On Thu, Nov 7, 2019 at 6:56 PM vino yang wrote: > > > > > > Hi Lei Nie, > > > > > > You can use > `StreamExecutionEnvironment#getStreamGraph#getJobGraph#getJobID`

Re: Flink on YARN: frequent "container released on a *lost* node"

2019-11-21 Thread vino yang
Hi Amran, Did you monitor or have a look at your memory metrics(e.g. full GC) of your TM. There is a similar thread that a user reported the same question due to full GC, the link is here[1]. Best, Vino [1]: http://mail-archives.apache.org/mod_mbox/flink-user/201709.mbox/%3cfa4068d3-2696-4f29-8

Re: How to submit two jobs sequentially and view their outputs in .out file?

2019-11-24 Thread vino yang
Hi Komal, Since you use the Flink standalone deployment mode, the tasks of the jobs which print information to the STDOUT may randomly deploy in any task manager of the cluster. Did you check other Task Managers out file? Best, Vino Komal Mariam 于2019年11月22日周五 下午6:59写道: > Dear all, > > Thank y

Re: flink session cluster ha on k8s

2019-11-24 Thread vino yang
Hi 祥才, You can refer to the reply of this old thread[1]. Best, Vino [1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Re-how-to-setup-a-ha-flink-cluster-on-k8s-td31089.html 曾祥才 于2019年11月25日周一 上午9:28写道: > hi, is there any example about ha on k8s for flink sessi

Re: Side output from Flink Sink

2019-11-24 Thread vino yang
Hi Victor, Firstly, you can get your side output stream via OutputTag. Please refer to the official documentation[1]. Then, specify a sink for your side output stream. Of course, you can specify a Kafka sink. Best, Vino [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_

Re: Side output from Flink Sink

2019-11-24 Thread vino yang
strategies? > > I know the Sink is usually the "last" portion of a data stream as its name > indicates, but I was wondering if for some reason something can't be sinked > (after retries, etc), what is the usual way to deal with such cases? > > Thanks again for your

Re: How to submit two jobs sequentially and view their outputs in .out file?

2019-11-25 Thread vino yang
; > Thank you! That's exactly what's happening. Is there any way to force it > write to a specific .out of a TaskManager? > > > Best Regards, > Komal > > > > On Mon, 25 Nov 2019 at 11:10, vino yang wrote: > >> Hi Komal, >> >> Since you us

Re: Idiomatic way to split pipeline

2019-11-25 Thread vino yang
Hi Avi, As the doc of DataStream#split said, you can use the "side output" feature to replace it.[1] [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html Best, Vino Avi Levi 于2019年11月25日周一 下午4:12写道: > Hi, > I want to split the output of one of the operators

Re: Idiomatic way to split pipeline

2019-11-25 Thread vino yang
ue etc' ) . I thought there is more idiomatic but if this is > it, than I will go with that. > > On Mon, Nov 25, 2019 at 10:42 AM vino yang wrote: > >> *This Message originated outside your organization.* >> -- >> Hi Avi, >>

Re: Flink Kudu Connector

2019-11-25 Thread vino yang
Hi Rahul, Only found some resources from the Internet you can consider.[1][2] Best, Vino [1]: https://bahir.apache.org/docs/flink/current/flink-streaming-kudu/ [2]: https://www.slideshare.net/0xnacho/apache-flink-kudu-a-connector-to-develop-kappa-architectures Rahul Jain 于2019年11月25日周一 下午6:32写

Re: Pre-process data before it hits the Source

2019-11-25 Thread vino yang
Hi Vijay, IMO, the semantics of the source is not changeless. It can contain integrate with third-party systems and consume events. However, it can also contain more business logic about your data pre-process after consuming events. Maybe it needs some customization. WDYT? Best, Vino Vijay Bala

Re: Question about to modify operator state on Flink1.7 with State Processor API

2019-11-25 Thread vino yang
Hi Kaihao, Ping @Aljoscha Krettek @Tzu-Li (Gordon) Tai to give more professional suggestions. What's more, we may need to give a statement about if the state processor API can process the snapshots generated by the old version jobs. WDYT? Best, Vino Kaihao Zhao 于2019年11月25日周一 下午11:39写道: >

Re: Flink behavior as a slow consumer - out of Heap MEM

2019-11-25 Thread vino yang
Hi Hanan, Sometimes, the behavior depends on your implementation. Since it's not a built-in connector, it would be better to share your customized source with the community so that the community would be better to help you figure out where is the problem. WDYT? Best, Vino Hanan Yehudai 于2019年

Re: Pre-process data before it hits the Source

2019-11-26 Thread vino yang
.blogspot.com>* > > > On Tue, Nov 26, 2019 at 2:57 AM vino yang wrote: > >> Hi Vijay, >> >> IMO, the semantics of the source is not changeless. It can contain >> integrate with third-party systems and consume events. However, it can also >> contain mo

Re: Pre-process data before it hits the Source

2019-11-26 Thread vino yang
; Best, > Felipe > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipeogutierrez.blogspot.com>* > > > On Tue, Nov 26, 2019 at 10:09 AM vino yang wrote: > >> Hi Felipe, >> &g

Re: SQL Performance

2019-11-26 Thread vino yang
Hi Nick, Can you provide more details? Are you using JDBCOutputFormat? If yes, can `JDBCOutputFormatBuilder#setBatchInterval` help you? Best, Vino Nicholas Walton 于2019年11月26日周二 下午9:20写道: > I’m streaming records down to an Embedded Derby database, at a rate of > around 200 records per second.

Re: Flink 'Job Cluster' mode Ui Access

2019-11-27 Thread vino yang
Hi Jatin, Flink web UI does not depend on any deployment mode. You should check if there are error logs in the log file and the job status is running state. Best, Vino Jatin Banger 于2019年11月28日周四 下午3:43写道: > Hi, > > It seems there is Web Ui for Flink Session cluster, But for Flink Job > Clust

Re: Flink 'Job Cluster' mode Ui Access

2019-11-28 Thread vino yang
09 CST"} > > # curl :4081/jobs > {"jobs":[{"id":"___job_Id_____","status":"RUNNING"}]} > > Which shows the state of the job as running. > > What else can we do ? > > Best regards, > Jatin > > On Thu, Nov 28, 2019

Re: JobGraphs not cleaned up in HA mode

2019-11-28 Thread vino yang
Hi, Why do you not use HDFS directly? Best, Vino 曾祥才 于2019年11月28日周四 下午6:48写道: > > anyone have the same problem? pls help, thks > > > > -- 原始邮件 -- > *发件人:* "曾祥才"; > *发送时间:* 2019年11月28日(星期四) 下午2:46 > *收件人:* "Vijay Bhaskar"; > *抄送:* "User-Flink"; > *主题:* 回复: JobGra

Re: Auto Scaling in Flink

2019-11-28 Thread vino yang
Hi Akash, You can use Pravega connector to integrate with Flink, the source code is here[1]. In short, relying on its rescalable state feature[2] flink supports scalable streaming jobs. Currently, the mainstream solution about auto-scaling is Flink + K8S, I can share some resources with you[3].

Re: Read multiline JSON/XML

2019-11-29 Thread vino yang
Hi Flavio, IMO, it would take more effect to ask this question in the Spark user mailing list. WDYT? Best, Vino Flavio Pompermaier 于2019年11月29日周五 下午7:09写道: > Hi to all, > is there any out-of-the-box option to read multiline JSON or XML like in > Spark? > It would be awesome to have something

Re: Read multiline JSON/XML

2019-12-01 Thread vino yang
Also, say sorry to Flavio! Best, Vino vino yang 于2019年12月2日周一 上午10:29写道: > Hi Chesnay, > > Sorry, yes, I lost the "like" keyword. I mistakenly thought he wanted to > ask how to use Spark to accomplish this job. > > Best, > Vino > > Chesnay Schepler 于2

Re: Firing timers on ProcessWindowFunction

2019-12-02 Thread vino yang
Hi Avi, Firstly, let's clarify that the "timer" you said is the timer of the window? Or a timer you want to register to trigger some action? Best, Vino Avi Levi 于2019年12月2日周一 下午4:11写道: > Hi, > Is there a way to fire timer in a ProcessWindowFunction ? I would like to > mutate the global state

Re: Access to CheckpointStatsCounts

2019-12-02 Thread vino yang
Hi min, If it is only for monitoring purposes, you can just use checkpoint REST API[1] to do this work. [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/rest_api.html#jobs-jobid-checkpoints Best, Vino 于2019年12月2日周一 下午5:01写道: > Hi, > > > > Just wonder how to access t

Re: [DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread vino yang
+1, One concern: these two classes are marked with `@publicEvolving` annotation. Shall we mark them with `@Deprecated` annotation firstly? Best, Vino Dian Fu 于2019年12月3日周二 下午8:56写道: > +1 to remove them. It seems that we should also drop the class Option as > it's currently only used in Require

Re: Building with Hadoop 3

2019-12-03 Thread vino yang
cc @Chesnay Schepler to answer this question. Foster, Craig 于2019年12月4日周三 上午1:22写道: > Hi: > > I don’t see a JIRA for Hadoop 3 support. I see a comment on a JIRA here > from a year ago that no one is looking into Hadoop 3 support [1]. Is there > a document or JIRA that now exists which would poi

Re: Auto Scaling in Flink

2019-12-03 Thread vino yang
ure, when the new scheduling architecture is implemented >> https://issues.apache.org/jira/browse/FLINK-10407 . >> >> You can do it externally by cancel the job with a savepoint, update the >> parallelism, and restart the job, according to the rate of data. like what >> prave

Re: Building with Hadoop 3

2019-12-04 Thread vino yang
Hi Marton, Thanks for your explanation. Personally, I look forward to your contribution! Best, Vino Márton Balassi 于2019年12月4日周三 下午5:15写道: > Wearing my Cloudera hat I can tell you that we have done this exercise for > our distros of the 3.0 and 3.1 Hadoop versions. We have not contributed > t

Re: [DISCUSS] Drop Kafka 0.8/0.9

2019-12-04 Thread vino yang
+1 jincheng sun 于2019年12月5日周四 上午10:26写道: > +1 for drop it, and Thanks for bring up this discussion Chesnay! > > Best, > Jincheng > > Jark Wu 于2019年12月5日周四 上午10:19写道: > >> +1 for dropping, also cc'ed user mailing list. >> >> >> Best, >> Jark >> >> On Thu, 5 Dec 2019 at 03:39, Konstantin Knauf

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-04 Thread vino yang
Hi devinbost, Sharing two example links with you : - the example code of official documentation[1]; - a StackOverflow answer of a similar question[2]; [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/windows.html#aggregatefunction [2]: https://stackove

Re: KeyBy/Rebalance overhead?

2019-12-08 Thread vino yang
Hi Komal, KeyBy(Hash Partition, logically partition) and rebalance(physical partition) are both one of the partitions been supported by Flink.[1] Generally speaking, partitioning may cause network communication(network shuffles) costs which may cause more time cost. The example provided by you ma

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-08 Thread vino yang
Hi dev, The time of the window may have different semantics. In the session window, it's only a time gap, the size of the window is driven via activity events. In the tumbling or sliding window, it means the size of the window. For more details, please see the official documentation.[1] Best, Vi

Re: Sample Code for querying Flink's default metrics

2019-12-09 Thread vino yang
Hi Pankaj, > Is there any sample code for how to read such default metrics? Is there any way to query the default metrics, such as CPU usage and Memory, without using REST API or Reporters? What's your real requirement? Can you use code to call REST API? Why does it not match your requirements?

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread vino yang
you @vino yang for the reply. I suspect > keyBy will beneficial in those cases where my subsequent operators are > computationally intensive. Their computation time being > than network > reshuffling cost. > > Regards, > Komal > > On Mon, 9 Dec 2019 at 15:23, vino yang wro

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread vino yang
Hi Li, A potential reason could be conflicting logging frameworks. Can you share the log in your .out file and let us know if the print format of the log is the same as the configuration file you gave. Best, Vino Li Peng 于2019年12月10日周二 上午10:09写道: > Hey folks, I noticed that my kubernetes flink

Re: Flink ML feature

2019-12-09 Thread vino yang
Hi Chandu, AFAIK, there is a project named Alink[1] which is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform. FYI Best, Vino [1]: https://github.com/alibaba/Alink Tom Blackwood 于2019年12月10日周二 下午2:07写道: > You may try Spark ML, whi

Re: Flink ML feature

2019-12-10 Thread vino yang
t; > > On Tue, Dec 10, 2019 at 7:11 AM vino yang wrote: > >> Hi Chandu, >> >> AFAIK, there is a project named Alink[1] which is the Machine Learning >> algorithm platform based on Flink, developed by the PAI team of Alibaba >> computing platform. FYI >>

Re: Thread access and broadcast state initialization in BroadcastProcessFunction

2019-12-10 Thread vino yang
Hi kristoffSC, >> I've noticed that all methods are called by the same thread. Would it be always the case, or could those methods be called by different threads? No, open/processXXX/close methods are called in the different stages of a task thread's life cycle. The framework must keep the call o

Re: Processing Events by custom rules kept in Broadcast State

2019-12-10 Thread vino yang
Hi KristoffSC, It seems the main differences are when to parse your rules and what could be put into the broadcast state. IMO, multiple solutions all can take effect. I prefer option 3. I'd like to parse the rules ASAP and let them be real rule event stream (not ruleset stream) in the source. The

Re: Flink client trying to submit jobs to old session cluster (which was killed)

2019-12-12 Thread vino yang
Hi Pankaj, Can you tell us what's Flink version do you use? And can you share the Flink client and job manager log with us? This information would help us to locate your problem. Best, Vino Pankaj Chand 于2019年12月12日周四 下午7:08写道: > Hello, > > When using Flink on YARN in session mode, each Flin

Re: State Processor API: StateMigrationException for keyed state

2019-12-12 Thread vino yang
Hi pwestermann, Can you share the relevant detailed exception message? Best, Vino pwestermann 于2019年12月13日周五 上午2:00写道: > I am trying to get the new State Processor API but I am having trouble with > keyed state (this is for Flink 1.9.1 with RocksDB on S3 as the backend). > I can read keyed sta

Re: How to understand create watermark for Kafka partitions

2019-12-13 Thread vino yang
Hi Alex, >> But why also say created watermark for each Kafka topic partitions ? IMO, the official documentation has explained the reason. Just copied here: When using Apache Kafka as a data source, each Kafka par

Re: TypeInformation problem

2019-12-15 Thread vino yang
Hi Nick, >From StackOverflow, I see a similar issue which answered by @Till Rohrmann . [1] FYI. Best, Vino [1]: https://stackoverflow.com/questions/38214958/flink-error-specifying-keys-via-field-positions-is-only-valid-for-tuple-data-ty Nicholas Walton 于2019年12月14日周六 上午12:01写道: > I was refac

Re: Fw: Metrics based on data filtered from DataStreamSource

2019-12-15 Thread vino yang
Hi Sidney, Firstly, the `open` method of UDF's instance is always invoked when the task thread starts to run. >From the second code snippet image that you provided, I guess you are trying to get a dynamic handler with reflection technology, is that correct? I also guess that you want to get a dyn

Re: Questions about taskmanager.memory.off-heap and taskmanager.memory.preallocate

2019-12-15 Thread vino yang
Hi Ethan, For now, my suggestion is that you can set "preallocate" to false. The description(the link provided by you) of "taskmanager.memory.preallocate" says: "When taskmanager.memory.off-heap is set to true, then it is advised that this configuration is also set to true." Best, Vino Ethan Li

Re: Documentation tasks for release-1.10

2019-12-16 Thread vino yang
+1 for centralizing all the documentation issues so that the community can take more effective to fix them. Best, Vino Xintong Song 于2019年12月16日周一 下午6:02写道: > Thank you Kostas. > Big +1 for keeping all the documentation related issues at one place. > > I've added the documentation task for reso

Re: Fw: Metrics based on data filtered from DataStreamSource

2019-12-16 Thread vino yang
r* */* Data Platform Developer > M: +972.528197720 */* Skype: sidney.feiner.startapp > > [image: emailsignature] > > -- > *From:* vino yang > *Sent:* Monday, December 16, 2019 7:56 AM > *To:* Sidney Feiner > *Cc:* user@flink.apache.org > *

Re: Questions about taskmanager.memory.off-heap and taskmanager.memory.preallocate

2019-12-17 Thread vino yang
Hi Ethan, Share two things: - I have found "taskmanager.memory.preallocate" config option has been removed in the master codebase. - After researching git history, I found the description of " taskmanager.memory.preallocate" was written by @Chesnay Schepler (from 1.8 branch). So

Re: RichAsyncFunction Timeout

2019-12-17 Thread vino yang
Hi Polarisary, IMO, firstly, it would be better to monitor the OS and Flink/HBase metrics. For example: - Flink and HBase cluster Network I/O metrics; - Flink TM CPU/Memory/Backpressure metrics and so on; You can view these metrics to find some potential reasons. If you can not figure it

Re: [Question] How to use different filesystem between checkpoint data and user data sink

2019-12-18 Thread vino yang
Hi ouywl, *>>Thread.currentThread().getContextClassLoader();* What does this statement mean in your program? In addition, can you share your implementation of the customized file system plugin and the related exception? Best, Vino ouywl 于2019年12月18日周三 下午4:59写道: > Hi all, > We have im

Re: Cannot get value from ValueState in ProcessingTimeTrigger

2019-12-18 Thread vino yang
Hi Utopia, The behavior may be correct. First, the default value is null. It's the correct value. `ValueStateDescriptor` has multiple constructors, some of them can let you specify a default value. However, these constructors are deprecated. And the doc does not recommend them.[1] For the other c

Re: Cannot get value from ValueState in ProcessingTimeTrigger

2019-12-18 Thread vino yang
he ValueState before > update? > > before update value : 3 >> >> after update value: 4 >> >> > What’s more, How can I stored the previous value so that I can get the > value when next element come in and invoke the onElement method? > > > > Best r

Re: Apache Flink - Flink Metrics - How to distinguish b/w metrics for two job manager on the same host

2019-12-18 Thread vino yang
Hi Mans, IMO, one job manager represents one Flink cluster and one Flink cluster has a suite of Flink configuration e.g. metrics reporter. Some metrics reporters support tag feature, you can specify it to distinguish different Flink cluster.[1] [1]: https://ci.apache.org/projects/flink/flink-doc

Re: DataStream API min max aggregation on other fields

2019-12-19 Thread vino yang
Hi weizheng, IMHO, I do not know where is not clear to you? Is the result not correct? Can you share the correct result based on your understanding? The "keyBy" specifies group field and min/max do the aggregation in the other field based on the position you specified. Best, Vino Lu Weizheng 于

Re: Can trigger fire early brefore specific element get into ProcessingTimeSessionWindow

2019-12-19 Thread vino yang
Hi Utopia, Flink provides a high scalability window mechanism.[1] For your scene, you can customize your window assigner and trigger. [1]: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html Best, Vino Utopia 于2019年12月19日周四 下午5:56写道: > Hi, > > I want to f

Re: Unit testing filter function in flink

2019-12-19 Thread vino yang
Hi Vishwas, Apache Flink provides some test harness to test your application code on multiple levels of the testing pyramid. You can use them to test your UDF. Please see more examples offered by the official documentation[1]. Best, Vino [1]: https://ci.apache.org/projects/flink/flink-docs-stab

  1   2   3   4   5   >