Re: Global State and Scaling

2017-08-21 Thread Elias Levy
Looks like Gerard asked something along similar lines just last month and that there is a JIRA

Re: Deleting files in continuous processing

2017-08-21 Thread Mohit Anchlia
Just checking to see if there is a way to purge files after it's processed. On Tue, Aug 15, 2017 at 5:11 PM, Mohit Anchlia wrote: > Is there a way to delete a file once it has been processed? > > streamEnv > > .readFile(format, args[0],

Global State and Scaling

2017-08-21 Thread Elias Levy
I am implementing a control stream. The stream communicates a global configuration value for the whole job. It uses DataStream.broadcast() to communicate this to all parallel operator instances. I would like to save this value in state so that it can be recovered when the job restarts/recovers.

Re: Issue with Scala API when using CEP's "notFollowedBy"

2017-08-21 Thread Ted Yu
Please take a look at FLINK-7306, fixed for 1.4.0 release. On Mon, Aug 21, 2017 at 1:03 PM, Gehad Elrobey wrote: > Hi there, > > I have an issue using the Scala API for the CEP library, the notFollowedBy API > call doesn't return a Pattern it returns a Unit instead, So I

Issue with Scala API when using CEP's "notFollowedBy"

2017-08-21 Thread Gehad Elrobey
Hi there, I have an issue using the Scala API for the CEP library, the notFollowedBy API call doesn't return a Pattern it returns a Unit instead, So I am not able to chain additional filters. I tried the Java API and it works fine, So I believe there is something wrong with the Scala API or I

Re: [Survey] How many people use Flink with AWS Kinesis sink

2017-08-21 Thread Bowen Li
Hi Stephan, It's just Kinesis Producer in KPL (Kinesis Producer Library) causing LOTS of trouble. flink-connector-kinesis uses Kinesis Producer to write output results to Kinesis. On the other hand, Kinesis Consumer (KCL) is fine. If there are any successful use cases of Flink + KPL, I'd love to

Re: Prioritize DataStream

2017-08-21 Thread Elias Levy
Flink folks, A response to the question below? On Sat, Aug 19, 2017 at 11:02 AM, Elias Levy wrote: > I believe the answer to this question is "no", but I figure I might as > well ask. Is there a way to prioritize a stream? > > The use case is prioritization of a

Re: [Survey] How many people use Flink with AWS Kinesis sink

2017-08-21 Thread Stephan Ewen
Hi! I cannot speak for the full survey, only from observation on the mailing list and some users I have chatted to directly. I do not really know about the Kinesis Producer (don't know a specific case there), but the Kinesis Consumer seems to be used quite a bit. Do your observations pertain to

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-21 Thread Hao Sun
Thanks Shannon for the https://github.com/coreos/zetcd tips, I will check that out and share my results if we proceed on that path. Thanks Stephan for the details, this is very useful, I was about to ask what exactly is stored into zookeeper, haha. On Mon, Aug

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-21 Thread Stephan Ewen
Hi! That is a very interesting proposition. In cases where you have a single master only, you may bet away with quite good guarantees without ZK. In fact, Flink does not store significant data in ZK at all, it only uses locks and counters. You can have a setup without ZK, provided you have the

Re: Flink multithreading, CoFlatMapFunction, CoProcessFunction, internal state

2017-08-21 Thread Nico Kruber
Hi Chao, what I meant by "per-record base" was actually supposed to be "per-event base" (event = one entity of whatever the stream contains). As from the API: processing is supposed to be one event at a time and this is what is performed internally, too. Nico On Thursday, 17 August 2017

Re: Aggregation by key hierarchy

2017-08-21 Thread Nico Kruber
Hi Basant, no, you cannot add data streams or re-wire your program during runtime. As for any other program changes, you'd have to take a savepoint (to keep operator state and exactly-once semantics) and restart the new program code from there. For a few combinations, I'd probably choose the

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-21 Thread Shannon Carey
Zookeeper should still be necessary even in that case, because it is where the JobManager stores information which needs to be recovered after the JobManager fails. We're eyeing https://github.com/coreos/zetcd as a way to run Zookeeper on top of Kubernetes' etcd cluster so that we don't have

Re: [EXTERNAL] Re: Fink application failing with kerberos issue after running successfully without any issues for few days

2017-08-21 Thread Raja . Aravapalli
Thanks Gordon. Regards, Raja. From: "Tzu-Li (Gordon) Tai" Date: Thursday, August 17, 2017 at 11:47 PM To: Raja Aravapalli , "user@flink.apache.org" Subject: Re: [EXTERNAL] Re: Fink application failing with kerberos

RE: Great number of jobs and numberOfBuffers

2017-08-21 Thread Gwenhael Pasquiers
Hi, 1/ Yes, the loop is part of the application I run on yarn. Something like : public class MyFlinkApp { public static void main(String[] args){ // parse arguments etc for(String datehour:datehours){ ExecutionEnvironment env =

[Survey] How many people use Flink with AWS Kinesis sink

2017-08-21 Thread Bowen Li
Hi guys, We want to have a more accurate idea of how many people are writing Flink's computation result to AWS Kinesis, and how many people had successful Flink deployment against Kinesis? The reason I ask for the survey is because we have been trying to make our Flink jobs and Kinesis

Re: Memory Issue

2017-08-21 Thread Jörn Franke
One would need to look at your code and possible on some heap statistics. Maybe something wrong happens when you cache them (do you use a 3rd party library or your own implementation?). Do you use a stable version of your protobuf library (not necessarily the most recent). You also may want to