KeyBy State

2017-07-31 Thread Govindarajan Srinivasaraghavan
Hi, I have a keyed state, but the key can change quite frequently for the same user, and I need the previous keyed-state value for the user when the key changes. Right now I'm using a Redis cache for the global state. Is there a way to achieve this within Flink?

Re: Odd flink behaviour

2017-07-31 Thread Fabian Hueske
Do you set reached to false in open()? On 01.08.2017 at 2:44 AM, "Mohit Anchlia" wrote: And here is the inputformat code: public class PDFFileInputFormat extends FileInputFormat { /** * */ private static final long serialVersionUID = -4137283038479003711L;

Re: GRPC source in Flink

2017-07-31 Thread Chen Qin
Hi Basanth, Given that Flink bases its failure-recovery guarantee on checkpointing and source rewinding, I can imagine a lossless RPC source would be tricky. In essence, any RPC source needs to provide a rewind API which can buffer at least back to the last successful checkpoint. In production use cases, put

Re: Access Sliding window

2017-07-31 Thread Raj Kumar
Thanks Fabian. That helps. I have one more question. Since I am using the window function apply in the second step, will the average be a running average, or will it be computed at the end of the 6-hour window?

Re: Odd flink behaviour

2017-07-31 Thread Mohit Anchlia
And here is the inputformat code: public class PDFFileInputFormat extends FileInputFormat { /** * */ private static final long serialVersionUID = -4137283038479003711L; private static final Logger logger = LoggerFactory.getLogger(PDFInputFormat.class.getName()); private boolean

GRPC source in Flink

2017-07-31 Thread Basanth Gowda
Hi, is there a way to get data from a gRPC source in Flink? If we can, how do we guarantee that events are not lost once submitted to Flink? Thank you

Flatbuffers and Flink

2017-07-31 Thread Basanth Gowda
Hi, this is 1 of 3 questions I had for Flink. I didn't want to club all of them together, as this might be useful for someone else in the future. Do we have Flatbuffers support in Flink? If there is no support, is there a way to implement it? Trying to see if we could use the byte[] that has

Re: Using FileInputFormat - org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit

2017-07-31 Thread Mohit Anchlia
I even tried an existing format but still get the same error: FileInputFormat fileInputFormat = new TextInputFormat(new Path(args[0])); fileInputFormat.setNestedFileEnumeration(true); streamEnv.readFile(fileInputFormat, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 1L).print(); [main]

Re: Can I use a lot of keyed states or should I use 1 big keyed state?

2017-07-31 Thread shashank agarwal
Ok, if I am taking it right, for example: if I am creating a keyed state with name "total count by email" for key (project id + email), then it will create a single hash-table or column family "total count by email", and all the unique email ids will be rows of that single hash-table or

Re: data loss after implementing checkpoint

2017-07-31 Thread Kostas Kloudas
Hi Sridhar, Stephan already covered the correct sequence of actions in order for your second program to know its correct starting point. As far as the active/inactive rules are concerned, as Nico pointed out you have to somehow store in the backend which rules are active and which are not

Re: data loss after implementing checkpoint

2017-07-31 Thread Stephan Ewen
Maybe to clear up some confusion here: - Flink recovers from the latest checkpoint after a failure. - If you stop/cancel a Flink job and submit the job again, it does not automatically pick up the latest checkpoint. Flink does not know that the second program is a continuation of the
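The workflow Stephan describes is usually done with an explicit savepoint; a CLI sketch (job id, paths, and jar name are placeholders, not taken from this thread):

```shell
# Take a savepoint of the running job, then cancel it
bin/flink savepoint <jobId> hdfs:///flink/savepoints
bin/flink cancel <jobId>

# Start the "second" program explicitly from that savepoint
bin/flink run -s hdfs:///flink/savepoints/savepoint-xyz -d my-job.jar
```

Without the `-s` flag, the resubmitted job starts with empty state, which matches the data-loss symptom described earlier in this thread.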

Re: Can I use a lot of keyed states or should I use 1 big keyed state?

2017-07-31 Thread Stephan Ewen
Each keyed state in Flink is a hashtable or a column family in RocksDB. Having too many of those is not memory efficient. Having fewer states is better, if you can adapt your schema that way. I would also look into "MapState", which is an efficient way to have "sub keys" under a keyed state.
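Stephan's "sub keys under a keyed state" idea can be illustrated outside Flink with a plain-Java analogy (this is not the Flink MapState API, just the structure it gives you: one named state per key, holding many sub-keys, instead of many separately named states):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java analogy of keeping sub-keys under a single keyed state
// (the idea behind Flink's MapState); not the Flink API itself.
public class SubKeyStateSketch {
    static final Map<String, Map<String, Long>> statePerKey = new HashMap<>();

    // Increment one sub-counter (e.g. "txn_last_1h") under one key and
    // return the new count.
    static long increment(String key, String subKey) {
        return statePerKey
            .computeIfAbsent(key, k -> new HashMap<>())
            .merge(subKey, 1L, Long::sum);
    }

    public static void main(String[] args) {
        increment("user@example.com", "txn_last_1h");
        long c = increment("user@example.com", "txn_last_1h");
        System.out.println(c); // prints 2
    }
}
```

One hash-table (or RocksDB column family) then serves all sub-counters, which is the memory-efficiency point made above.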

Re: Flink CLI cannot submit job to Flink on Mesos

2017-07-31 Thread Stephan Ewen
Hi Francisco! Can you drop the explicit address of the jobmanager? The client should pick up that address automatically from ZooKeeper as well (together with the HA leader session ID). Please check if you have the ZooKeeper HA config entries in the config used by the CLI. Stephan On Mon, Jul
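The HA entries Stephan refers to live in the flink-conf.yaml used by the CLI; a sketch for Flink 1.3-era configs (quorum address and paths are placeholders):

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host:2181
high-availability.zookeeper.path.root: /flink
high-availability.storageDir: hdfs:///flink/ha/
```

With these set, the client resolves the current leading JobManager (and its leader session ID) from ZooKeeper instead of needing `-m host:port`.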

Re: Invalid path exception

2017-07-31 Thread Stephan Ewen
Hmm, looks like a bug then... Could you open a JIRA issue for that? @Chesnay are you aware of Path issues on Windows? On Mon, Jul 31, 2017 at 8:01 PM, Mohit Anchlia wrote: > I tried that as well but same result > > format.setFilePath("file:/c:/proj/test/a.txt.txt"); > >

Using FileInputFormat - org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit

2017-07-31 Thread Mohit Anchlia
In trying to use this code I get the following error. Is it asking me to implement an additional interface? streamEnv.readFile(format, args[0], FileProcessingMode.PROCESS_CONTINUOUSLY, 2000).print(); [main] INFO com.s.flink.example.PDFInputFormat - Start streaming [main] INFO

Re: Access Sliding window

2017-07-31 Thread Fabian Hueske
You can compute the average and std-dev in a WindowFunction that iterates over all records in the window (6h / 15min = 24). WindowFunction [1] and CoProcessFunction [2] are described in the docs. [1]
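The per-window computation Fabian describes boils down to a mean and standard deviation over the ~24 count records; a sketch of just that math in plain Java (outside any Flink WindowFunction, with made-up sample counts):

```java
public class WindowStatsSketch {
    // Mean and (population) standard deviation over the counts in one window.
    static double[] meanAndStdDev(long[] counts) {
        double sum = 0;
        for (long c : counts) sum += c;
        double mean = sum / counts.length;
        double sqDiff = 0;
        for (long c : counts) sqDiff += (c - mean) * (c - mean);
        return new double[] { mean, Math.sqrt(sqDiff / counts.length) };
    }

    public static void main(String[] args) {
        long[] counts = { 10, 12, 8, 10 };             // e.g. 15-minute request counts
        double[] stats = meanAndStdDev(counts);
        System.out.println(stats[0] + " " + stats[1]); // mean 10.0, std-dev ~1.414
    }
}
```

Inside a WindowFunction the loop would run over the window's Iterable of records instead of an array; the threshold comparison (e.g. count > mean + k * stdDev) then happens downstream.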

Re: Customer inputformat

2017-07-31 Thread Mohit Anchlia
Thanks! When I give the path to a directory, Flink only reads 2 files. It seems to pick these 2 files randomly. On Mon, Jul 31, 2017 at 12:05 AM, Fabian Hueske wrote: > Hi Mohit, > > as Ted said, there are plenty of InputFormats which are based on > FileInputFormat. >

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Fabian Hueske
Having an operator that updates state from one stream and queries it to process the other stream is actually a common pattern. As I said, I don't know your use case but I don't think that a CoProcessFunction would result in a mess. QueryableState will have quite a bit of overhead because the

Re: Invalid path exception

2017-07-31 Thread Mohit Anchlia
I tried that as well but same result format.setFilePath("file:/c:/proj/test/a.txt.txt"); Caused by: *java.nio.file.InvalidPathException*: Illegal char <:> at index 2: /c:/proj/test/a.txt.txt On Mon, Jul 31, 2017 at 6:04 AM, Stephan Ewen wrote: > I think that on Windows,

Re: Access Sliding window

2017-07-31 Thread Raj Kumar
Thanks Fabian. Can you provide more details about the implementation of step 2 and step 3? How do I calculate the average and standard deviation? How does the CoProcessFunction work? Can you provide details about these two?

Re: Flink CLI cannot submit job to Flink on Mesos

2017-07-31 Thread Francisco Gonzalez Barea
Hi again, on the other hand, we are running the following flink CLI command: ./flink run -d -m ${jobmanager.rpc.address}:${jobmanager.rpc.port} ${our-program-jar} ${our-program-params} Maybe it is the command that we are using wrongly? Thank you On 28 Jul 2017, at 11:07, Till Rohrmann

Re: Flink CLI cannot submit job to Flink on Mesos

2017-07-31 Thread Francisco Gonzalez Barea
Hi Till, thanks for your answer. We have reviewed the configuration and everything seems fine on our side… But we're still getting the message: “Discard message LeaderSessionMessage(----,SubmitJob(JobGraph(jobId: 041b67c7ef765c2f61bd69c2b9dacbce),DETACHED))

Can I use a lot of keyed states or should I use 1 big keyed state?

2017-07-31 Thread shashank agarwal
Hello, I have to compute results on the basis of a lot of history data, with parameters like total transactions in the last 1 month, last 1 day, last 1 hour, etc., by email id, ip, mobile, name, address, zipcode, etc. So my question is: is it the right approach to create keyed state by email, mobile, zipcode, etc., or

Re: Flink on YARN - tmp directory

2017-07-31 Thread Aljoscha Krettek
Hi Chris, I think in this case we need to change what is passed as "-Djava.io.tmpdir" to the JVMs that run the TaskManagers. You should be able to achieve this via env.java.opts or, more specifically, env.java.opts.taskmanager [1]. The directory specified via task
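In flink-conf.yaml, Aljoscha's suggestion would look roughly like this (the directory is a placeholder; `env.java.opts.taskmanager` is the documented key for TaskManager-only JVM options):

```yaml
env.java.opts.taskmanager: -Djava.io.tmpdir=/data/flink-tmp
```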

Re: data loss after implementing checkpoint

2017-07-31 Thread Nico Kruber
Hi Sridhar, sorry for not coming back to you earlier and tbh, I'm no expert on this field either. I don't see this enabling/disabling of rules in the CEP library overview at [1]. How do you do this? You'll probably have to create a stateful operator [2] to store this state in Flink. Maybe

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Hi Fabian, thanks for the insight. I am currently exploring QueryableStateClient and would attempt to get the value for a corresponding key using the getKvState() function. I was confused about the jobId, but I am expecting this would provide me with the jobId of the current job -

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Fabian Hueske
I am not sure that this is impossible, but it is not the use case queryable state was designed for. I don't know the details of your application, but you could try to merge the updating and the querying operators into a single one. You could connect two streams with connect() and use a keyed

Re: Invalid path exception

2017-07-31 Thread Stephan Ewen
I think that on Windows, you need to use "file:/c:/proj/..." with just one slash after the scheme. On Mon, Jul 31, 2017 at 1:24 AM, Mohit Anchlia wrote: > This is what I tired and it doesn't work. Is this a bug? > > format.setFilePath("file:///c:/proj/test/a.txt.txt");
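At the java.net.URI level the one-slash and triple-slash forms actually carry the same path component, which suggests the rejection happens later, in the local path handling; a small stdlib check (illustrative, not Flink code):

```java
import java.net.URI;

public class FileUriSketch {
    public static void main(String[] args) {
        // "file:/..." has no authority; "file:///..." has an empty authority.
        // Both yield the same path component.
        URI oneSlash = URI.create("file:/c:/proj/test/a.txt");
        URI threeSlash = URI.create("file:///c:/proj/test/a.txt");
        System.out.println(oneSlash.getPath());   // /c:/proj/test/a.txt
        System.out.println(threeSlash.getPath()); // /c:/proj/test/a.txt
    }
}
```

The "Illegal char <:> at index 2" error reported in this thread comes from handing that "/c:/..." string to the Windows filesystem as a local path, leaving the leading slash in place.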

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Hi Fabian, I read about the ProcessFunction and it seems a perfect fit for my requirement, although while reading more about queryable state I found that it's not meant to do lookups within a job (your comment in the following link).

Re: Flink: KafkaProducer Data Loss

2017-07-31 Thread Tzu-Li (Gordon) Tai
Hi! Thanks a lot for providing this. I'll try to find some time this week to look into this using your example code. Cheers, Gordon On 29 July 2017 at 4:46:57 AM, ninad (nni...@gmail.com) wrote: Hi Gordon, I was able to reproduce the data loss on standalone flink cluster also. I have stripped

Re: AW: Is watermark used by joining two streams

2017-07-31 Thread G.S.Vijay Raajaa
Hi Fabian, thanks for the reply. I shall try the CoProcessFunction implementation. Currently, I am trying to assign the watermark on the keyed stream. Please find a snippet of the code for better understanding: List<String> names = new ArrayList<>(); names.add("stream_a");

Re: AW: Is watermark used by joining two streams

2017-07-31 Thread Fabian Hueske
Hi Vijay, there are many ways to implement joins with a stateful CoProcessFunction. It gives you access to the timestamps of records and you can register timers that trigger when a certain time is reached. It is basically up to you how you join and emit data. You can drop late data or emit it.

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Hi Fabian, Thanks a lot for pointing that out would read about it and give it a try. Regards, Biplob -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-QueryableState-with-Sliding-Window-on-RocksDB-tp14514p14551.html Sent from the

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Fabian Hueske
Hi Biplob, given these requirements, I would rather not use a window but implement the functionality with a stateful ProcessFunction. A ProcessFunction can register timers, e.g., to remove inactive state. The state of a ProcessFunction can be made queryable. Best, Fabian 2017-07-31 9:52
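The timer-based cleanup Fabian suggests can be mimicked outside Flink with a last-access timestamp per key; a plain-Java analogy of a ProcessFunction registering a timer to drop inactive state (not the Flink API; names and the TTL are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Analogy of per-key state with timer-based expiry: each update records a
// timestamp, and reads after the TTL behave as if a cleanup timer had fired.
public class TtlStateSketch {
    static class Entry {
        final long value;
        final long lastSeen;
        Entry(long value, long lastSeen) { this.value = value; this.lastSeen = lastSeen; }
    }

    final Map<String, Entry> state = new HashMap<>();
    final long ttlMillis;

    TtlStateSketch(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void update(String key, long value, long now) {
        state.put(key, new Entry(value, now));
    }

    // Returns null when the key is absent or its "timer" has elapsed.
    Long get(String key, long now) {
        Entry e = state.get(key);
        if (e == null) return null;
        if (now - e.lastSeen > ttlMillis) {
            state.remove(key); // what the Flink timer callback would do
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        TtlStateSketch s = new TtlStateSketch(1000);
        s.update("k", 42, 0);
        System.out.println(s.get("k", 500));  // 42
        System.out.println(s.get("k", 2000)); // null (expired)
    }
}
```

In an actual ProcessFunction the expiry would be a registered event-time or processing-time timer, and the state itself can be made queryable as noted above.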

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Biplob Biswas
Thanks Fabian for the reply. I was reconsidering my design and the requirement, and what I mentioned already is partially confusing. I realized that using a session window is better in this scenario, where I want a value to be updated per key and the session resets to wait for the gap period with

Re: AW: Is watermark used by joining two streams

2017-07-31 Thread G.S.Vijay Raajaa
My bad. I meant only join. I am currently using keyBy on a timestamp common across the streams. Regards, Vijay Raajaa GS On Mon, Jul 31, 2017 at 1:16 PM, Fabian Hueske wrote: > Hi, > > @Wei: You can implement very different behavior using a CoProcessFunction. > However, if

Re: AW: Is watermark used by joining two streams

2017-07-31 Thread Fabian Hueske
Hi, @Wei: You can implement very different behavior using a CoProcessFunction. However, if your operator is time-based, the logical time of the operator will be the minimum time of both streams (time of the "slower" watermark). @Vijay: I did not understand what your requirements are. Do you want

Re: Flink QueryableState with Sliding Window on RocksDB

2017-07-31 Thread Fabian Hueske
Hi Biplob, What do you mean by "creating a sliding window on top of a state"? Sliding windows are typically defined on streams (data in motion) and not on state (data at rest). It seems that UpdatedTxnState always holds the last record that was received per key. Do you want to compute the

Re: How can I submit a flink job to YARN/Cluster from java code?

2017-07-31 Thread ????
Ok, thank you. ------ Original message ------ From: "z...@zjdex.com"; Date: 2017-07-31 (Mon) 9:31; To: "user"; Subject: re: How can I submit a flink job to YARN/Cluster from java code? You can

Re: CEP condition expression and its event consuming strategy

2017-07-31 Thread Dawid Wysakowicz
Ad. 1: Yes, it returns an Iterable to support times and oneOrMore patterns (which can accept more than one event). Ad. 2: A use case for not discarding used events could be, e.g., looking for certain shapes in our data, e.g. W-shapes. In this case one W-shape could start on the middle peak of the

Re: Customer inputformat

2017-07-31 Thread Fabian Hueske
Hi Mohit, as Ted said, there are plenty of InputFormats which are based on FileInputFormat. FileInputFormat also supports reading all files in a directory; simply specify the path of the directory. Check StreamExecutionEnvironment.createFileInput(), which takes several parameters such as a

Re: Access Sliding window

2017-07-31 Thread Fabian Hueske
Hi, I would first compute the 15-minute counts. Based on these counts, you compute the threshold by computing the average and std-dev, and then you compare the counts with the threshold. In pseudo-code this could look as follows: DataStream requests = ... DataStream counts = requests.timeWindow(15