Window apply problem

2016-03-07 Thread Marcela Charfuelan
hello, I want to make a function for counting items (per type) in windows of size N; For example for N=5 and the stream: 1 2 4 3 4 3 4 5 4 6 7 3 3 6 1 1 3 2 4 6 I would like to generate the tuples: w(1 2 4 3 4) -> (1,1)(2,1)(4,2)(3,1) w(3 4 5 4 6) -> (1,1)(2,1)(4,4)(3,2)(5,1)(6,1) w(7 3 3 6

Submit Flink Jobs to YARN running on AWS

2016-03-07 Thread Bajaj, Abhinav
Hi, I am a newbie to Flink and trying to use it in AWS. I have created a YARN cluster on AWS EC2 machines. Trying to submit Flink job to the remote YARN cluster using the Flink Client running on my local machine. The Jobmanager start successfully on the YARN container but the client is not

Re: Checkpoint

2016-03-07 Thread Ufuk Celebi
Hey Vijay! On Mon, Mar 7, 2016 at 8:42 PM, Vijay Srinivasaraghavan wrote: > 3) How can I simulate and verify backpressure? I have introduced some delay > (Thread Sleep) in the job before the sink but the "backpressure" tab from UI > does not show any indication of whether

Re: Machine Learning on Apache Fink

2016-03-07 Thread Dmitriy Lyubimov
still in the works (for mahout). but soon. On Sat, Jan 9, 2016 at 3:46 AM, Ashutosh Kumar wrote: > I see lot of study materials and even book available for ml on spark. Spark > seems to be more mature for analytics related work as of now. Please > correct me if I am

NoSuchMethodError flatMap

2016-03-07 Thread Vishnu Viswanath
Hi All, After successfully writing the wordcount program, I was trying to create a streaming application but is getting below error when submitting the job in local mode. Vishnus-MacBook-Pro:flink vishnu$ flink run target/scala-2.11/flink-vishnu_2.11-1.0.jar java.lang.NoSuchMethodError:

Re: EventTimeSourceFunction class

2016-03-07 Thread Stephan Ewen
Hi! In Flink 1.0.0, all functions can act as EventTimeSourceFunctions - all functions can call "emitWithTimestamp(value, timestamp)" on the source context. You can also get timestamps onto values via a Timestamp Assigner. Have a look here:

Re: realtion between operator and task

2016-03-07 Thread Stephan Ewen
Hi! A "task" is something that is deployed as one unit to the TaskManager and runs in one thread. A task can have multiple "operators" chained together, usually one per user function, for example "Map -> Filter -> FlatMap -> AssignTimestamps -> ..." Stephan On Mon, Mar 7, 2016 at 7:36 PM,

Re: Flink on YARN: long-running session vs. one-off jobs

2016-03-07 Thread Maximilian Michels
For the per-job cluster: Yes, the JobManager is started exclusively for the job. For the Yarn session: No, the JobManager stays alive during the entire session and may execute one or more jobs (one after another or even at the same time). On Mon, Mar 7, 2016 at 6:37 PM, Stefano Baghino

Re: Flink on YARN: long-running session vs. one-off jobs

2016-03-07 Thread Stefano Baghino
One last question: running multiple jobs mean that each one has its own JobManager, right? On Mon, Mar 7, 2016 at 3:14 PM, Stefano Baghino < stefano.bagh...@radicalbit.io> wrote: > Good, thank you for the explanation! > > On Mon, Mar 7, 2016 at 2:38 PM, Maximilian Michels

Re: Jobmanager HA with Rolling Sink in HDFS

2016-03-07 Thread Maximilian Bode
Hi Aljoscha, thank you very much, I will try if this fixes the problem and get back to you. I am using 1.0.0 as of today :) Cheers, Max — Maximilian Bode * Junior Consultant * maximilian.b...@tngtech.com TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring Geschäftsführer: Henrik

Re[2]: DataStream, Sink and JDBC

2016-03-07 Thread toletum
Thanks Chiwan and Chesnay I'm happy :-) On lun., mar. 7, 2016 at 14:18, Chiwan Park wrote: Hi Toletum, You can initialize a JDBC connection with RichSinkFunction [1]. There are two methods, `open` and `close`. The `open` method is called once before calling `invoke` method. The `close` method

Flink and Directory Monitors

2016-03-07 Thread phiroc
Hello, has anyone ever used Flink with file/directory monitoring applications such as Directory Monitor (https://directorymonitor.com/)? Is it conceivable to process file-update events with Flink? For instance, let's says hundreds of users simultaneously modify files on a filesystem. Directory

Re: Flink on YARN: long-running session vs. one-off jobs

2016-03-07 Thread Stefano Baghino
Good, thank you for the explanation! On Mon, Mar 7, 2016 at 2:38 PM, Maximilian Michels wrote: > Hi Stefano, > > Essentially the Yarn Session is not much different from a per-job Yarn > cluster. In either case, a Flink cluster is brought up with resources > provided by Yarn. In

Re: DataStream, Sink and JDBC

2016-03-07 Thread Chiwan Park
Hi Toletum, You can initialize a JDBC connection with RichSinkFunction [1]. There are two methods, `open` and `close`. The `open` method is called once before calling `invoke` method. The `close` method is called lastly. Note that you should add `transient` keyword to the JDBC connection

Re: SourceFunction Scala

2016-03-07 Thread Ankur Sharma
Hi, I am getting following error while executing the fat jar of project: Any help? Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/streaming/util/serialization/DeserializationSchema at org.mpi.debs.Main.main(Main.scala) Caused by: