Problem with KeyedStream 1.0-SNAPSHOT

2016-02-15 Thread Lopez, Javier
Hi guys, I'm running a small test with the SNAPSHOT version in order to be able to use Kafka 0.9 and I'm getting the following error: "cannot access org.apache.flink.api.java.operators.Keys" / "[ERROR] class file for org.apache.flink.api.java.operators.Keys not found". The code I'm using is as

events eviction

2016-02-15 Thread Radu Tudoran
Hello, I am looking over the mechanisms for evicting events in Flink. I saw that, whether using a default evictor or building a custom one, the logic is that the evictor provides the number of events to be discarded. Could you please provide me with some additional pointers regarding the

Re: events eviction

2016-02-15 Thread Aljoscha Krettek
Hi, you are right, the logic is in EvictingNonKeyedWindowOperator.emitWindow() for non-parallel (non-keyed) windows and in EvictingWindowOperator.processTriggerResult() in the case of keyed windows. You are also right about the contract of the Evictor: it returns the number of elements to be evicted
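
A minimal sketch of that contract, assuming the 0.10/1.0-era Evictor interface (the class name KeepLastEvictor is illustrative):

import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.Window;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Keeps at most maxCount elements in the window buffer; the surplus is
// evicted from the beginning of the buffer before the window function fires.
public class KeepLastEvictor<T, W extends Window> implements Evictor<T, W> {

    private final long maxCount;

    public KeepLastEvictor(long maxCount) {
        this.maxCount = maxCount;
    }

    @Override
    public int evict(Iterable<StreamRecord<T>> elements, int size, W window) {
        // The contract: return how many elements to discard from the head.
        return size > maxCount ? (int) (size - maxCount) : 0;
    }
}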

Re: writeAsCSV with partitionBy

2016-02-15 Thread Fabian Hueske
Hi Srikanth, DataSet.partitionBy() will partition the data on the declared partition fields. If you append a DataSink with the same parallelism as the partition operator, the data will be written out with the defined partitioning. It should be possible to achieve the behavior you described using
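
A minimal sketch of the setup Fabian describes, assuming the 1.0-era DataSet API (the paths, data, and parallelism of 4 are illustrative):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem.WriteMode;

public class PartitionedWrite {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> data = env.fromElements(
                new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("a", 3));

        data.partitionByHash(0).setParallelism(4)        // partition on field 0
            .writeAsCsv("file:///tmp/out", WriteMode.OVERWRITE)
            .setParallelism(4);                          // sink with the same parallelism

        env.execute("partitioned CSV write");
    }
}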

Re: Regarding Concurrent Modification Exception

2016-02-15 Thread Fabian Hueske
Hi, this stacktrace looks really suspicious. It includes classes from the submission client (CLIClient), the optimizer (JobGraphGenerator), and the runtime (KryoSerializer). Is it possible that you are trying to start a new Flink job inside another job? This would not work. Best, Fabian

Re: schedule tasks `inside` Flink

2016-02-15 Thread Fabian Hueske
Hi Michal, If I got your requirements right, you could try to solve this issue by serving the updates through a regular DataStream. You could add a SourceFunction which periodically emits a new version of the cache and a CoFlatMap operator which receives on the first input the regular streamed
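
A minimal sketch of that pattern, assuming the 1.0-era DataStream API (CacheSource, the refresh interval, and the enrichment logic are illustrative):

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.util.Collector;

public class CacheViaCoFlatMap {

    // Periodically emits a fresh copy of the cache; loadCache() is a placeholder.
    public static class CacheSource implements SourceFunction<Map<String, String>> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Map<String, String>> ctx) throws Exception {
            while (running) {
                ctx.collect(loadCache());
                Thread.sleep(60_000); // refresh once a minute
            }
        }

        @Override
        public void cancel() { running = false; }

        private Map<String, String> loadCache() {
            return new HashMap<>(); // placeholder: load from a DB, file, ...
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> events = env.socketTextStream("localhost", 9999);
        DataStream<Map<String, String>> updates = env.addSource(new CacheSource());

        events.connect(updates.broadcast()) // every parallel task sees each update
              .flatMap(new CoFlatMapFunction<String, Map<String, String>, String>() {
                  private Map<String, String> cache = new HashMap<>();

                  @Override // first input: the regular event stream
                  public void flatMap1(String event, Collector<String> out) {
                      String enriched = cache.get(event);
                      out.collect(enriched != null ? enriched : event);
                  }

                  @Override // second input: replace the local cache copy
                  public void flatMap2(Map<String, String> update, Collector<String> out) {
                      cache = update;
                  }
              })
              .print();

        env.execute("cache updates via co-flatmap");
    }
}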

Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
Hello everyone, last week I ran some tests with Apache ZooKeeper to get a grip on Flink's HA features. My tests have gone badly so far and I can't sort out the reason. My latest tests involved Flink 0.10.2, run as a standalone cluster with 3 masters and 4 slaves. The 3 masters are also the ZooKeeper

Re: Problem with KeyedStream 1.0-SNAPSHOT

2016-02-15 Thread Fabian Hueske
Hi Javier, Keys is an internal class and was recently moved to a different package, so it appears that your Flink dependencies are not all aligned to the same version. We also added Scala version identifiers to all our dependencies that depend on Scala 2.10. For instance, flink-scala became

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Ufuk Celebi
Using the local file system as the state backend only works if all job managers run on the same machine. Is that the case? Have you specified all job managers in the masters file? With the local file system state backend, only something like host-X host-X host-X will be a valid masters

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
Hi Stefano, The job should stop temporarily but then be resumed by the new JobManager. Have you increased the number of execution retries? AFAIK, it is set to 0 by default, which means a failed job will not be re-run, even in HA mode. You can enable retries on the StreamExecutionEnvironment. Otherwise, you have
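
A minimal sketch of enabling retries, assuming the 0.10/1.0-era API (the retry count of 3 and the toy topology are illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetriesExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Default is 0: a failed job is not resumed, even in HA mode.
        env.setNumberOfExecutionRetries(3);

        env.fromElements(1, 2, 3).print(); // stand-in for the real topology
        env.execute("job with execution retries");
    }
}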

Re: IOException when trying flink-twitter example

2016-02-15 Thread ram kumar
org.apache.flink.streaming.connectors.twitter.TwitterFilterSource - Initializing Twitter Streaming API connection
12:27:32,134 INFO com.twitter.hbc.httpclient.BasicClient - New connection executed: twitterSourceClient, endpoint: /1.1/statuses/filter.json?delimited=length

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
Hi Ufuk, thanks for replying. Regarding the masters file: yes, I've specified all the masters and checked that they were actually running after start-cluster.sh. I'll gladly share the logs as soon as I get to see them. Regarding the state backend: how does having a non-distributed

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
Hi Maximilian, thank you for the reply. I checked the documentation before running my tests (I'm not expert enough to not read the docs ;)) but it doesn't mention any specific requirement regarding execution retries. I'll check it out, thanks! On Mon, Feb 15, 2016 at 12:51 PM,

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
Hi Stefano, That is true, the documentation doesn't mention that; I just wanted to point you to it in case anything else needs to be configured. We will update it. Instead of setting the number of execution retries on the StreamExecutionEnvironment, you may also set

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Ufuk Celebi
On 15 Feb 2016, at 13:40, Stefano Baghino wrote: Hi Ufuk, thanks for replying. Regarding the masters file: yes, I've specified all the masters and checked that they were actually running after start-cluster.sh. I'll gladly share the

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-15 Thread Stephan Ewen
Hi! Looks like that experience should be improved. Do you know why you are getting conflicts on the FashHashMap class, even though the core Flink dependencies are "provided"? Does adding the Kafka connector pull in all the core Flink dependencies? Concerning the Kafka connector: We did not

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-15 Thread shikhar
Stephan Ewen wrote: "Do you know why you are getting conflicts on the FashHashMap class, even though the core Flink dependencies are 'provided'? Does adding the Kafka connector pull in all the core Flink dependencies?" Yes, the core Flink dependencies are being pulled in transitively from the

Read once input data?

2016-02-15 Thread Saliya Ekanayake
Hi, I see that an InputFormat's open() and nextRecord() methods get called for each terminal operation on a given dataset using that particular InputFormat. Is it possible to avoid this, possibly using some caching technique in Flink? For example, I have some code like below and I see for both

Re: Read once input data?

2016-02-15 Thread Fabian Hueske
Hi, it looks like you are executing two distinct Flink jobs. DataSet.count() triggers the execution of a new job. If you have an execute() call in your program, this will lead to two Flink jobs being executed. It is not possible to share state among these jobs. Maybe you should add a custom
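
A minimal sketch of the pitfall Fabian describes, assuming the 1.0-era DataSet API (the paths are illustrative): the eager count() and the later execute() each trigger a full job, so the input is read twice.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class TwoJobsPitfall {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> input = env.readTextFile("file:///tmp/input");

        // Job 1: count() is an eager action and triggers its own execution.
        long n = input.count();
        System.out.println("records: " + n);

        input.map(new MapFunction<String, String>() {
            @Override
            public String map(String line) { return line.toUpperCase(); }
        }).writeAsText("file:///tmp/out");

        // Job 2: the input file is read again from scratch.
        env.execute("second pass");
    }
}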

Re: Read once input data?

2016-02-15 Thread Saliya Ekanayake
Fabian, count() was just an example. What I would like to do is, say, run two map operations on the dataset (ds). Each map will have its own reduction, so is there a way to avoid creating two jobs for such a scenario? The reason is, reading these binary matrices is expensive. In our current MPI
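
One way to keep both reductions in a single job, sketched against the 1.0-era DataSet API (the paths and the example reductions are illustrative), is to branch the dataflow and call execute() once:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class OneJobTwoBranches {

    public static class RowLength implements MapFunction<String, Long> {
        @Override
        public Long map(String row) { return (long) row.length(); }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> rows = env.readTextFile("file:///tmp/matrix");

        DataSet<Long> lengths = rows.map(new RowLength());

        // Branch 1: total number of characters.
        lengths.reduce(new ReduceFunction<Long>() {
            @Override
            public Long reduce(Long a, Long b) { return a + b; }
        }).writeAsText("file:///tmp/out-total");

        // Branch 2: length of the longest row.
        lengths.reduce(new ReduceFunction<Long>() {
            @Override
            public Long reduce(Long a, Long b) { return Math.max(a, b); }
        }).writeAsText("file:///tmp/out-max");

        // A single execute() submits one job with a branching dataflow,
        // so the source is scanned once and feeds both reductions.
        env.execute("two reductions, one job");
    }
}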

Re: Read once input data?

2016-02-15 Thread Fabian Hueske
I would have a look at the example programs in our code base: https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java Best, Fabian 2016-02-15 22:03 GMT+01:00 Saliya Ekanayake: Thank you, Fabian. Any

Re: Read once input data?

2016-02-15 Thread Saliya Ekanayake
Thanks, I'll check this. Saliya On Mon, Feb 15, 2016 at 4:08 PM, Fabian Hueske wrote: I would have a look at the example programs in our code base:

Re: Finding the average temperature

2016-02-15 Thread Nirmalya Sengupta
Hello Stefano, Sorry for the late reply. Many thanks for taking the effort to write and share an example code snippet. I have been playing with the countWindow behaviour for some weeks now and I am generally aware of the functionality of countWindowAll(). For my
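
For reference, a minimal average-over-count-window sketch, assuming the 1.0-era DataStream API (the window size of 2 and the sample readings are illustrative):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

public class AverageTemperature {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Double> temperatures = env.fromElements(21.0, 23.5, 22.1, 24.9);

        temperatures
            .countWindowAll(2) // non-keyed window over every 2 readings
            .apply(new AllWindowFunction<Double, Double, GlobalWindow>() {
                @Override
                public void apply(GlobalWindow window, Iterable<Double> values,
                                  Collector<Double> out) {
                    double sum = 0;
                    long count = 0;
                    for (Double t : values) { sum += t; count++; }
                    out.collect(sum / count); // average of the window's readings
                }
            })
            .print();

        env.execute("average temperature per count window");
    }
}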