Re: Flink Checkpoint on yarn

2016-03-19 Thread Ufuk Celebi
Can you please have a look into the JobManager log file and report which checkpoints are restored? You should see messages from ZooKeeperCompletedCheckpointStore like: - Found X checkpoints in ZooKeeper - Initialized with X. Removing all older checkpoints You can share the complete job manager log

Re: Flink Checkpoint on yarn

2016-03-19 Thread Ufuk Celebi
Yes, the jobs have their own UUID. Although you expect there to be two independent clusters (which makes sense since you started via yarn-cluster), both clusters act as a single one because of the shared ZooKeeper root. What happens in your case is the following (this is also the reason why we se

Re: degree of Parallelism

2016-03-19 Thread Fabian Hueske
Hi, did find the documentation for configuring the parallelism [1]? It explains how to set the parallelism on different levels: Cluster, Job, Task. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#parallel-execution 2016-03-18 13:34 GMT+01:00 T

Re: off-heap size feature request

2016-03-19 Thread Fabian Hueske
taskmanager.heap.mb defines the total amount of memory used by a task manager. The "heap" part of the parameter name originates from the time when Flink could not allocate off-heap and I agree it is confusing. Nonetheless, we decided to keep it for backward compatibility. The memory fraction is de

Re: Flink Checkpoint on yarn

2016-03-19 Thread Ufuk Celebi
OK, so you are submitting multiple jobs, but you submit them with -m yarn-cluster and therefore expect them to start separate YARN clusters. Makes sense and I would expect the same. I think that you can check in the client logs printed to stdout to which cluster the job is submitted. PS: The logs

Setting taskmanager.network.numberOfBuffers does not seem to have an affect - Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
All: Flink Version 0.10.2 The number that I set for *taskmanager.network.numberOfBuffers* doesn't seem to have any affect, even if I set it to a very high number. There might be a race condition here where the upper bound is not enforced or computer correctly. java.io.IOException: Insufficient n

TumblingProcessingTimeWindow and ContinuousProcessingTimeTrigger

2016-03-19 Thread Hironori Ogibayashi
Hello, I have a question about TumblingProcessingTimeWindow and ContinuousProcessingTimeTrigger. The code I tried is below. Output the distinct count of the words, counts are printed every 5 seconds and window is reset every 1 minute. --- val input = env.readFileStream(fileName,100,FileMonit

Re: what happens to the aggregate in a fold on a KeyedStream if the events for a key stop coming ?

2016-03-19 Thread Fabian Hueske
The "weight" of a window depends on the function that you apply. If you apply a generic WindowFunction Flink stores all elements that arrived for the window and applies the function if the trigger returns FIRE. If you apply a FoldFunction (or ReduceFunction), the function is called for each arrivin

Re: Flink CEP Pattern Matching

2016-03-19 Thread Till Rohrmann
Hi Jerry and Vitor, sorry for my late reply, but I was on vacation last week. I think that Flink's CEP library should indeed head function-wise in the direction of the standard you've linked. The next steps would be to enrich the expressiveness of the pattern language to support regular expressio

Re: degree of Parallelism

2016-03-19 Thread Ahmed Nader
Thanks so much Till and Fabian. So if i were to set it, how can i know the best degree to be used for my application, is it the number of cores or what should i set it to? Thanks On 18 March 2016 at 13:36, Fabian Hueske wrote: > Hi, > > did find the documentation for configuring the parallelism

Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
Hello, I'm testing the checkpointing functionality with hdfs as a backend. For what I can see it uses different checkpointing files and resume the computation from different points and not from the latest available. This is to me an unexpected behaviour. I log every second, for every worker, a c

what happens to the aggregate in a fold on a KeyedStream if the events for a key stop coming ?

2016-03-19 Thread Bart van Deenen
If I do a fold on a KeyedStream, I aggregate events for such-and-such key. My question is, what happens with the aggregate (and its key) when events for this key stop coming? My keys are browser session keys, and are virtually limitless. Ideally, I'd like to send some sort of purge event on key

Convert Datastream to Collector or List

2016-03-19 Thread Ahmed Nader
Hi, I want to pass an object of type DataStream ,after applying map function on it, as a parameter to be used somewhere else. But when i do so, i get an error message of trying to access a null context object. Is there a way that i can convert this DataStream object to a list or a collector so as t

Re: Flink Checkpoint on yarn

2016-03-19 Thread Ufuk Celebi
Hey Simone! Did you set different recovery.zookeeper.path.root keys? The default is /flink and if you don't change it for the 2nd cluster, it will try to recover the jobs of the first one. Can you gather the job manager logs as well please? – Ufuk On Thu, Mar 17, 2016 at 11:31 AM, Simone Robutti

Re: degree of Parallelism

2016-03-19 Thread Till Rohrmann
Hi Ahmed, if you don't set the parallelism in your program then depending on how you execute your program different parallelisms will be used. If you execute it in your IDE, then the number of cores will be used as parallelism. If you submit it to a cluster without specifying the parallelism via t

Re: off-heap size feature request

2016-03-19 Thread Ovidiu-Cristian MARCU
Thanks! I will try this one: taskmanager.memory.size. So I should expect this will be the off-heap memory size, right? I am using taskmanager.heap.mb=some value, taskmanager.memory.off-heap: true Memory usage goes up to 99%. The documentation is confusing: taskmanager.memory.size: The amount of

Re: what happens to the aggregate in a fold on a KeyedStream if the events for a key stop coming ?

2016-03-19 Thread Bart van Deenen
Hi Fabian I'm starting to get it :-) Do you think it's feasible to have one 24 hour window per key (with keys say a million at the same time)? So I mean, is a window a heavy thing? Because I really like the idea of having my aggregation run as the event comes in.. It just feels more natural than

Re: Flink and YARN ship folder

2016-03-19 Thread Stefano Baghino
I have another interesting test result on this matter, running again an individual job on a YARN cluster. When running bin/flink run -m yarn-cluster -yn 1 examples/batch/WordCount.jar the job fails with an error in the Job Manager LogType:jobmanager.err Log Upload Time:Thu Mar 17 07:05:32 -0400 2

How to start with the first Kafka Message

2016-03-19 Thread Dominique Rondé
Hi folks, i have a kafka topic with messages from the last 7 days. Now i have a new flink streaming process and like to consume the messages from the beginning. If I just bring up the topology, the consumer starts from this moment and not from beginning. THX Dominique

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
This is the log filtered to check messages from ZooKeeperCompletedCheckpointStore. https://gist.github.com/chobeat/0222b31b87df3fa46a23 It looks like it finds only a checkpoint but I'm not sure if the different hashes and IDs of the checkpoints are meaningful or not. 2016-03-16 15:33 GMT+01:00

Re: Error when accessing secure HDFS with standalone Flink

2016-03-19 Thread Stefano Baghino
Hi Max, thanks for clarifying the job ownership question. Regarding the security configuration, we set the HADOOP_CONF_DIR environment variable. Right now we're testing YARN again, if we go back to standalone and can come up with some better information regarding the failure I'll write again. Th

Re: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
Thanks for the info, will give it a try. BTW - We're using Hadoop 2.7 on AMR EMR 4.4.0. On Thu, Mar 17, 2016 at 5:55 PM, Ken Krugler wrote: > With Hadoop 2.6 or later, you can use the s3a:// protocol (vs. s3n://), > which should be more reliable (though some bug fixes aren't available until > 2

Re: Flink Checkpoint on yarn

2016-03-19 Thread Ufuk Celebi
Hey Simone, from the logs it looks like multiple jobs have been submitted to the cluster, not just one. The different files correspond to different jobs recovering. The filtered logs show three jobs running/recovering (with IDs 10d8ccae6e87ac56bf763caf4bc4742f, 124f29322f9026ac1b35435d5de9f625, 7f

S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
All: I'm trying to read lots of files from S3 and I am getting timeouts from S3: java.io.IOException: Error opening the Input Split [0,558574890]: Input opening request timed out. Opener was alive. Stack of split open thread: at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.

Re: Setting taskmanager.network.numberOfBuffers does not seem to have an affect - Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
Ufuk, I configured the a super high number: bin/flink run \ -m yarn-cluster \ -ynm "UMS ETL Flow - No Negative Links" \ -yjm 10240 \ -yn 10 \ -ytm 40960 \ -ys 32 \ -yD taskmanager.network.numberOfBuffers=*284800* \ and there are still issues. :( I did however, narro

Re: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Robert Metzger
The default timeout for opening a split is 5 minutes. You can set a higher value with "taskmanager.runtime.fs_timeout" (milliseconds), but I believe that 5 minutes is already way too long. It would be interesting to find out the root cause of this. On Thu, Mar 17, 2016 at 11:00 PM, Sourigna Phetsa

Re: Convert Datastream to Collector or List

2016-03-19 Thread Ahmed Nader
Hello Suneel, Yeah that worked, thanks so much. On 16 March 2016 at 12:50, Suneel Marthi wrote: > DataStream ds = ... > > Iterator iter = DataStreamUtils.collect(ds); > > List list = Lists.newArrayList(iterator); > > Hope that helps. > > > On Wed, Mar 16, 2016 at 7:37 AM, Ahmed Nader > wrot

Re: Flink Checkpoint on yarn

2016-03-19 Thread Stefano Baghino
Hi Ufuk, does the recovery.zookeeper.path.root property need to be set independently for each job that is run? Doesn't Flink take care of assigning some sort of identification to each job and storing their checkpoints independently? On Thu, Mar 17, 2016 at 11:43 AM, Ufuk Celebi wrote: > Hey Sim

Help with DeltaIteration

2016-03-19 Thread Lydia Ickler
Hi, I have a question regarding the Delta Iteration. I basically want to iterate as long as the former and the new calculated set are different. Stop if they are the same. Right now I get a result set that has entries with duplicate „row“ indices which should not be the case. I guess I am doing

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
Ok, i run another test. I launched two identical jobs, one after the other, on yarn (without the long running session). I then killed a job manager and both the jobs got problems and then resumed their work after a few seconds. The problem is the first job restored the state of the second job and

Re: akka.pattern.AskTimeoutException

2016-03-19 Thread Stefano Baghino
Hi Frederick, thanks for helping me, in the end it looked like it was just a missing property in the ones I gave to Kafka, but the error message looks really misleading. Thanks again. Best, Stefano On Wed, Mar 16, 2016 at 4:04 PM, Frederick Ayala wrote: > Hi Stefano, > > In my case running the

Re: Help with DeltaIteration

2016-03-19 Thread Balaji Rajagopalan
The easier way to debug this would be have prints in the projectjoinresultmapper and see what data you are getting. It is possible your original dataset has duplicate rows ? On Thu, Mar 17, 2016 at 6:36 PM, Lydia Ickler wrote: > Hi, > I have a question regarding the Delta Iteration. > I basicall

Re: Flink Checkpoint on yarn

2016-03-19 Thread Stefano Baghino
Hi Ufuk, I've read the documentation and it's exactly as you say, thanks for the clarification. Assuming one wants to run several jobs in parallel with different users on a secure cluster in HA m

RE: RollingSink with APIs requring fs+path

2016-03-19 Thread Lasse Dalegaard
Hello, Thanks for verifying my thesis. I've created FLINK-3637( https://issues.apache.org/jira/browse/FLINK-3637 ) and will start working on this :-) Best regards, Lasse From: Aljoscha Krettek Sent: Friday, March 18, 2016 1:56 PM To: user@flink.apache.o

Re: Window on stream with timestamps ascending by key

2016-03-19 Thread Aljoscha Krettek
Hi, what you essentially would require is watermarks that are tracked by key. Right now this is not possible in Flink. The watermarks, which are used for keeping track of the timestamps, are global across all keys. Maybe you could implement something that fits your requirements in a custom oper

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
Actually the test was intended for a single job. The fact that there are more jobs is unexpected and it will be the first thing to verify. Considering these problems we will go for deeper tests with multiple jobs. The logs are collected with "yarn logs" but log aggregation is not properly configur

Re: Flink and YARN ship folder

2016-03-19 Thread Ufuk Celebi
Can you try the same thing without -yt, but a yarn-session? – Ufuk On Thu, Mar 17, 2016 at 12:29 PM, Stefano Baghino wrote: > I have another interesting test result on this matter, running again an > individual job on a YARN cluster. > > When running bin/flink run -m yarn-cluster -yn 1 > example

Re: Setting taskmanager.network.numberOfBuffers does not seem to have an affect - Flink 0.10.2

2016-03-19 Thread Ufuk Celebi
Hey Souringa, could you provide some context about the program you are running? Is it batch or streaming? What is the parallelism? How many operators are you running? Thanks for reporting the issue. I think we will figure it out once you provide further information. :-) – Ufuk On Fri, Mar 18,

Re: what happens to the aggregate in a fold on a KeyedStream if the events for a key stop coming ?

2016-03-19 Thread Fabian Hueske
Hi Bart, if you run a fold function on a keyed stream without a window, there is no way to remove the key and the folded value. You will eventually run out of memory if your key space is continuously growing. If you apply a fold function in a window on a keyed stream you can bound the "lifetime"

Re: Javadoc version

2016-03-19 Thread Robert Metzger
Hi Ken, I see, 1.0-SNAPSHOT in particular is a bad version because that's the default version from the maven archetypes. Lets see, if more users stumble across this, we can see if there is an easy fix. Also, in (hopefully) ~3 months we'll have Flink 1.1 and then it won't look that bad anymore. Re

degree of Parallelism

2016-03-19 Thread Ahmed Nader
Hi, How can i setParallelism in a generic way that provide better performance on any device not only mine? In this case is it better to be set to a certain value or if i just didn't set it to any value does flink take care of that generically and provide better execution performance? Thanks.

Re: Unable to Read from Kafka [ zookeeper.connect error]

2016-03-19 Thread subash basnet
Hello Stefano, Thank you, it's working :). Regards, Subash Basnet On Wed, Mar 16, 2016 at 3:24 PM, Stefano Baghino < stefano.bagh...@radicalbit.io> wrote: > Hi Subash, > > you just have to add the following parameter: --zookeeper.connect > localhost:2181 > Let us know if it works before presumab

Re: Flink Checkpoint on yarn

2016-03-19 Thread Stefano Baghino
Yes, but each job runs his own cluster, right? We have to run them on a secure cluster and on a per-user basis, thus we can't run a YARN session but have to run each job independently. On Thu, Mar 17, 2016 at 12:09 PM, Ufuk Celebi wrote: > On Thu, Mar 17, 2016 at 11:51 AM, Stefano Baghino > wro

Re: Javadoc version

2016-03-19 Thread Robert Metzger
Hi Ken, we are building the docs for each version based on the "release-x.y" branch. This branch contains the snapshot version of the respective version. This allows us to fix documentation issues without releasing a new version. But you are right, it may happen that the docs / javadocs are slight

Re: Unable to Read from Kafka [ zookeeper.connect error]

2016-03-19 Thread Stefano Baghino
Hi Subash, you just have to add the following parameter: --zookeeper.connect localhost:2181 Let us know if it works before presumably the usage string for the ReadFromKafka example can by incorrect. Hope I've been helpful. On Wed, Mar 16, 2016 at 3:17 PM, subash basnet wrote: > Hello all, > >

Re: off-heap size feature request

2016-03-19 Thread Fabian Hueske
Oh yes, good point! The documentation needs to be updated to something like: "The amount of memory (in megabytes) that the task manager reserves on-heap or off-heap (depending on taskmanager.memory.off-heap) for sorting, hash tables, and caching of intermediate results. If unspecified (-1), the me

Re: Flink job on secure Yarn fails after many hours

2016-03-19 Thread Niels Basjes
Hi, In my environment doing the "proxy" thing didn't work. With an token expire of 168 hours (1 week) the job consistently terminates at exactly (within a margin of 10 seconds) 173.5 hours. So far we have not been able to solve this problem. Our teams now simply assume the thing fails once in a w

Java I/O exception

2016-03-19 Thread Ahmed Nader
Hi, I'm working on a project using flink with Spring boot, when i run the application i get an exception: Cannot determine the size of the physical memory for Windows host (using 'wmic memorychip'): Cannot run program "wmic": CreateProcess error=2, The system cannot find the file specified java.io

Re: Flink and YARN ship folder

2016-03-19 Thread Stefano Baghino
Hi Ufuk, I've just run a long running session and in that mode the libraries are correctly shipped without the need to specify -t lib. If you can double check the issue I can open an issue on JIRA. Thanks for helping us. On Thu, Mar 17, 2016 at 2:16 PM, Ufuk Celebi wrote: > Can you try the sa

Re: Flink Checkpoint on yarn

2016-03-19 Thread Stefano Baghino
> Do you have time to repeat your experiment with different ZooKeeper root paths? We reached the same conclusion and we're running this test right now, thanks. On Thu, Mar 17, 2016 at 12:08 PM, Ufuk Celebi wrote: > Yes, the jobs have their own UUID. > > Although you expect there to be two indep

Unable to Read from Kafka [ zookeeper.connect error]

2016-03-19 Thread subash basnet
Hello all, I ran the WriteIntoKafka.java/WikipediaAnalysis.java from flink-streaming examples and able to view the output stream in the terminal via following command: >bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wiki-result But I am unable to read the same stream written to K

Re: Unable to Read from Kafka [ zookeeper.connect error]

2016-03-19 Thread Robert Metzger
Hi, you can set the property by passing it as an argument when starting the job: --zookeeper.connect localhost:2181 On Wed, Mar 16, 2016 at 3:17 PM, subash basnet wrote: > Hello all, > > I ran the WriteIntoKafka.java/WikipediaAnalysis.java from flink-streaming > examples and able to view the out

Re: Flink Checkpoint on yarn

2016-03-19 Thread Simone Robutti
I didn't resubmitted the job. Also the jobs are submitted one by one with -m yarn-master, not with a long running yarn session so I don't really know if they could mix up. I will repeat the test with a cleaned state because we saw that killing the job with yarn application -kill left the "flink ru

RE:Flink job on secure Yarn fails after many hours

2016-03-19 Thread Thomas Lamirault
Hi Max, I will try these workaround. Thanks Thomas De : Maximilian Michels [m...@apache.org] Envoyé : mardi 15 mars 2016 16:51 À : user@flink.apache.org Cc : Niels Basjes Objet : Re: Flink job on secure Yarn fails after many hours Hi Thomas, Nils (CC) a

Re: Error when accessing secure HDFS with standalone Flink

2016-03-19 Thread Maximilian Michels
Hi Stefano, The preparations for Kerberos which you described look correct. Taking a closer lock at the Exception, it seems like the Hadoop config or environment variables are not correctly set. It keeps trying to authenticate SIMPLE but on the remote side only Kerberos is available. Have you add

Re: Convert Datastream to Collector or List

2016-03-19 Thread Suneel Marthi
DataStream ds = ... Iterator iter = DataStreamUtils.collect(ds); List list = Lists.newArrayList(iterator); Hope that helps. On Wed, Mar 16, 2016 at 7:37 AM, Ahmed Nader wrote: > Hi, > I want to pass an object of type DataStream ,after applying map function > on it, as a parameter to be u

RE: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Ken Krugler
With Hadoop 2.6 or later, you can use the s3a:// protocol (vs. s3n://), which should be more reliable (though some bug fixes aren't available until 2.7, see https://issues.apache.org/jira/browse/HADOOP-11571) And you can also then set these properties to control timeouts: > > fs.s3a.connecti

Re: Flink and YARN ship folder

2016-03-19 Thread Andrea Sella
Hi, After few tests I am able to write and read on Alluxio. I am using Flink 1.0.0 and in my case external libraries are not loaded from lib folder to classpath, it loads only flink-dist_2.11-1.0.0.jar. I need to specify the folder with -yt parameter to load the others. If I run `/bin/flink run -

Re: How to start with the first Kafka Message

2016-03-19 Thread Till Rohrmann
Hi Dominique, have you tried setting the Kafka property props.put("auto.offset.reset", "smallest");? Cheers, Till ​ On Thu, Mar 17, 2016 at 1:39 PM, Dominique Rondé < dominique.ro...@codecentric.de> wrote: > Hi folks, > > i have a kafka topic with messages from the last 7 days. Now i have a new

Re: akka.pattern.AskTimeoutException

2016-03-19 Thread stefanobaghino
Frederick, did you find the problem? I'm having a similar issue (the timeout apparently goes off immediately, despite the error message) with a very simple test job that reads from Kafka, appends a string to the input and writes it back to Kafka. Other jobs seem to work fine as well. -- View th