Re: How to share text file across tasks at run time in Flink.

2016-08-24 Thread Baswaraj Kasture
Thanks to all for your inputs. Yes, I could put all these common configurations/rules in a DB, and workers can pick them up dynamically at run time. In that case, do the DB configuration/connection details need to be hard-coded? Is there any way a worker can pick up the DB name, credentials, etc. at run time?
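One common pattern for this, independent of Flink, is to resolve connection details from the worker's environment at startup rather than hard-coding them. A minimal plain-Java sketch; the variable names and defaults are illustrative assumptions, not anything prescribed by Flink:

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: resolve DB connection details at run time from the
// process environment, with fallbacks, instead of hard-coding them.
// The key names (DB_URL, DB_USER, DB_PASSWORD) are assumptions.
public class DbConfigResolver {
    public static Properties resolve(Map<String, String> env) {
        Properties props = new Properties();
        props.setProperty("url", env.getOrDefault("DB_URL", "jdbc:postgresql://localhost:5432/rules"));
        props.setProperty("user", env.getOrDefault("DB_USER", "flink"));
        props.setProperty("password", env.getOrDefault("DB_PASSWORD", ""));
        return props;
    }

    public static void main(String[] args) {
        // Each worker reads its own environment at startup.
        Properties props = resolve(System.getenv());
        System.out.println("Connecting to " + props.getProperty("url"));
    }
}
```

The same idea works with a properties file shipped alongside the job, as long as every worker node can read it.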

Regarding Global Configuration in Flink

2016-08-24 Thread Janardhan Reddy
Hi, Is the global configuration the same for all jobs in a Flink cluster? Is it a good idea to write a custom source that polls some external source every x minutes and updates the global config? Will the config change be propagated across all jobs? What happens when the size of the global config grows?
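The polling idea in the question can be sketched in plain Java. This is not Flink's actual GlobalConfiguration mechanism; the class and key names are invented for the sketch, which just shows the atomic-snapshot-swap pattern a periodic refresher would use:

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: an immutable config snapshot held in an AtomicReference,
// refreshed every x minutes from an external source. Readers always see a
// complete, consistent snapshot, never a half-updated map.
public class ConfigPoller {
    private final AtomicReference<Map<String, String>> current = new AtomicReference<>(Map.of());
    private final Supplier<Map<String, String>> externalSource;

    public ConfigPoller(Supplier<Map<String, String>> externalSource) {
        this.externalSource = externalSource;
    }

    // One refresh cycle: fetch a fresh snapshot and swap it in atomically.
    public void refresh() {
        current.set(externalSource.get());
    }

    public Map<String, String> current() {
        return current.get();
    }

    // Schedule periodic refreshes on a background thread.
    public ScheduledExecutorService start(long minutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, 0, minutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Whether a change made this way is visible to all jobs depends on where the poller runs; one poller per TaskManager JVM would keep every worker's copy current.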

How to set a custom JAVA_HOME when running Flink on YARN?

2016-08-24 Thread Renkai
Hi all: The YARN cluster at my company defaults to Java 7, and I want to use Java 8 for my Flink application. What can I do to achieve this?
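One possible approach is to point Flink at the Java 8 installation in flink-conf.yaml. Both option names below should be treated as assumptions to verify against the documentation of your Flink version; the paths are placeholders for wherever Java 8 lives on the cluster nodes:

```yaml
# env.java.home is read by Flink's startup scripts.
env.java.home: /usr/lib/jvm/java-8-openjdk
# The yarn.taskmanager.env. prefix (if supported by your version) passes
# environment variables into the YARN containers.
yarn.taskmanager.env.JAVA_HOME: /usr/lib/jvm/java-8-openjdk
```

This only works if Java 8 is actually installed at the same path on every node the YARN containers may run on.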

Re: Firing windows multiple times

2016-08-24 Thread Shannon Carey
What do you think about adding the current watermark to the window function metadata in FLIP-2?

Re: How to get latency info from benchmark

2016-08-24 Thread Robert Metzger
Hi, Version 0.10-SNAPSHOT is pretty old. The snapshot repository of Apache probably doesn't keep old artifacts around forever. Maybe you can migrate the tests to Flink 0.10.0, or maybe even to a higher version. Regards, Robert

Re: How to get latency info from benchmark

2016-08-24 Thread Eric Fukuda
Hi Max, Robert, Thanks for the advice. I'm trying to build the "performance" project, but the build fails with the following error. Is there a solution for this? [ERROR] Failed to execute goal on project streaming-state-demo: Could not resolve dependencies for project com.dataartisans.flink:

Re: Dealing with Multiple sinks in Flink

2016-08-24 Thread vinay patil
Hi, Just an update: the window is not getting triggered when I change the parallelism to more than 1. Can you please explain why this is happening? Regards, Vinay Patil

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Sorry, I mistyped the code: it should be timeWindow(Time.minutes(10)) instead of window(Time.minutes(10)).

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Hi subash, A stream is infinite, hence it has no notion of a "final" count. To get distinct counts you need to define a period (i.e. a window [1]) over which you count elements and emit a result, by adding a window operator before the reduce. For example, the following will emit distinct counts every
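The code example in this message is truncated in the archive. As a separate illustration of the point, the effect of adding a window before the count can be sketched in plain Java with no Flink dependency; the Event class and method names here are inventions for the sketch, and events are assumed to arrive in timestamp order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of what a tumbling window adds: counts accumulate
// per key only within each window and are emitted when the window closes,
// instead of growing forever on an infinite stream.
public class WindowedCounts {
    // Minimal stand-in for a timestamped stream element.
    public static class Event {
        final long timestamp;
        final String key;
        public Event(long timestamp, String key) {
            this.timestamp = timestamp;
            this.key = key;
        }
    }

    // Emits one key->count map per tumbling window of windowMillis.
    public static List<Map<String, Long>> countPerWindow(List<Event> events, long windowMillis) {
        List<Map<String, Long>> results = new ArrayList<>();
        Map<String, Long> current = new HashMap<>();
        long windowEnd = -1;
        for (Event e : events) {
            // End timestamp of the window this event falls into.
            long end = (e.timestamp / windowMillis + 1) * windowMillis;
            if (windowEnd == -1) {
                windowEnd = end;
            }
            if (end != windowEnd) {
                // Window closed: emit its counts and start a fresh window.
                results.add(current);
                current = new HashMap<>();
                windowEnd = end;
            }
            current.merge(e.key, 1L, Long::sum);
        }
        if (!current.isEmpty()) {
            results.add(current);
        }
        return results;
    }
}
```

In Flink itself the equivalent is keyBy + timeWindow + a reduce/fold, with the framework managing the per-window state.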

Re: Setting number of TaskManagers

2016-08-24 Thread Foster, Craig
Oh, sorry, I didn't specify that I was using YARN, and I don't necessarily want to set it with the command-line option.

Re: Setting number of TaskManagers

2016-08-24 Thread Greg Hogan
The number of TaskManagers will be equal to the number of entries in the conf/slaves file.
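For reference, conf/slaves is a plain list of hostnames, one per line; bin/start-cluster.sh starts one TaskManager on each listed host. The hostnames below are placeholders:

```
worker-1.example.com
worker-2.example.com
worker-3.example.com
```

With this file, a standalone cluster comes up with three TaskManagers (this mechanism applies to standalone mode, not YARN, where container count is requested at submission time).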

Setting number of TaskManagers

2016-08-24 Thread Foster, Craig
Is there a way to set the number of TaskManagers using a configuration file or environment variable? I'm looking at the docs for it and it says you can set slots but not the number of TMs. Thanks, Craig

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread subash basnet
Hello Kostas, Sorry for the late reply, but I couldn't understand how to apply split on a DataStream, as below, to get the distinct output stream elements with their counts after applying group-by and reduce. DataStream> gridWithDensity = pointsWithGridCoordinates.map(new

Re: Setting up Zeppelin with Flink

2016-08-24 Thread Trevor Grant
Frank, can you post the zeppelin flink log please? You can probably find it in zeppelin_dir/logs/*flink*.log You've got a few moving pieces here. I've never run zeppelin against Flink in a docker container. But I think the Zeppelin-Flink log is the first place to look. You say you can't get

Re: Dealing with Multiple sinks in Flink

2016-08-24 Thread vinay patil
Hi Max, I tried writing to a local file as well; it gives me the same issue. I have attached the logs and the dummy pipeline code. logs.txt dummy_pipeline.txt

Re: Setting up Zeppelin with Flink

2016-08-24 Thread Trevor Grant
Hey Frank, Saw your post on the Zeppelin list yesterday. I can look at it later this morning, but my gut feeling is that a ghost Zeppelin daemon is running in the background and its local Flink is holding port 6123. This is fairly common and would explain the issue. Idk if you're on linux or

Re: How large a Flink cluster can have?

2016-08-24 Thread Alexis Gendronneau
Hi, If possible, it would be great to know how the JobManager has to be scaled according to the number of nodes. Will one JobManager be able to manage hundreds of nodes? Is there a recommendation in terms of the JM/TM ratio? Thanks

Re: How to get latency info from benchmark

2016-08-24 Thread Robert Metzger
Hi Eric, Max is right, the tool has been used for a different benchmark [1]. The throughput logger that should produce the right output is this one [2]. Very recently, I've opened a pull request for adding metric-measuring support into the engine [3]. Maybe that's helpful for your experiments.

Re: How to get latency info from benchmark

2016-08-24 Thread Maximilian Michels
I believe the AnalyzeTool is for processing logs of a different benchmark. CC Jamie and Robert, who worked on the benchmark.

Re: Setting up Zeppelin with Flink

2016-08-24 Thread Maximilian Michels
Hi! There are some people familiar with the Zeppelin integration. CCing Till and Trevor. Otherwise, you could also send this to the Zeppelin community. Cheers, Max

Re: How to share text file across tasks at run time in Flink.

2016-08-24 Thread Maximilian Michels
Hi! 1. The community is working on adding side inputs to the DataStream API. That will allow you to easily distribute data to all of your workers. 2. In the meantime, you could use `.broadcast()` on a DataSet to broadcast data to all workers. You still have to join that data with another stream

Re: Setting up Zeppelin with Flink

2016-08-24 Thread Frank Dekervel
Hello, for reference: I already found out that "connect to existing process" was my error here: it means connecting to an existing Zeppelin interpreter, not an existing Flink cluster. After fixing my error, I'm now in the same situation as described here:

Re: Dealing with Multiple sinks in Flink

2016-08-24 Thread Maximilian Michels
Hi Vinay, Does this only happen with the S3 file system or also with your local file system? Could you share some example code or log output of your running job? Best, Max

Re: FLINK-4329 fix version

2016-08-24 Thread Maximilian Michels
Added fix versions 1.1.2 and 1.2.0 because a pull request is under way. On Tue, Aug 23, 2016 at 1:17 PM, Ufuk Celebi wrote: > On Tue, Aug 23, 2016 at 12:28 PM, Yassine Marzougui > wrote: >> The fix version of FLINK-4329 in JIRA is set to 1.1.1, but 1.1.1

Re: Checking for existence of output directory/files before running a batch job

2016-08-24 Thread Maximilian Michels
Forgot to mention, this is on the master. For Flink < 1.2.x, you will have to use GlobalConfiguration.get();

Re: Checking for existence of output directory/files before running a batch job

2016-08-24 Thread Maximilian Michels
Hi Niels, The problem is that such method only works reliably if the cluster configuration, e.g. Flink and Hadoop config files, are present on the client machine. Also, the environment variables have to be set correctly. This is usually not the case when working from the IDE. But seems like your

Re: "Failed to retrieve JobManager address" in Flink 1.1.1 with JM HA

2016-08-24 Thread Maximilian Michels
Hi Hironori, That's what I thought. So it won't be an issue for most users who do not comment out the JobManager url from the config. Still, the information printed is not correct. The issue has just been fixed. You will have to wait for the next minor release 1.1.2 or build the 'release-1.1'

Re: sharded state, 2-step operation

2016-08-24 Thread Stephan Ewen
Hi! The "feedback loop" sounds like a solution, yes. Actually, that works well with the CoMap / CoFlatMap - one input to the CoMap would be the original value, the other input the feedback value.

Re: JobManager HA without Distributed FileSystem

2016-08-24 Thread Stephan Ewen
Hi! - Concerning replication to other JobManagers - this could be an extension, but it would need to also support additional replacement JobManagers coming up later, so it would need a replication service in the JobManagers, not just a "send to all" at program startup. - That would work in

Re: "Failed to retrieve JobManager address" in Flink 1.1.1 with JM HA

2016-08-24 Thread Hironori Ogibayashi
Ufuk, Max, Thank you for your answers and for opening the JIRA issue. I will wait for the fix. As Max mentioned, I had first commented out jobmanager.rpc.address and jobmanager.rpc.port. When I tried setting localhost and 6123 respectively, it worked. Regards, Hironori

Re: JobManager HA without Distributed FileSystem

2016-08-24 Thread Konstantin Knauf
Hi Stephan, thanks for the quick response; understood. Is there a reason why the JAR files and JobGraph are not sent to all JobManagers by the client? Likewise, why don't all TaskManagers send checkpoint metadata to all JobManagers? I did not have any other storage in mind [1]. I am basically

Re: flink-shaded-hadoop

2016-08-24 Thread Aljoscha Krettek
Hi, this might be due to a bug in the Flink 1.1.0 Maven dependencies. Can you try updating to Flink 1.1.1? Cheers, Aljoscha