Re: Watermarks as "process completion" flags

2015-11-24 Thread Fabian Hueske
Hi Anton, If I got your requirements right, you are looking for a solution that continuously produces updated partial aggregates in a streaming fashion. When a special event (no more trades) is received, you would like to store the last update as a final result. Is that correct? You can compute
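The pattern Fabian describes — emit an updated partial aggregate per event, and treat the last update as final when a special "no more trades" event arrives — can be sketched with plain Java, no Flink API involved. All class and method names below are illustrative, not Flink's.

```java
import java.util.ArrayList;
import java.util.List;

// Schematic of continuously updated partial aggregates with a final-result
// marker. Not Flink API; every name here is illustrative.
public class PartialAggregateSketch {
    static final double END_MARKER = Double.NaN; // sentinel: "no more trades"

    // Returns every intermediate sum; the last element is the final result.
    static List<Double> runningSums(double[] trades) {
        List<Double> updates = new ArrayList<>();
        double sum = 0.0;
        for (double t : trades) {
            if (Double.isNaN(t)) {
                break; // special event received: the last update is final
            }
            sum += t;
            updates.add(sum); // emit the updated partial aggregate downstream
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Double> out = runningSums(new double[] {10.0, 5.0, 2.5, END_MARKER});
        System.out.println(out); // → [10.0, 15.0, 17.5]
    }
}
```

In a real streaming job the `updates` list corresponds to a stream of intermediate results; a consumer stores only the value seen immediately before the marker.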

Re: Watermarks as "process completion" flags

2015-11-24 Thread Anton Polyakov
Hi Max, thanks for the reply. From what I understand, a window works by buffering records while the window is open, then applying the transformation once the window close is triggered, and passing on the transformed result. In my case the window will then be open for a few hours, then the whole amount of trades will be
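The buffer-then-apply behavior Anton describes can be sketched in a few lines of plain Java (this mimics the semantics, not Flink's actual windowing API; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Schematic of buffer-while-open, transform-at-close window semantics.
public class WindowBufferSketch<T, R> {
    private final List<T> buffer = new ArrayList<>();
    private final Function<List<T>, R> transform;

    WindowBufferSketch(Function<List<T>, R> transform) {
        this.transform = transform;
    }

    void add(T record) {
        buffer.add(record); // records are only buffered while the window is open
    }

    R close() {
        return transform.apply(buffer); // transformation runs once, at window close
    }

    public static void main(String[] args) {
        WindowBufferSketch<Integer, Integer> w =
            new WindowBufferSketch<>(recs -> recs.stream().mapToInt(Integer::intValue).sum());
        w.add(3); w.add(4); w.add(5);
        System.out.println(w.close()); // → 12
    }
}
```

This is exactly Anton's concern: with a multi-hour window, nothing is emitted until `close()`, which is why the thread turns to incrementally updated aggregates instead.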

Standalone Cluster vs YARN

2015-11-24 Thread Welly Tambunan
Hi All, I would like to know if there are any feature differences between using a Standalone Cluster vs YARN? Until now we have been using a Standalone cluster for our jobs. Is there any added value in using YARN? We don't have any Hadoop infrastructure in place right now but we can provide that if

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Chiwan Park
Thanks for the correction, @Fabian. :) > On Nov 25, 2015, at 4:40 AM, Suneel Marthi wrote: > > Guess, it makes sense to add readHadoopXXX() methods to > StreamExecutionEnvironment (for feature parity with what's existing presently > in ExecutionEnvironment). > > Also

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Nick Dimiduk
I completely missed this, thanks Chiwan. Can these be used with DataStreams as well as DataSets? On Tue, Nov 24, 2015 at 10:06 AM, Chiwan Park wrote: > Hi Nick, > > You can use Hadoop Input/Output Format without modification! Please check > the documentation[1] in Flink

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Chiwan Park
I’m not a streaming expert. AFAIK, the layer can be used only with DataSet. There are some streaming-specific features in Flink, such as distributed snapshots. These need some support from sources and sinks, so you have to implement the I/O. > On Nov 25, 2015, at 3:22 AM, Nick Dimiduk

Using Hadoop Input/Output formats

2015-11-24 Thread Nick Dimiduk
Hello, Is it possible to use existing Hadoop Input and OutputFormats with Flink? There's a lot of existing code that conforms to these interfaces, seems a shame to have to re-implement it all. Perhaps some adapter shim..? Thanks, Nick

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Chiwan Park
Hi Nick, You can use Hadoop Input/Output Format without modification! Please check the documentation[1] in Flink homepage. [1] https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk wrote: >

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Fabian Hueske
Hi Nick, you can use Flink's HadoopInputFormat wrappers also for the DataStream API. However, DataStream does not offer as much "sugar" as DataSet because StreamEnvironment does not offer dedicated createHadoopInput or readHadoopFile methods. In DataStream Scala you can read from a Hadoop
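The "wrapper" idea behind Flink's Hadoop compatibility layer — adapting a Hadoop-style pull reader into the iterator-style access a Flink source expects — can be sketched schematically. The interfaces below are simplified stand-ins, not the real Hadoop or Flink classes:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Schematic of the wrapper idea: adapt a Hadoop-style record reader
// (nextKeyValue/getCurrentValue) into an iterator-style source.
// These interfaces are illustrative, not the real Hadoop or Flink types.
public class HadoopStyleAdapterSketch {

    interface RecordReaderLike<V> { // simplified Hadoop-RecordReader shape
        boolean nextKeyValue();
        V getCurrentValue();
    }

    // Wraps the pull-style reader as a plain Iterator, roughly how an
    // input-format wrapper exposes records to the consuming runtime.
    static <V> Iterator<V> wrap(RecordReaderLike<V> reader) {
        return new Iterator<V>() {
            private boolean advanced = reader.nextKeyValue();
            public boolean hasNext() { return advanced; }
            public V next() {
                V v = reader.getCurrentValue();
                advanced = reader.nextKeyValue();
                return v;
            }
        };
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c");
        RecordReaderLike<String> reader = new RecordReaderLike<String>() {
            int i = -1;
            public boolean nextKeyValue() { return ++i < lines.size(); }
            public String getCurrentValue() { return lines.get(i); }
        };
        List<String> out = new ArrayList<>();
        wrap(reader).forEachRemaining(out::add);
        System.out.println(out); // → [a, b, c]
    }
}
```

Existing Hadoop InputFormat code keeps working because only this thin adapter layer is new; the reader itself is reused unchanged.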

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Suneel Marthi
Guess it makes sense to add readHadoopXXX() methods to StreamExecutionEnvironment (for feature parity with what presently exists in ExecutionEnvironment). Also, FLINK-2949 addresses the need to add the relevant syntactic-sugar wrappers in the DataSet API for the code snippet in Fabian's previous

Re: Cancel Streaming Job

2015-11-24 Thread Ufuk Celebi
You can use the current release candidate if you'd like to try it out: Binaries are here: http://people.apache.org/~rmetzger/flink-0.10.1-rc1/ The dependency with version 0.10.1 is found in the staging repositories: https://repository.apache.org/content/repositories/orgapacheflink-1058 If you

Re: Cancel Streaming Job

2015-11-24 Thread Welly Tambunan
Hi Gyula and Ufuk, Thanks, I will give it a try. Cheers On Tue, Nov 24, 2015 at 3:42 PM, Ufuk Celebi wrote: > You can use the current release candidate if you like to try it out: > > Binaries are here: > > http://people.apache.org/~rmetzger/flink-0.10.1-rc1/ > > The

Re: Adding TaskManager on Cluster

2015-11-24 Thread Till Rohrmann
Hi Welly, you can always start a new TaskManager by simply calling taskmanager.sh start [streaming|batch], depending on whether you are running a streaming cluster or a batch cluster. You can find the script in /bin. Cheers, Till On Tue, Nov 24, 2015 at 10:27 AM, Welly Tambunan
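Till's suggestion, written out as commands — a sketch assuming a standard standalone Flink layout on the new worker node; the stop variant is an assumption based on the script's usual start/stop convention:

```sh
# On the new worker node, from the Flink installation directory:
# start a TaskManager that joins the running cluster without a restart.
./bin/taskmanager.sh start streaming   # for a streaming cluster
# ./bin/taskmanager.sh start batch     # for a batch cluster

# To take the node out again later (assumed stop counterpart):
# ./bin/taskmanager.sh stop
```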

Re: Adding TaskManager on Cluster

2015-11-24 Thread Welly Tambunan
Hi Till, I've just tried that. It works like a charm. Thanks a lot. Is there any documentation on taskmanager.sh and the other scripts and their parameters? I tried to look at the docs but couldn't find it. Thanks again. Cheers On Tue, Nov 24, 2015 at 4:29 PM, Till Rohrmann

Adding TaskManager on Cluster

2015-11-24 Thread Welly Tambunan
Hi All, Currently we are running Flink in standalone mode. Is there any way to add one node (task manager) to the cluster without bringing the cluster down? Cheers -- Welly Tambunan Triplelands http://weltam.wordpress.com http://www.triplelands.com

Re: LDBC Graph Data into Flink

2015-11-24 Thread Vasiliki Kalavri
Great, thanks for sharing Martin! On 24 November 2015 at 15:00, Martin Junghanns wrote: > Hi, > > I wrote a short blog post about the ldbc-flink tool including a short > overview of Flink and a Gelly example. > > http://ldbcouncil.org/blog/ldbc-and-apache-flink > >

Re: LDBC Graph Data into Flink

2015-11-24 Thread Till Rohrmann
Nice blog post Martin! On Tue, Nov 24, 2015 at 3:14 PM, Vasiliki Kalavri wrote: > Great, thanks for sharing Martin! > > On 24 November 2015 at 15:00, Martin Junghanns > wrote: > >> Hi, >> >> I wrote a short blog post about the ldbc-flink

Re: Adding TaskManager on Cluster

2015-11-24 Thread Ufuk Celebi
I’ve added a section about this to the standalone cluster setup guide. The webpage should be updated tonight by the automatic build bot. – Ufuk > On 24 Nov 2015, at 10:39, Welly Tambunan wrote: > > Hi Till, > > I've just tried that. It's works like a charm. Thanks a lot.

Re: Creating a representative streaming workload

2015-11-24 Thread Andra Lungu
Hi, Sorry for the ultra-late reply. Another real-life streaming scenario would be the one I am working on: - collecting data from telecom cells in real-time - and filtering out certain information or enriching/correlating (adding additional info based on the parameters received) events - this is
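The filter-then-enrich pattern Andra describes can be sketched with plain `java.util.stream` in place of a real streaming engine; all event types, field names, and the lookup table below are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Schematic of the telecom filter/enrich pipeline described above,
// using plain Java streams. All names and data are illustrative.
public class FilterEnrichSketch {

    record CellEvent(String cellId, int signal) {}
    record EnrichedEvent(String cellId, int signal, String region) {}

    // Side input used for enrichment (e.g., cell-id -> region lookup).
    static final Map<String, String> CELL_REGIONS =
        Map.of("cell-1", "north", "cell-2", "south");

    static List<EnrichedEvent> process(List<CellEvent> events) {
        return events.stream()
            .filter(e -> e.signal() >= 0)                       // drop malformed readings
            .filter(e -> CELL_REGIONS.containsKey(e.cellId()))  // keep known cells only
            .map(e -> new EnrichedEvent(e.cellId(), e.signal(), // enrich via lookup
                                        CELL_REGIONS.get(e.cellId())))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<EnrichedEvent> out = process(List.of(
            new CellEvent("cell-1", 42),
            new CellEvent("cell-9", 10),    // unknown cell, filtered out
            new CellEvent("cell-2", -1)));  // malformed reading, filtered out
        System.out.println(out);
    }
}
```

In a real deployment the lookup map would itself be a slowly changing side input rather than a static constant.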

Re: How to pass hdp.version to flink on yarn

2015-11-24 Thread Maximilian Michels
Hi Jagat, I think your issue here is not the JVM options; you are missing shell environment variables during the container launch. Adding those to the user's .bashrc or .profile should fix the problem. Best regards, Max On Mon, Nov 23, 2015 at 10:14 PM, Jagat Singh
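The kind of `.bashrc` addition Max is suggesting looks like the fragment below; the variable names and paths are placeholders, so substitute the values for your own HDP installation:

```sh
# Example shell environment for the YARN container user's ~/.bashrc
# (or ~/.profile). Paths shown are placeholders for your HDP install.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_HOME=/usr/hdp/current/hadoop-client
```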

Re: Running Flink in Cloudfoundry Environment

2015-11-24 Thread Maximilian Michels
Hi Madhukar, I'm not too familiar with Cloudfoundry, but it seems like you would have to write a service integration. Ideally, a Hadoop YARN service is already available in your Cloudfoundry environment. You could then create an application which deploys a Flink job on YARN. Best regards, Max On