Re: Why Scala Option is not a valid key?

2016-04-04 Thread Chiwan Park
I just found that Timur created a JIRA issue for this (FLINK-3698). Regards, Chiwan Park > On Mar 31, 2016, at 7:27 PM, Till Rohrmann wrote: > > Actually I think that it’s not correct that the OptionType cannot be used as > a key type. In fact it is similar to a

Re: building for Scala 2.11

2016-04-04 Thread Balaji Rajagopalan
In your pom file you can mention against which version of scala you want to build and also remember to add the scala version in the artifactId in all the dependencies which takes scala version, there might be some libraries which are scala agnostic there you do not have to specify the scala

Re: building for Scala 2.11

2016-04-04 Thread Chiwan Park
Hi Andrew, The method to build Flink with Scala 2.11 is described in Flink documentation [1]. I think this is not relevant but just FYI, to build your application based on Flink 1.0 (or later) with Scala 2.11, you should be careful to set Flink dependencies. There is a guide in Wiki [2].

Re: FYI: Added Documentation on Basic Concepts

2016-04-04 Thread Balaji Rajagopalan
Nice write up, one question though my understanding of keyed stream is that it will fork n number of streams from one stream based on n keys, if that is true it can be pictorially depicted as well and the apply function will can be shown to operate over the time period by clearly marking a time

building for Scala 2.11

2016-04-04 Thread Andrew Gaydenko
Hi! How to build the project for Scala 2.11? -- Regards, Andrew

Re: Integrate Flink with S3 on EMR cluster

2016-04-04 Thread Timur Fayruzov
Thanks for the answer, Ken. My understanding is that file system selection is driven by the following sections in core-site.xml: fs.s3.impl org.apache.hadoop.fs.s3native.NativeS3FileSystem fs.s3n.impl com.amazon.ws.emr.hadoop.fs.EmrFileSystem If I run the program using

Re: Integrate Flink with S3 on EMR cluster

2016-04-04 Thread Ken Krugler
Normally in Hadoop jobs you’d want to use s3n:// as the protocol, not s3. Though EMR has some support for magically treating the s3 protocol as s3n (or maybe s3a now, with Hadoop 2.6 or later) What happens if you use s3n:/// for the --input parameter? — Ken > On Apr 4, 2016, at 2:51pm, Timur

Integrate Flink with S3 on EMR cluster

2016-04-04 Thread Timur Fayruzov
Hello, I'm trying to run a Flink WordCount job on an AWS EMR cluster. I succeeded with a three-step procedure: load data from S3 to cluster's HDFS, run Flink Job, unload outputs from HDFS to S3. However, ideally I'd like to read/write data directly from/to S3. I followed the instructions here:

Re: YARN High Availability

2016-04-04 Thread Stephan Ewen
This seems to the the critical part in the logs: 2016-03-31 09:01:52,234 INFO org.apache.flink.yarn.YarnJobManager - Re-submitting 0 job graphs. 2016-03-31 09:02:51,182 INFO org.apache.flink.yarn.YarnJobManager - Stopping YARN JobManager with status FAILED and

Re: FYI: Added Documentation on Basic Concepts

2016-04-04 Thread Henry Saputra
This is great. This will open up the concepts in Flink that kind of "magic" from external users. - Henry On Mon, Apr 4, 2016 at 2:39 AM, Ufuk Celebi wrote: > Dear Flink community, > > I'm happy to announce that we have added a long overdue section on > general Flink concepts

Re: Using GeoIP2 in Flink Streaming

2016-04-04 Thread Stephan Ewen
Your code has to send the variable "DatabaseReader reader" into the cluster together with the map function. The class is not serializable, which means you cannot ship it like that. If you control the code of the DatabaseReader , try to make the class serializable. If you cannot change the code

Using GeoIP2 in Flink Streaming

2016-04-04 Thread Zhun Shen
Hi there, In my case, I want to use GeoIP2 in Flink Streaming, I know I need to serialize geoip2 related classes using Kryo. But I did know how to do it. Flink version: 1.0.0 Kafka version: 0.9.0.0 Deploy Mode: Local My demo code as below: File database = new

Re: Kafka Test Error

2016-04-04 Thread Zhun Shen
I created a new project, and only add kaka-client, Flink-kafka-connect and Flink streaming libs, it works. Thanks. > On Apr 2, 2016, at 12:54 AM, Stephan Ewen wrote: > > The issue may be that you include Kafka twice: > > 1) You explicitly add

Re: TopologyBuilder throws java.lang.ExceptionInInitializerError

2016-04-04 Thread Robert Metzger
Hi, I suspect that this dependency: storm storm-kafka 0.9.0-wip16a-scala292 pulls in a different storm version. Can you exclude the storm from that dependency? You can also run: mvn clean install and

Re: scaling a flink streaming application on a single node

2016-04-04 Thread Aljoscha Krettek
Hi, I am not sure since people normally don't run Flink on such large machines. They rather run it on many smaller machines. I will definitely be interesting too see your new results where the Job can actually use all the memory available on the machine. -- aljoscha On Mon, 4 Apr 2016 at 15:54

Re: scaling a flink streaming application on a single node

2016-04-04 Thread Shinhyung Yang
Dear Aljoscha and Ufuk, Thank you for clarifying! Yes I'm running this wordcount application on a 64-core machine with 120GB ram allocated for users. > In that case, the amount of RAM you give to the TaskManager seems to low. > Could you try re-running your experiments with: >

Re: CEP blog post

2016-04-04 Thread Till Rohrmann
Thanks a lot to all for the valuable feedback. I've incorporated your suggestions and will publish the article, once Flink 1.0.1 has been released (we need 1.0.1 to run the example code). Cheers, Till On Mon, Apr 4, 2016 at 10:29 AM, gen tang wrote: > It is really a good

Re: FYI: Updated Slides Section

2016-04-04 Thread Maximilian Michels
Hi Ufuk, Thanks for updating the page. The "latest documentation" points to the page itself and not the documentation. I've fixed that and added the slides from Big Data Warsaw. Cheers, Max On Mon, Apr 4, 2016 at 12:09 PM, Ufuk Celebi wrote: > @Paris: Just added it. Thanks for

Re: FYI: Updated Slides Section

2016-04-04 Thread Ufuk Celebi
@Paris: Just added it. Thanks for the pointer. Great slides!

Re: FYI: Updated Slides Section

2016-04-04 Thread Paris Carbone
Some people might find my slides on the FT fundamentals from last summer interesting. If you like it feel free to include it. http://www.slideshare.net/ParisCarbone/tech-talk-google-on-flink-fault-tolerance-and-ha Paris On 04 Apr 2016, at 11:33, Ufuk Celebi

FYI: Added Documentation on Basic Concepts

2016-04-04 Thread Ufuk Celebi
Dear Flink community, I'm happy to announce that we have added a long overdue section on general Flink concepts to the documentation. You can find it here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html Thanks to Stephan Ewen who wrote this great overview. I

FYI: Updated Slides Section

2016-04-04 Thread Ufuk Celebi
Dear Flink community, I have updated the Material section on the Flink project page and moved the slides section to a separate page. You can find links to slides and talks here now: http://flink.apache.org/slides.html I've added slides for talks from this year by Till Rohrmann, Vasia Kalavri,

CEP API: Question on FollowedBy

2016-04-04 Thread Anwar Rizal
Hi All, I saw Till's blog preparation. It will be a very helpful blog. I hope that some other blogs that explain how it works will come soon :-) I have a question on followedBy pattern matching semantic. >From the documentation

Re: scaling a flink streaming application on a single node

2016-04-04 Thread Ufuk Celebi
Just to clarify: Shinhyung is running one a single node with 4 CPUs, each having 16 cores. On Mon, Apr 4, 2016 at 10:32 AM, Robert Metzger wrote: > Hi, > > usually it doesn't make sense to run multiple task managers on a single > machine to get more slots. > Your machine has

Re: scaling a flink streaming application on a single node

2016-04-04 Thread Aljoscha Krettek
Hi, I'm afraid no one read your email carefully. You indeed have one very big machine with 64 physical CPU cores and 120 GB of RAM, correct? In that case, the amount of RAM you give to the TaskManager seems to low. Could you try re-running your experiments with: jobmanager.heap.mb: 5000

Re: scaling a flink streaming application on a single node

2016-04-04 Thread Robert Metzger
Hi, usually it doesn't make sense to run multiple task managers on a single machine to get more slots. Your machine has only 4 CPU cores, so you are just putting a lot of pressure on the cpu scheduler.. On Thu, Mar 31, 2016 at 7:16 PM, Shinhyung Yang wrote: > Thank

Re: CEP blog post

2016-04-04 Thread gen tang
It is really a good article. Please put it on Flink Blog Cheers Gen On Fri, Apr 1, 2016 at 9:56 PM, Till Rohrmann wrote: > Hi Flink community, > > I've written a short blog [1] post about Flink's new CEP library which > basically showcases its functionality using a

Re: CEP blog post

2016-04-04 Thread Maximilian Michels
Made a few suggestions. Reads well, Till! On Mon, Apr 4, 2016 at 10:10 AM, Ufuk Celebi wrote: > Same here. > > +1 to publish > > On Mon, Apr 4, 2016 at 10:04 AM, Aljoscha Krettek wrote: >> Hi, >> I like it. Very dense and very focused on the example but I

varying results: local VS cluster

2016-04-04 Thread Lydia Ickler
Hi all, I have an issue regarding execution on 1 machine VS 5 machines. If I execute the following code the results are not the same though I would expect them to be since the input file is the same. Do you have any suggestions? Thanks in advance! Lydia ExecutionEnvironment env =

Re: ContinuousProcessingTimeTrigger does not fire

2016-04-04 Thread Hironori Ogibayashi
It worked as expected. One thing I also need to modify was the condition in onProcessingTime and onElement if (currentTime > nextFireTimestamp) { to if (currentTime >= nextFireTimestamp) { Because there was a case when currentTime and nextFireTimestamp was equal, so the