MaxPermSize on yarn

2015-11-16 Thread Gwenhael Pasquiers
Hi, We're having some OOM permgen exceptions when running on YARN. We're not yet sure whether it is a consequence or a cause of our crashes, but we've been trying to increase that value... And we did not find how to do it. I've seen that yarn-daemon.sh sets a 256m value. It looks to me
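Not part of the original message: for Flink on YARN running on a pre-Java-8 JVM (where the permanent generation still exists), one place such a JVM flag can typically be set is the `env.java.opts` key in `flink-conf.yaml`. The 512m value below is purely illustrative:

```yaml
# flink-conf.yaml -- illustrative sketch; the value is an example only
env.java.opts: -XX:MaxPermSize=512m
```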

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
Why not use an existing benchmarking tool -- is there one? Perhaps you'd like to build something like YCSB [0] but for streaming workloads? Apache Storm is the OSS framework that's been around the longest. Search for "apache storm benchmark" and you'll get some promising hits. Looks like

Re: Multilang Support on Flink

2015-11-16 Thread Maximilian Michels
Hi Welly, It's in the main Flink repository. Actually, this has just been integrated with the Python API, see https://github.com/apache/flink/blob/master/flink-libraries/flink-python/src/main/java/org/apache/flink/python/api/PythonPlanBinder.java Before it was independent

Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Anwar Rizal
Stephan, Having a look at the brand new 0.10 release, I noticed that OperatorState is not implemented for ConnectedStream, which is quite the opposite of what you said below. Or maybe I misunderstood your sentence here? Thanks, Anwar. On Wed, Nov 11, 2015 at 10:49 AM, Stephan Ewen

Apache Flink 0.10.0 released

2015-11-16 Thread Fabian Hueske
Hi everybody, The Flink community is excited to announce that Apache Flink 0.10.0 has been released. Please find the release announcement here: --> http://flink.apache.org/news/2015/11/16/release-0.10.0.html Best, Fabian

Re: Apache Flink 0.10.0 released

2015-11-16 Thread Slim Baltagi
Hi I’m very pleased to be the first to tweet about the release of Apache Flink 0.10.0, just after receiving Fabian’s email :) Flink 1.0 is around the corner now! Slim Baltagi On Nov 16, 2015, at 7:53 AM, Fabian Hueske wrote: > Hi everybody, > > The Flink community is excited

Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Stephan Ewen
Hi Anwar! 0.10.0 was feature frozen at that time already and under testing. Key/value state on connected streams will have to go into the next release... Stephan On Mon, Nov 16, 2015 at 3:00 PM, Anwar Rizal wrote: > Stephan, > > Having a look at the brand new 0.10

Re: Error handling

2015-11-16 Thread Stephan Ewen
Makes sense. The class of operations that work "per-tuple" before the data is forwarded to the network stack could be extended to have error traps. @Nick: Is that what you had in mind? On Mon, Nov 16, 2015 at 7:27 PM, Aljoscha Krettek wrote: > Hi, > I don’t think that

Re: Error handling

2015-11-16 Thread Aljoscha Krettek
Hi, I don’t think that alleviates the problem. Sometimes you might want the system to continue even if stuff outside the UDF fails. For example, if a serializer does not work because of a null value somewhere. You would, however, like to get a message about this somewhere, I assume. Cheers,

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Konstantin Knauf
Hi Aljoscha, I changed the TimestampExtractor to save the lastSeenTimestamp and only emit with getCurrentWatermark [1], as you suggested. So basically I do the opposite of before (watermarks only per event vs. watermarks only per autowatermark interval). And now it works :). The question remains why it
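The pattern described above (record each element's timestamp as it arrives, but emit watermarks only from the periodic "autowatermark" callback) can be modeled in plain Java. This is an illustrative sketch with made-up class and method names, not the actual Flink `TimestampExtractor` interface, whose signatures differ by version:

```java
// Sketch of the "remember last timestamp, emit watermark periodically" pattern.
// Hypothetical names; the real Flink interface differs by version.
public class LastSeenWatermarkTracker {
    private final long maxOutOfOrdernessMs;
    private long lastSeenTimestamp = Long.MIN_VALUE;

    public LastSeenWatermarkTracker(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Called per element: record its timestamp, but emit no watermark here. */
    public long extractTimestamp(long elementTimestamp) {
        if (elementTimestamp > lastSeenTimestamp) {
            lastSeenTimestamp = elementTimestamp;
        }
        return elementTimestamp;
    }

    /** Called at the autowatermark interval: derive the watermark from state. */
    public long getCurrentWatermark() {
        return lastSeenTimestamp == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : lastSeenTimestamp - maxOutOfOrdernessMs;
    }
}
```

Emitting watermarks only from the periodic callback decouples watermark frequency from event frequency, which is one plausible reason the switch made the windows fire.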

Re: Creating a representative streaming workload

2015-11-16 Thread Vasiliki Kalavri
Hi, thanks Nick and Ovidiu for the links! Just to clarify, we're not looking into creating a generic streaming benchmark. We have quite limited time and resources for this project. What we want is to decide on a set of 3-4 _common_ streaming applications. To give you an idea, for the batch

Re: Error handling

2015-11-16 Thread Nick Dimiduk
> > I have been thinking about this; maybe we can add a special output stream > (for example Kafka, but it could be generic) that would get errors/exceptions > that were thrown during processing. The actual processing would not stop > and the messages in this special stream would contain some
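The idea quoted here (trap per-record failures and divert them to a separate error stream instead of failing the job) can be sketched in plain Java. The names are illustrative, not a Flink API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: apply a per-record UDF, routing failures to an error list
// instead of aborting the whole stream. All names are hypothetical.
public class ErrorTrappingMapper<IN, OUT> {
    private final Function<IN, OUT> udf;
    private final List<String> errorStream = new ArrayList<>();

    public ErrorTrappingMapper(Function<IN, OUT> udf) {
        this.udf = udf;
    }

    public List<OUT> mapAll(List<IN> records) {
        List<OUT> out = new ArrayList<>();
        for (IN record : records) {
            try {
                out.add(udf.apply(record));
            } catch (Exception e) {
                // In the proposal, this would be forwarded to a side channel
                // such as a dedicated Kafka "error" topic.
                errorStream.add(record + ": " + e);
            }
        }
        return out;
    }

    public List<String> getErrors() {
        return errorStream;
    }
}
```

In a real system the error records would carry enough context (original payload, exception, operator) to diagnose or replay them later.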

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
All those should apply for streaming too... On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri < vasilikikala...@gmail.com> wrote: > Hi, > > thanks Nick and Ovidiu for the links! > > Just to clarify, we're not looking into creating a generic streaming > benchmark. We have quite limited time and

Re: Different CoGroup behavior inside DeltaIteration

2015-11-16 Thread Stephan Ewen
It is actually very important that the co group in delta iterations works like that. If the CoGroup touched every element in the solution set, the "decreasing work" effect would not happen. The delta iterations are designed for cases where specific updates to the solution are made, driven by the

Re: Different CoGroup behavior inside DeltaIteration

2015-11-16 Thread Duc Kien Truong
Hi, Thanks for the suggestion. I'm trying to use the delta iteration so that I can get the empty-work-set convergence criterion for free. But since doing an outer join between the work set and the solution set is not possible using cogroup, I will try to adapt my algorithm to use the bulk

Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Welly Tambunan
Hi Stephan, So that will be in Flink 1.0, right? Cheers On Mon, Nov 16, 2015 at 9:06 PM, Stephan Ewen wrote: > Hi Anwar! > > 0.10.0 was feature frozen at that time already and under testing. > Key/value state on connected streams will have to go into the next > release... >

Re: Apache Flink 0.10.0 released

2015-11-16 Thread Welly Tambunan
Great job guys, So this is the first production-ready release of the Streaming API! Cool! Cheers On Mon, Nov 16, 2015 at 9:02 PM, Leonard Wolters wrote: > congrats! > > L. > > > On 16-11-15 14:53, Fabian Hueske wrote: > > Hi everybody, > > The Flink community is excited to announce

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-16 Thread Konstantin Knauf
Hi Aljoscha, thanks for your answer. Yes, I am using the same TimestampExtractor class. The timestamps look good to me. Here is an example: {"time": 1447666537260, ...} And parsed: 2015-11-16T10:35:37.260+01:00 The order now is stream .map(dummyMapper) .assignTimestamps(...) .timeWindow(...) Is