Re: Exploring Performance Testing

2016-10-18 Thread Lukasz Cwik
FYI, there was an outstanding PR about adding the Nexmark suite: https://github.com/apache/incubator-beam/pull/366 On Tue, Oct 18, 2016 at 1:12 PM, Ismaël Mejía wrote: > @Jason, Just some additional refs for ideas, since I already researched a > little >

Re: Exploring Performance Testing

2016-10-18 Thread Ismaël Mejía
@Jason, Just some additional refs for ideas, since I already researched a little bit about how people evaluated this in other Apache projects. Yahoo published a benchmarking analysis of different streaming frameworks about a year ago: https://github.com/yahoo/streaming-benchmarks And the flink

Re: Exploring Performance Testing

2016-10-18 Thread Ismaël Mejía
Hello, Now that we are discussing performance testing, I want to jump into the conversation to remind everybody that we have a really interesting benchmarking suite already contributed by Google that has (sadly) not been merged yet.

Re: [DISCUSS] Sources and Runners

2016-10-18 Thread Robert Bradshaw
Eventually we'll be able to communicate intent with the runner much more directly via the ProcessContinuation object: https://github.com/apache/incubator-beam/blob/a0f649eaca8d8bd47d22db0ba7150fea1bf07975/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L658 On Tue, Oct 18,

Re: [DISCUSS] Sources and Runners

2016-10-18 Thread Jean-Baptiste Onofré
Thanks for the update and summary. Regards JB On Oct 18, 2016, at 20:47, Amit Sela wrote: >I wanted to summarize here a couple of important points raised in some >PRs >I was involved with, and while those PRs were about KafkaIO and related >to >the Spark/Direct

Re: [DISCUSS] Sources and Runners

2016-10-18 Thread Raghu Angadi
One way I would rephrase the concern from an unbounded source developer's point of view: - What is the recommended blocking behavior for start() and advance()? E.g., at one extreme, should they wait until there is a record? Mostly this will be bad. I am glad at pull/1125

Re: Exploring Performance Testing

2016-10-18 Thread Jean-Baptiste Onofré
It sounds like a good idea to me. Regards JB On 10/18/2016 08:08 PM, Amit Sela wrote: @Jesse how about runners "tracing" the constructed DAG (by Beam) so that it's clear what the runner actually executed? Example: For the SparkRunner, a ParDo translates to a mapPartitions transformation.

Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
@Dan before starting with Beam, I'd want to know how much performance I'm giving up by not programming directly to the API. On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin wrote: > I think there are lots of excellent one-off performance studies, but I'm > not sure

Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
I found data Artisans' benchmarking post. They also shared the code. I didn't dig in much, but they did a wide range of algorithms. They have

Build failed in Jenkins: beam_Release_NightlySnapshot #203

2016-10-18 Thread Apache Jenkins Server
See Changes: [lcwik] Move the step output ids to use a flat namespace. Also add a logical [dhalperi] Remove Remaining Nested Contexts from NullableCoder [bchambers] Fix SplittableParDoTest [dhalperi] AvroIO.Write: minor

Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-18 Thread Maximilian Michels
Great to have another Runner on board! Congrats! -Max On Tue, Oct 18, 2016 at 8:10 AM, Jean-Baptiste Onofré wrote: > Awesome ! > > Great job guys ! > > Thanks to Thomas, Vlad, Guaray and Ken for this. > > Regards > JB > > > On 10/17/2016 06:51 PM, Kenneth Knowles wrote: >>

Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-18 Thread Jean-Baptiste Onofré
Awesome ! Great job guys ! Thanks to Thomas, Vlad, Guaray and Ken for this. Regards JB On 10/17/2016 06:51 PM, Kenneth Knowles wrote: Hi all, I would like to, once again, call attention to a great addition to Beam: a runner for Apache Apex. After lots of review and much thoughtful revision,

Re: Introduction

2016-10-18 Thread Jean-Baptiste Onofré
Hi Neelesh and Jesse, I already created a Jira about an M/R runner and I also started to work on it (it's on a branch on my GitHub). I'm looking for help on this, so, please, let me know if you are interested ;) Thanks, Regards JB On 10/17/2016 08:36 PM, Jesse Anderson wrote: Neelesh, I

Re: Introduction

2016-10-18 Thread Jean-Baptiste Onofré
Hi Neelesh, Welcome aboard. Regards JB On 10/17/2016 08:14 PM, Neelesh Salian wrote: Hello folks, I am Neelesh Salian; I recently joined the Beam community and I wanted to take this opportunity to formally introduce myself. I have been working with the Hadoop and Spark ecosystems over the