Re: Exploring Performance Testing

2016-10-19 Thread Bobby Evans

   

Re: Exploring Performance Testing

2016-10-18 Thread Lukasz Cwik
FYI, there is an outstanding PR that adds the Nexmark suite:
https://github.com/apache/incubator-beam/pull/366


Re: Exploring Performance Testing

2016-10-18 Thread Ismaël Mejía
@Jason, just some additional references for ideas, since I have already
researched a little how people evaluated this in other Apache projects.

Yahoo published a benchmarking analysis of different streaming frameworks
about a year ago:
https://github.com/yahoo/streaming-benchmarks

And the Flink guys extended it:
https://github.com/dataArtisans/yahoo-streaming-benchmark

Note that the common approach comes from the classical database world: take
one of the TPC query suites (TPC-H or TPC-DS) and evaluate a data processing
framework against it. Spark does this to evaluate its SQL performance:

https://github.com/databricks/spark-sql-perf

However, this approach is not 100% aligned with Beam because AFAIK there is
no TPC suite for continuous processing. That is why I found the NexMark
suite to be a more appropriate example.




Re: Exploring Performance Testing

2016-10-18 Thread Ismaël Mejía
Hello,

Now that we are discussing performance testing, I want to jump into the
conversation to remind everybody that we have a really interesting
benchmarking suite, already contributed by Google, that has (sadly) not been
merged yet.

https://github.com/apache/incubator-beam/pull/366
https://issues.apache.org/jira/browse/BEAM-160

This is not exactly the kind of benchmark under discussion, but for me it
is a super valuable contribution that I hope we can use/refine to evaluate
the runners.

Ismaël Mejía




Re: Exploring Performance Testing

2016-10-18 Thread Jean-Baptiste Onofré

It sounds like a good idea to me.

Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Exploring Performance Testing

2016-10-18 Thread Amit Sela
@Jesse how about runners "tracing" the DAG constructed by Beam, so that
it's clear what the runner actually executed?

Example:
For the SparkRunner, a ParDo translates to a mapPartitions transformation.

That could provide transparency when debugging/benchmarking pipelines
per-runner.
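
A minimal sketch of what such a trace might look like, purely as an illustration. The translation table and the `trace_pipeline` helper are hypothetical and not part of Beam or any runner; only the ParDo-to-mapPartitions entry is taken from the example above.

```python
# Hypothetical translation table, for illustration only. The ParDo entry is
# the one mentioned in the thread; a real runner would report this from its
# own translation layer rather than from a hand-written dictionary.
SPARK_TRANSLATIONS = {
    "ParDo": "mapPartitions",
}


def trace_pipeline(transforms, translations):
    """Return one line per Beam transform naming the native operation it maps to."""
    lines = []
    for step, name in enumerate(transforms, start=1):
        # Transforms without a known mapping are flagged instead of guessed.
        native = translations.get(name, "<unknown translation>")
        lines.append(f"{step}. {name} -> {native}")
    return lines


for line in trace_pipeline(["Read", "ParDo"], SPARK_TRANSLATIONS):
    print(line)
```

Printing the trace next to benchmark results would make it easier to tell whether a slowdown comes from the pipeline itself or from how the runner translated it.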



Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
@Dan before starting with Beam, I'd want to know how much performance I'm
giving up by not programming directly to the native API.



Re: Exploring Performance Testing

2016-10-18 Thread Dan Halperin
I think there are lots of excellent one-off performance studies, but I'm
not sure how useful that is to Beam.

From a test infra point of view, I'm wondering more about tracking of
performance over time, identifying regressions, etc.
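
As a rough sketch of what identifying regressions over time could mean in practice, the snippet below flags a benchmark run whose runtime is well above a trailing baseline of earlier runs. The function name, the window size, the threshold, and the sample runtimes are all assumptions made for illustration; nothing here reflects how PerfKit or any Beam tooling works.

```python
from statistics import mean, stdev


def find_regressions(history, window=5, threshold=3.0):
    """Return indices of runs that are much slower than the runs before them.

    history   -- runtimes in seconds, oldest first (hypothetical data)
    window    -- number of preceding runs that form the baseline
    threshold -- standard deviations above the baseline mean that count
                 as a regression
    """
    regressions = []
    for i in range(window, len(history)):
        baseline = history[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Use at least 1% of the mean as the spread, so that a perfectly
        # flat baseline does not flag every tiny fluctuation.
        limit = mu + threshold * max(sigma, 0.01 * mu)
        if history[i] > limit:
            regressions.append(i)
    return regressions


# Made-up runtimes: stable around 100 s, then a jump to about 130 s.
runtimes = [100.0, 101.0, 99.0, 100.5, 100.0, 100.2, 130.0, 131.0]
print(find_regressions(runtimes))
```

A trailing window only flags the run where the jump first appears; once slower runs enter the baseline, later runs at the new level are no longer reported.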

Google has some tools like PerfKit
<https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
basically a skin on a database + some scripts to load and query data; but I
don't love it. Do other Apache projects do public, long-term benchmarking
and performance regression testing?

Dan



Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
I found data Artisans' benchmarking post
<http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
They also shared the code <https://github.com/dataArtisans/performance>. I
didn't dig in much, but they covered a wide range of algorithms. They have the
native code, so you could write the Beam equivalent and check it against the
native performance.
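
One way to express that comparison is as a relative overhead. The sketch below is only an illustration of the arithmetic; the helper names and the timings are made up and do not come from the data Artisans code.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def relative_overhead(beam_seconds, native_seconds):
    """Extra time taken by the Beam version as a fraction of the native time."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (beam_seconds - native_seconds) / native_seconds


# Made-up timings, purely to show the calculation.
overhead = relative_overhead(beam_seconds=12.0, native_seconds=10.0)
print(f"Beam overhead vs. native: {overhead:.0%}")
```

In a real comparison, `time_call` would wrap functions that launch the Beam pipeline and the native job on the same cluster with the same input.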



Re: Exploring Performance Testing

2016-10-17 Thread amir bahmanyari
Hi Jason,

I have been busy benchmarking a Flink cluster (Spark next) under Beam. I
can share my experience. Can you list the items you are interested in, so I can
answer them to the best of my knowledge?

Cheers


Exploring Performance Testing

2016-10-17 Thread Jason Kuster
Hey all,

Now that we've covered some of the initial ground with regard to
correctness testing, I'm going to be starting work on performance testing
and benchmarking. I wanted to reach out and see what people's experiences
have been with performance testing and benchmarking
frameworks, particularly in other Apache projects. Anyone have any
experience or thoughts?

Best,

Jason

-- 
---
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow