Re: Exploring Performance Testing
FYI, there is an outstanding PR that adds the Nexmark suite:
https://github.com/apache/incubator-beam/pull/366

On Tue, Oct 18, 2016 at 1:12 PM, Ismaël Mejía wrote:
> @Jason, just some additional refs for ideas, since I already researched
> a little bit how people have evaluated this in other Apache projects.
> [...]
Re: Exploring Performance Testing
@Jason, just some additional refs for ideas, since I already researched
a little bit how people have evaluated this in other Apache projects.

Yahoo published a benchmarking analysis of different streaming
frameworks about a year ago:
https://github.com/yahoo/streaming-benchmarks

And the Flink guys extended it:
https://github.com/dataArtisans/yahoo-streaming-benchmark

Notice that the common approach comes from the classical database
world: take one of the TPC query suites (TPC-H or TPC-DS) and evaluate
a data processing framework against it. Spark does this to evaluate its
SQL performance:
https://github.com/databricks/spark-sql-perf

However, this approach is not 100% aligned with Beam because AFAIK
there is no TPC suite for continuous processing. That's why I find the
NexMark suite a more appropriate example.

On Tue, Oct 18, 2016 at 9:50 PM, Ismaël Mejía wrote:
> Now that we are discussing performance testing, I want to jump into
> the conversation to remind everybody that we have a really interesting
> benchmarking suite already contributed by Google that has (sadly) not
> been merged yet.
> [...]
Re: Exploring Performance Testing
Hello,

Now that we are discussing performance testing, I want to jump into the
conversation to remind everybody that we have a really interesting
benchmarking suite already contributed by Google that has (sadly) not
been merged yet.

https://github.com/apache/incubator-beam/pull/366
https://issues.apache.org/jira/browse/BEAM-160

This is not exactly the kind of benchmark under discussion here, but
for me it is a super valuable contribution that I hope we can use and
refine to evaluate the runners.

Ismaël Mejía

On Tue, Oct 18, 2016 at 8:16 PM, Jean-Baptiste Onofré wrote:
> It sounds like a good idea to me.
> [...]
Re: Exploring Performance Testing
It sounds like a good idea to me.

Regards
JB

On 10/18/2016 08:08 PM, Amit Sela wrote:
> @Jesse how about runners "tracing" the constructed DAG (by Beam) so
> that it's clear what the runner actually executed?
> [...]

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Exploring Performance Testing
@Jesse how about runners "tracing" the constructed DAG (by Beam) so
that it's clear what the runner actually executed?

Example:
For the SparkRunner, a ParDo translates to a mapPartitions
transformation.

That could provide transparency when debugging/benchmarking pipelines
per-runner.

On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson wrote:
> @Dan before starting with Beam, I'd want to know how much performance
> I'm giving up by not programming directly to the native API.
> [...]
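
To make the tracing idea above concrete, here is a minimal sketch in
Python. Everything in it is hypothetical: `TranslationTrace`, `record`,
`render` and the step name "ParseEvents" are invented for illustration
and are not part of Beam or of any runner. The only mapping taken from
the thread is ParDo -> mapPartitions for the SparkRunner.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationTrace:
    """Hypothetical record of how a runner translated each Beam step."""

    runner: str
    entries: list = field(default_factory=list)

    def record(self, step_name, beam_transform, runner_primitive):
        # One entry per pipeline step: what Beam constructed and what
        # the runner actually executed for it.
        self.entries.append((step_name, beam_transform, runner_primitive))

    def render(self):
        # Plain-text report that could be attached to a benchmark run.
        lines = [f"Translation trace for {self.runner}:"]
        for step, beam, native in self.entries:
            lines.append(f"  {step}: {beam} -> {native}")
        return "\n".join(lines)


trace = TranslationTrace(runner="SparkRunner")
# The ParDo -> mapPartitions mapping is the example given in the thread;
# the step name is made up.
trace.record("ParseEvents", "ParDo", "mapPartitions")
print(trace.render())
```

A report like this, stored next to the timing results, would let someone
reading a benchmark see which native operations were behind each number.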
Re: Exploring Performance Testing
@Dan before starting with Beam, I'd want to know how much performance
I'm giving up by not programming directly to the native API.

On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin wrote:
> I think there are lots of excellent one-off performance studies, but
> I'm not sure how useful that is to Beam.
> [...]
Re: Exploring Performance Testing
I think there are lots of excellent one-off performance studies, but
I'm not sure how useful that is to Beam.

From a test infra point of view, I'm wondering more about tracking of
performance over time, identifying regressions, etc.

Google has some tools like PerfKit
<https://github.com/GoogleCloudPlatform/PerfKitBenchmarker>, which is
basically a skin on a database plus some scripts to load and query
data, but I don't love it. Do other Apache projects do public,
long-term benchmarking and performance regression testing?

Dan

On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson wrote:
> I found data Artisans' benchmarking post
> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
> [...]
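
As a rough illustration of "tracking performance over time, identifying
regressions", here is a minimal Python sketch. It is not how PerfKit or
any Apache project does it; the function name `find_regressions`, the
10% tolerance, the benchmark names and all runtimes are assumptions made
up for the example. It flags a benchmark when its latest runtime exceeds
the median of its earlier runs by more than the tolerance.

```python
from statistics import median


def find_regressions(history, latest, tolerance=0.10):
    """Return {benchmark: (baseline, latest_runtime)} for suspected regressions.

    history: benchmark name -> list of past runtimes in seconds
    latest: benchmark name -> runtime of the newest run in seconds
    tolerance: allowed slowdown relative to the median baseline (assumed 10%)
    """
    regressions = {}
    for name, runtime in latest.items():
        past = history.get(name)
        if not past:
            continue  # no baseline yet, nothing to compare against
        baseline = median(past)
        if runtime > baseline * (1 + tolerance):
            regressions[name] = (baseline, runtime)
    return regressions


# Invented numbers, for illustration only.
history = {"wordcount": [100.0, 102.0, 98.0], "join": [200.0, 210.0, 205.0]}
latest = {"wordcount": 101.0, "join": 240.0}
flagged = find_regressions(history, latest)
print(flagged)
```

The median is used as the baseline so that a single noisy run in the
history does not move the threshold much; a real setup would also need
to account for run-to-run variance on shared infrastructure.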
Re: Exploring Performance Testing
I found data Artisans' benchmarking post
<http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
They also shared the code <https://github.com/dataArtisans/performance>.
I didn't dig in much, but they did a wide range of algorithms. They
have the native code, so you could write the Beam code and check it
against the native performance.

On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari wrote:
> Hi Jason, I have been busy benchmarking a Flink cluster (Spark next)
> under Beam. I can share my experience.
> [...]
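
The "write the Beam code and check against the native performance"
comparison could be sketched like this in Python. The helper names
`time_it` and `overhead` are hypothetical, and the two callables passed
to `time_it` stand in for "run the native job" and "run the Beam
version of the same job"; nothing here launches a real pipeline.

```python
import time


def time_it(job, repeats=5):
    """Return the best wall-clock time in seconds over several runs of job."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        best = min(best, time.perf_counter() - start)
    return best


def overhead(native_seconds, beam_seconds):
    """Relative slowdown of the Beam version versus the native version."""
    return beam_seconds / native_seconds - 1.0


# Placeholder workloads: in a real comparison these would submit the
# native job and the equivalent Beam pipeline to the same cluster.
native_seconds = time_it(lambda: sum(range(100_000)))
beam_seconds = time_it(lambda: sum(range(100_000)))
print(f"overhead: {overhead(native_seconds, beam_seconds):+.1%}")
```

Taking the best of several runs is one common way to reduce noise; for
long-running cluster jobs a median over fewer runs may be more
practical.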
Re: Exploring Performance Testing
Hi Jason,

I have been busy benchmarking a Flink cluster (Spark next) under Beam.
I can share my experience. Can you list the items you are interested
in, so I can answer them to the best of my knowledge?

Cheers

From: Jason Kuster
To: dev@beam.incubator.apache.org
Sent: Monday, October 17, 2016 5:06 PM
Subject: Exploring Performance Testing

> Now that we've covered some of the initial ground with regard to
> correctness testing, I'm going to be starting work on performance
> testing and benchmarking.
> [...]
Exploring Performance Testing
Hey all,

Now that we've covered some of the initial ground with regard to
correctness testing, I'm going to be starting work on performance
testing and benchmarking. I wanted to reach out and see what people's
experiences have been with performance testing and benchmarking
frameworks, particularly in other Apache projects. Anyone have any
experience or thoughts?

Best,

Jason

--
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow