Beam has an extension that supports the TPC-DS benchmark [1]; it basically 
runs the TPC-DS SQL queries via Beam SQL. However, I’m not sure whether it runs 
regularly and, IIRC (from the last time I looked at it, so I may be mistaken), 
it requires some adjustments to run on any runner other than Dataflow. Also, 
when I tried to run it on SparkRunner, many queries failed for various 
reasons [2].

I believe that if we manage to get most of the queries running on any runner, 
it will be a good addition to the Nexmark benchmark that we have today, since 
TPC-DS results can also be used to compare Beam with other data processing 
systems.

[1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[2] https://issues.apache.org/jira/browse/BEAM-9891

> On 22 Mar 2021, at 18:00, Tao Li <[email protected]> wrote:
> 
> Hi Beam community,
>  
> I am wondering if there is a doc that compares the perf of Beam (on Spark) and 
> native Spark for batch processing, for example using the TPC-DS benchmark?
>  
> I did find some relevant links like this 
> <https://archive.fosdem.org/2018/schedule/event/nexmark_benchmarking_suite/attachments/slides/2494/export/events/attachments/nexmark_benchmarking_suite/slides/2494/Nexmark_Suite_for_Apache_Beam_(FOSDEM18).pdf>
>  but it’s old and it mostly covers the streaming scenarios.
>  
> Thanks!
