Re: XML Benchmark Test Scenarios and Queries

Michael Carey Mon, 18 Nov 2013 23:52:10 -0800

A minor nit: This definition of "speed up" is the correct/common use - you increase the number of nodes and hope to see the system get proportionally faster. E.g., 10x the iron makes your query run 10x faster if you have "perfect speed up". However, this is not a correct/common use of the term "scale up" - see the classic paper in Comm. ACM by DeWitt and Gray on parallel database systems and their performance. "Batch scale up" is when you increase both the data set size AND the system size in tandem - and you hope to keep performance flat. This means 10x the iron will run 10x the problem size in the same time as 1x ran the 1x problem. This is what it means to provide "perfect batch scale up". There is also "transaction scale up" - this is when you increase the number of concurrent queries and the amount of iron in tandem - e.g., 10x the offered load and 10x the iron - and again, a "perfect" result is that the bigger system handles the bigger workload with the same performance as the 1x/1x case.

I suggest moving to the common "scale up" notion (and in this case, the common "batch scale up" notion). So, as you increase problem size, also increase cluster size, and look for performance to stay flat as your goal.
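
To make the two ratios concrete, here is a minimal sketch of how they could be computed from measured elapsed times. The function names and the timings are made up for illustration; nothing here comes from the VXQuery benchmark code.

```python
def speed_up(small_system_time, big_system_time):
    """Same problem on both systems: small-system time / big-system time.

    With N times the iron, a "perfect" (linear) speed up is N.
    """
    return small_system_time / big_system_time


def batch_scale_up(small_time_small_problem, big_time_big_problem):
    """Problem size and system size grown in tandem.

    A "perfect" batch scale up is 1.0, i.e. the elapsed time stays flat.
    """
    return small_time_small_problem / big_time_big_problem


# Hypothetical timings in seconds (assumptions, not measurements).
print(speed_up(100.0, 10.0))         # 1 node vs. 10 nodes, same data
print(batch_scale_up(100.0, 125.0))  # 1x/1x vs. 10x/10x
```

With these made-up numbers the first ratio is 10.0 (perfect speed up on 10x the iron) and the second is 0.8, meaning the 10x/10x run took longer than the 1x/1x run and fell short of perfect batch scale up.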


Cheers,
Mike

On 11/17/13 12:33 PM, Eldon Carman wrote:

The goal of the benchmark tests is to highlight the parallel aspects of
VXQuery. The tests need to show how VXQuery scales. In addition, other
queries may be added to highlight our specific speed improvements or where
improvements can still be made. At first we want to show how the system
works with parallel queries. We focus on three types of queries: filtering,
aggregation and nested loops (join).

For these three queries the following scaling tests will be completed:
scale up and speed up.
  * Scale up keeps the number of nodes in the cluster constant and increases
the data set in each successive test.
  * Speed up keeps the data set size constant and increases the number of
nodes processing the data in each successive test.
(http://en.wikipedia.org/wiki/Speedup)
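
The test series described above can be enumerated as (nodes, data size) pairs. The following is only a sketch: the function names, node counts and data sizes are hypothetical, and the third function shows the tandem variant suggested in the reply at the top of this message rather than anything in the original plan.

```python
def fixed_nodes_plan(nodes, data_sizes):
    """Constant cluster size, growing data set (the "scale up" bullet above)."""
    return [(nodes, size) for size in data_sizes]


def speed_up_plan(data_size, node_counts):
    """Constant data set, growing cluster size (the "speed up" bullet above)."""
    return [(nodes, data_size) for nodes in node_counts]


def tandem_plan(base_nodes, base_size, factors):
    """Cluster size and data set grown by the same factor (batch scale up)."""
    return [(base_nodes * f, base_size * f) for f in factors]


# Hypothetical sizes: data size is in arbitrary units (e.g. GB).
print(fixed_nodes_plan(4, [10, 20, 40]))
print(speed_up_plan(10, [1, 2, 4]))
print(tandem_plan(1, 10, [1, 2, 4]))
```

Each pair would correspond to one benchmark run of the filtering, aggregation and join queries.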

Still working on the specific queries for our GHCN daily data, but you can
see the draft version here:
https://svn.apache.org/repos/asf/incubator/vxquery/trunk/vxquery/vxquery-benchmark/src/main/resources/noaa-ghcn-daily/queries/
