Preston,
With respect to the benchmark queries, let me suggest the following approach. Send out an email with:
- GHCN information content (schema with some English description of each field).
- A list of questions in English that represent the interesting questions to ask of that data.
This will give everyone the background needed to suggest modifications and other interesting queries to run against the data.
Finally, once there is consensus regarding the queries, we can translate the English version into XQuery.
Thanks,
Vinayak

On 11/17/13, 12:33 PM, Eldon Carman wrote:
The goal of the benchmark tests is to highlight the parallel aspects of VXQuery. The tests need to show how VXQuery scales. In addition, other queries may be added to highlight our specific speed improvements or where improvements can still be made.

At first we want to show how the system works with parallel queries. We focus on three types of queries: filtering, aggregation and nested loops (join). For these three query types the following scaling tests will be completed: scale up and speed up.
* Scale up keeps the number of nodes in the cluster constant and increases the data set in each successive test.
* Speed up keeps the data set size constant and increases the number of nodes processing the data in each successive test. (http://en.wikipedia.org/wiki/Speedup)

I am still working on the specific queries for our GHCN daily data, but you can see the draft version here:
https://svn.apache.org/repos/asf/incubator/vxquery/trunk/vxquery/vxquery-benchmark/src/main/resources/noaa-ghcn-daily/queries/
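
As a rough illustration of how results from the scale up and speed up tests described above might be summarized, here is a minimal Python sketch. The function names and all timings are hypothetical and are not taken from the VXQuery benchmark code. It assumes each test run yields a wall-clock time in seconds for one query.

```python
def speedup(baseline_seconds, seconds):
    """Speed up test: baseline run time divided by the run time on more nodes."""
    return baseline_seconds / seconds


def efficiency(baseline_nodes, baseline_seconds, nodes, seconds):
    """Speedup divided by the growth in node count (1.0 means linear speedup)."""
    return speedup(baseline_seconds, seconds) / (nodes / baseline_nodes)


def seconds_per_gb(data_gb, seconds):
    """Scale up test as defined above (fixed nodes, growing data):
    run time per GB, which stays flat if the system scales linearly."""
    return seconds / data_gb


if __name__ == "__main__":
    # Hypothetical speed up timings: node count -> seconds, fixed data set.
    speed_up_runs = {1: 120.0, 2: 64.0, 4: 36.0}
    for nodes, seconds in sorted(speed_up_runs.items()):
        s = speedup(speed_up_runs[1], seconds)
        e = efficiency(1, speed_up_runs[1], nodes, seconds)
        print(f"nodes={nodes} speedup={s:.2f} efficiency={e:.2f}")

    # Hypothetical scale up timings: data size in GB -> seconds, fixed cluster.
    scale_up_runs = {10: 50.0, 20: 104.0, 40: 220.0}
    for data_gb, seconds in sorted(scale_up_runs.items()):
        print(f"data_gb={data_gb} seconds_per_gb={seconds_per_gb(data_gb, seconds):.2f}")
```

Running each of the three query types (filtering, aggregation, join) through both summaries would give a comparable pair of curves per query.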
