Hello PostgreSQL fans,
I would like to introduce myself and the TPC-V benchmark to the PostgreSQL 
community. I would then like to ask the community to help us make the TPC-V 
reference benchmarking kit a success, and establish PostgreSQL as a common DBMS 
used in measuring the performance of enterprise servers.

I am VMware's rep to the TPC, and chair the TPC's virtualization benchmark 
development subcommittee. For those of you who don't know the TPC, it is an 
industry standards consortium, and its benchmarks are the main performance 
tests for enterprise-class database servers. For external (marketing) use, 
these benchmarks are the gold standard for comparing different servers, 
processors, databases, etc. For internal use, they are typically the biggest 
hammers an organization can use for performance stress testing of their 
products. TPC benchmarks are one of the workloads (if not the main workload) 
that processor vendors use to design their products. So the benchmarks see much 
heavier use inside companies than the number of official disclosures suggests.

TPC-V is a new benchmark under development for virtualized databases. A TPC-V 
configuration has:
- multiple virtual machines running a mix of DSS, OLTP, and business logic apps
- VMs running with throughputs ranging from 10% to 40% of the total system
- load elasticity emulating cloud characteristics: the benchmark maintains a 
constant overall tpsV load level, but the proportion directed to each VM 
changes every 10 minutes
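
To make the elasticity idea concrete, here is a minimal sketch, not taken from the TPC-V specification. It assumes four VMs and a made-up schedule of shares (the SCHEDULE table and the vm_loads function are hypothetical names); the only things it borrows from the description above are the constant overall load, the 10% to 40% range, and the 10-minute phase length.

```python
# Hypothetical illustration: the share values below are invented for this
# sketch and are NOT the schedule defined by the TPC-V specification.
PHASE_SECONDS = 10 * 60  # the per-VM proportion changes every 10 minutes

# One row per phase; each entry is the percentage of the total load sent to
# one VM. Every row sums to 100, so the overall load stays constant.
SCHEDULE = [
    (10, 20, 30, 40),
    (40, 10, 20, 30),
    (30, 40, 10, 20),
    (20, 30, 40, 10),
]


def vm_loads(total_tpsv, elapsed_seconds):
    """Return the load directed to each VM at a given point in the run."""
    phase = int(elapsed_seconds // PHASE_SECONDS) % len(SCHEDULE)
    return [total_tpsv * pct / 100 for pct in SCHEDULE[phase]]


if __name__ == "__main__":
    for minute in (0, 10, 20, 30):
        print(minute, vm_loads(1000, minute * 60))
```

Each VM sees its load swing between 10% and 40% of the total while the sum across VMs never changes, which is the behavior the benchmark is meant to stress.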

A paper in the TPC Technical Conference track of VLDB 2010 described the 
initial motivation and architecture of TPC-V. A paper that has been accepted to 
the TPC TC track of VLDB 2012 describes in detail the current status of the 
benchmark.

All TPC results up to now have been on commercial databases. The majority of 
active results are on Oracle or Microsoft SQL Server, followed by DB2, Sybase, 
and other players. Again, keep in mind that these benchmarks aren't meant to 
only compare DBMS products. In fact the majority of results are "sponsored" by 
server hardware companies. The server hardware, processor, storage, OS, etc. 
all contribute to the performance. But you can't have database server 
benchmark results without a good DBMS!

And that's where PostgreSQL comes in. The TPC-V development subcommittee 
followed the usual path of TPC benchmarks by writing a functional 
specification, and looking to TPC members to develop benchmarking kits to 
implement the spec. TPC-V uses the schema and transactions of TPC-E, but the 
transaction mixes and the way the benchmark is run are totally new and 
virtualization-specific. We chose to start from TPC-E to accelerate the 
benchmark development phase: the specification would be easier to write, and 
DBMS vendors could create TPC-V kits starting from their existing TPC-E kits. 
Until now, benchmarking kits for various TPC benchmarks have been typically 
developed by DBMS vendors, and offered to their partners for internal testing 
or disclosures. So our expectation was that one or more DBMS companies that 
owned existing TPC-E benchmarking kits would allocate resources to modify their 
kits to execute the TPC-V transactions, and supply kits to subcommittee members 
for prototyping. This did not happen (let's not get into the internal politics 
of the TPC!!), so the subcommittee moved forward with developing its own 
reference kit. The reference kit has been developed to run on PostgreSQL, and 
we are focusing our development efforts and testing on PostgreSQL.

The reference kit will be a first for the TPC, which until now has only 
published paper functional specifications. This kit will be publicly 
available to anyone who wants to run TPC-V, whether for internal testing, 
academic studies, or official publications. Commercial DBMS vendors are allowed 
to develop their own kits and publish with them. Even if commercial DBMS 
vendors decide later on to develop TPC-V kits, we expect official TPC-V 
publications with this reference kit using PostgreSQL, and of course a lot of 
academic use of the kit. I think this will be a boost for the PostgreSQL 
community (correct me if I am wrong!!).

The most frequent question to the TPC is "do you offer a kit to run one of your 
benchmarks?". There will finally be such a kit, and it will run on PostgreSQL.

But TPC benchmarks are where the big boys play. If we want the reference kit to 
be credible, it has to have good performance. We don't expect it to beat the 
commercial databases, but it has to be in the ballpark. We have started our 
work running the kit in a simple, single-VM, TPC-E type configuration since 
TPC-E is a known animal with official publications available. We have compared 
our performance to Microsoft SQL Server results published on a similar platform. After 
waving our hands through a number of small differences between the platforms, 
we have calculated a CPU cost of around 3.2ms/transaction for the published MS 
SQL results, versus a measurement of 8.6ms/transaction for PostgreSQL. (TPC 
benchmarks are typically pushed to full CPU utilization. One removes all 
bottlenecks in storage, networking, etc., to achieve the 100% CPU usage. So CPU 
cost/tran is the final decider of performance.) So we need to cut the CPU cost 
of transactions in half to make publications with PostgreSQL comparable to 
commercial databases. It is OK to be slower than MS SQL or Oracle. The 
benchmark running PostgreSQL can still be used to compare the performance of 
servers, processors, and especially, hypervisors under a demanding database 
workload. But the slower we are, the less credible we are.
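
For anyone who wants to check the arithmetic, here is a small sketch. The two costs are the figures quoted above; the cpu_ms_per_txn helper and its example inputs (16 cores, 2000 transactions/second) are hypothetical, and only show one common way such a number can be derived when the CPUs are fully utilized.

```python
MSSQL_MS_PER_TXN = 3.2     # from this post: estimate for the published MS SQL results
POSTGRES_MS_PER_TXN = 8.6  # from this post: measured with PostgreSQL


def cpu_ms_per_txn(cores, utilization, txn_per_second):
    """CPU milliseconds consumed per transaction.

    Assumption for this sketch: each core supplies 1000 ms of CPU time per
    second, scaled by the measured utilization (1.0 means 100% busy).
    """
    return cores * utilization * 1000.0 / txn_per_second


# How many times more CPU PostgreSQL uses per transaction today.
gap_now = POSTGRES_MS_PER_TXN / MSSQL_MS_PER_TXN

# What the gap would be if the PostgreSQL cost were cut in half.
halved_cost = POSTGRES_MS_PER_TXN / 2
gap_after_halving = halved_cost / MSSQL_MS_PER_TXN

if __name__ == "__main__":
    print(f"gap today:         {gap_now:.2f}x")
    print(f"cost after halving: {halved_cost:.1f} ms/transaction")
    print(f"gap after halving: {gap_after_halving:.2f}x")
    # Hypothetical inputs, chosen only to exercise the helper.
    print(f"example: {cpu_ms_per_txn(16, 1.0, 2000):.1f} ms/transaction")
```

At full CPU utilization, throughput is inversely proportional to CPU cost per transaction, so today's gap of roughly 2.7x would shrink to roughly 1.3x if the cost were halved.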

Sorry for the long post. I will follow up with specific questions next.

Thanks,
Reza Taheri
