[PERFORM] Open request for benchmarking input
Did you folks see this article on Slashdot with a fellow requesting input on what sort of benchmarks to run to get a good PostgreSQL vs MySQL dataset? Perhaps this would be a good opportunity for us to get some good benchmarking done. Here's the article link and top text: http://ask.slashdot.org/article.pl?sid=05/11/26/0317213 David Lang asks: "With the release of MySQL 5.0, PostgreSQL 8.1, and the flap over Oracle purchasing InnoDB, the age-old question of performance is coming up again. I've got some boxes that were purchased for a data warehouse project that isn't going to be installed for a month or two, and could probably squeeze some time in to do some benchmarks on the machines. However, the question is: what should be done that's reasonably fair to both MySQL and PostgreSQL? We all know that careful selection of the benchmark can seriously skew the results, and I want to avoid that (in fact I would consider it close to ideal if the results came out that each database won in some tests). I would also not like to spend time generating the benchmarks only to have the losing side accuse me of being unfair. So, for both MySQL and PostgreSQL advocates, what would you like to see in a series of benchmarks?" "The hardware I have available is as follows: * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA I would prefer to use Debian Sarge as the base install of the systems (with custom-built kernels), but would compile the databases from source rather than using binary packages. For my own interests, I would like to at least cover the following bases: 32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type tests (data >> memory); and web perf tests (active data < RAM). What specific benchmarks should be run, and what other things should be tested?
Where should I go for assistance on tuning each database, evaluating the benchmark results, and re-tuning them?" --- Jeff Frost, Owner <[EMAIL PROTECTED]> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Open request for benchmarking input
"Jeff Frost" <[EMAIL PROTECTED]> wrote > > Did you folks see this article on Slashdot with a fellow requesting input > on what sort of benchmarks to run to get a good Postgresql vs Mysql > dataset? Perhaps this would be a good opportunity for us to get some good > benchmarking done. > "The hardware I have available is as follows: > > * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI > * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI > * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA > I see this as a good chance to evaluate and boost PostgreSQL performance in general. My two concerns: (1) How long will David Lang spend on the benchmarking? We need *continous* feedback after each tuning. This is important and Mark Wong has done great job on this. (2) The hardware configuration may not reflect all potentials of PostgreSQL. For example, so far, PostgreSQL does not pay much attention in reducing I/O cost, so a stronger RAID definitely will benefit PostgreSQL performance. > > For my own interests, I would like to at least cover the following bases: > 32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type > tests (data >> memory); and web prefs test (active data RAM) > Don't forget TPCC (data > memory, with intensive updates). So the benchmarks in my mind include TPCC, TPCH and TPCW. Regards, Qingqing ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Open request for benchmarking input
Jeff, Qingqing, On 11/26/05 10:57 AM, "Qingqing Zhou" <[EMAIL PROTECTED]> wrote: > > "Jeff Frost" <[EMAIL PROTECTED]> wrote >> >> Did you folks see this article on Slashdot with a fellow requesting input >> on what sort of benchmarks to run to get a good PostgreSQL vs MySQL >> dataset? Perhaps this would be a good opportunity for us to get some good >> benchmarking done. >> "The hardware I have available is as follows: >> >> * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI >> * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI >> * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA >> I suggest specifying a set of basic system / HW benchmarks to baseline the hardware before each benchmark is run. This has proven to be a major issue with most performance tests. My pick for I/O is bonnie++. Your equipment gives you the opportunity to benchmark all 5 machines running together as a cluster - this is important for measuring the maturity of solutions for high-performance warehousing. Greenplum can provide you a license for Bizgres MPP for this purpose. > (2) The hardware configuration may not reflect PostgreSQL's full potential. > For example, so far PostgreSQL does not pay much attention to reducing I/O > cost, so a stronger RAID setup will definitely benefit PostgreSQL performance. The 16x SATA drives should be great, provided you have a high-performance RAID adapter configured properly. You should be able to get 800MB/s of sequential scan performance by using a card like the 3Ware 9550SX. I've also heard that the Areca cards are good (how good?). Configuration of the I/O must be validated though - I've seen as low as 25MB/s from a misconfigured system. >> For my own interests, I would like to at least cover the following bases: >> 32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type >> tests (data >> memory); and web perf tests (active data < RAM) >> > > Don't forget TPCC (data > memory, with intensive updates).
> So the benchmarks in my mind include TPCC, TPCH and TPCW. I agree with Qingqing, though I think the OSDL DBT-3 (very similar to TPC-H) is sufficient for data warehousing. This is a fairly ambitious project - one problem I see is that MySQL may not run all of these benchmarks, particularly the DBT-3. Also - would the rules allow for mixing / matching pluggable features of the DBMS? InnoDB versus MyISAM? - Luke
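Luke's suggestion to baseline each box's I/O before running any database benchmark can be sketched in a few lines. This is a minimal, hypothetical sanity check, not a substitute for bonnie++: it measures buffered sequential throughput through the filesystem cache, so the file size must well exceed RAM before the numbers approximate raw disk speed.

```python
import os
import time
import tempfile

def sequential_throughput(size_mb=256, chunk_mb=1):
    """Rough sequential write/read throughput in MB/s (buffered I/O).

    Only approximates raw disk speed when size_mb >> RAM; for real
    baselines use bonnie++ as suggested in the thread.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.time()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # include flush-to-disk cost in the timing
        write_mbps = size_mb / max(time.time() - t0, 1e-9)

        t0 = time.time()
        with open(path, "rb") as f:
            while f.read(len(chunk)):  # read until EOF
                pass
        read_mbps = size_mb / max(time.time() - t0, 1e-9)
        return write_mbps, read_mbps
    finally:
        os.remove(path)
```

Running this before and after each tuning change would catch the kind of misconfigured-controller regression (25MB/s instead of 800MB/s) that Luke describes.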
Re: [PERFORM] Open request for benchmarking input
At 03:15 PM 11/26/2005, Luke Lonergan wrote: > I suggest specifying a set of basic system / HW benchmarks to baseline the hardware before each benchmark is run. This has proven to be a major issue with most performance tests. My pick for I/O is bonnie++. Your equipment allows you the opportunity to benchmark all 5 machines running together as a cluster - this is important to measure maturity of solutions for high performance warehousing. Greenplum can provide you a license for Bizgres MPP for this purpose. ...and detailed config / tuning specs as well for it, or everyone is probably wasting their time. For instance, it seems fairly clear that the default 8KB block size and default read-ahead size are both pessimal, at least for non OLTP-like apps. In addition, there's been a reasonable amount of evidence that xfs should be the file system of choice for pg. Things like optimal RAID stripe size, how to allocate tables to various IO HW, and what levels of RAID to use for each RAID set also have to be defined. > The 16x SATA drives should be great, provided you have a high performance RAID adapter configured properly. You should be able to get 800MB/s of sequential scan performance by using a card like the 3Ware 9550SX. I've also heard that the Areca cards are good (how good?). Configuration of the I/O must be validated though - I've seen as low as 25MB/s from a misconfigured system. The Areca cards, particularly with 1-2GB of buffer cache, are the current commodity RAID controller performance leader. Better performance can be gotten out of HW from vendors like Xyratex, but it will cost much more. Ron
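The RAID sizing debate above can be sanity-checked with back-of-envelope arithmetic before any database is loaded. In this sketch the 60MB/s per-drive figure and the 0.6 efficiency factor are illustrative assumptions, not measurements from the thread:

```python
def raid_seq_throughput(drives, mb_per_s_per_drive, efficiency=0.6):
    """Estimate aggregate sequential-scan rate (MB/s) for a striped RAID set.

    'efficiency' is an assumed fudge factor for controller and bus
    overhead; real numbers must come from measurement (e.g. bonnie++).
    """
    return drives * mb_per_s_per_drive * efficiency

# 16x 400GB 7200rpm SATA drives, striped, assuming ~60MB/s per drive:
estimate = raid_seq_throughput(16, 60)  # 576.0 MB/s under these assumptions
```

Luke's 800MB/s figure for a well-configured 3Ware 9550SX corresponds to an efficiency above 0.8 on the same drives, which is exactly why validating the controller configuration matters.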
Re: [PERFORM] Open request for benchmarking input
Qingqing Zhou wrote: "Jeff Frost" <[EMAIL PROTECTED]> wrote "The hardware I have available is as follows: * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA (2) The hardware configuration may not reflect all potentials of PostgreSQL. These boxes don't look like they were designed to be DB servers. The first ones are very CPU-bound, and the third may be a good choice for very large amounts of streamed data, but it is not optimal for TP random access. Hopefully, when publicly visible benchmarks are performed, machines are used that comply with common engineering knowledge, ignoring those guys who still believe that sequential performance is the most important issue in disk subsystems for DBMS. Regards, Andreas
Re: [PERFORM] Open request for benchmarking input
Ok, I've subscribed (hopefully list volume won't kill me :-) I'm covering several things in this message since I didn't receive the prior messages in the thread. First off, these benchmarks are not being sponsored by my employer; they need the machines burned in, so I'm going to use them for the tests while burning them in. I can spend a little official time on this, justifying it as learning the proper config/tuning settings for our project, but not too much. Also, I'm deliberately not using my work e-mail and am not mentioning the company name, so please respect this and keep the two separate (some of you have dealt with us formally, others will over the next few months). This means no remote access for people (but I am willing to run tests and send configs around); in fact the machines will not have Internet access for the duration of the tests. It also means I'm doing almost all the configuration work for this on my own time (nights and weekends). The machines will not be moved to production for a couple of months. This should mean that we can go back and forth with questions and answers (albeit somewhat slowly, with me checking in every night) while whatever tests are done happen during the day. Once we get past the system tuning and start doing different tests, it would probably be helpful if people can send me scripts to run that I can just let loose.
I don't have any money to pay for benchmark suites, so if things like the TPC benchmarks cost money to run, I won't be able to do them. To clarify the hardware: I have 5 machines total to work with, and this includes the client machines that make the queries (I may be able to get hold of 2-3 more, but they are similar configs). None of these have dual-core processors; the CPUs are 246 or 252 Opterons (I'll have to double-check which is in which machine; I think the large-disk machine has 246's and the others 252's). I have access to a gig-E switch on a fairly idle network to connect these machines. The large-disk machine has 3ware 9500 series 8-port SATA controllers in it with battery backup. In our official dealings with Greenplum we attempted to do a set of benchmarks on that machine, but had horrible timing, with me being too busy when they worked with us on this, and we never did figure out the best settings to use for this machine. Part of the reason I posted this to /. rather than just contacting you and the MySQL folks directly is that I would like to see a reasonable set of benchmarks agreed to, and have people with different hardware than I have run the same sets of tests. I know the tuning will be different for different hardware, but if we can have a bunch of people run similar tests we should learn a lot. David Lang
Re: [PERFORM] Open request for benchmarking input
By the way, this is the discussion that prompted me to start this project: http://lwn.net/Articles/161323/ David Lang
[PERFORM] Open request for benchmarking input
These boxes don't look like they were designed to be DB servers. The first ones are very CPU-bound, and the third may be a good choice for very large amounts of streamed data, but it is not optimal for TP random access. I don't know what you mean when you say that the first ones are CPU bound; they have far more CPU than they do disk I/O. However, I will agree that they were not designed to be DB servers; they weren't. They happen to be the machines that I have available. They only have a pair of disks each, which would not be reasonable for most production DB uses, and they have far more CPU than is normally recommended. So I'll have to run RAID 0 instead of RAID 0+1 (or not use RAID), which would be unacceptable in a production environment, but it can still give some useful info. The 5th box _was_ purchased to be a DB server, but one to store and analyse large amounts of log data, so large amounts of data storage were more important than raw DB performance (although we did max out the RAM at 16G to try to make up for it). It was a deliberate price/performance tradeoff. This machine ran ~$20k, but a similar capacity with SCSI drives would have been FAR more expensive (IIRC 4x or more). Hopefully, when publicly visible benchmarks are performed, machines are used that comply with common engineering knowledge, ignoring those guys who still believe that sequential performance is the most important issue in disk subsystems for DBMS. Are you saying that I shouldn't do any benchmarks because the machines aren't what you would consider good enough? If so, I disagree with you; I think that benchmarks should be done on even worse machines, but should also be done on better machines. (Are you volunteering to provide time on better machines for benchmarks?) Not everyone will buy a lot of high-end hardware before they start using a database.
In fact, most companies will start with a database on lower-end hardware and then, as their requirements grow, move to better hardware. I'm willing to bet that what I have available is better than the starting point for most places. Postgres needs to work on the low-end stuff as well as the high-end stuff, or people will write their apps to work with things that DO run on low-end hardware, and they will spend much more money than is needed scaling the hardware up rather than re-writing their app. Part of the reason that I made the post on /. to start this was the hope that a reasonable set of benchmarks could be hammered out, and then more people than just me could run them to get a wider range of results. David Lang
Re: [PERFORM] Hardware/OS recommendations for large databases (
Another thought - I priced out a maxed out machine with 16 cores and 128GB of RAM and 1.5TB of usable disk - $71,000. You could instead buy 8 machines that total 16 cores, 128GB RAM and 28TB of disk for $48,000, and it would be 16 times faster in scan rate, which is the most important factor for large databases. The size would be 16 rack units instead of 5, and you'd have to add a GigE switch for $1500. Scan rate for above SMP: 200MB/s Scan rate for above cluster: 3,200MB/s You could even go dual core and double the memory on the cluster and you'd about match the price of the "god box". - Luke Luke, I assume you are talking about using the Greenplum MPP for this (otherwise I don't know how you are combining all the different systems). If you are, then you are overlooking one very significant factor: the cost of the MPP software. At $10K/CPU the cluster has an extra $160K in software costs, which is more than triple the hardware cost. If money is no object then go for it, but if it is, then your comparison would be (ignoring software maintenance costs) the 16-core 128G ram system vs ~3 small systems totaling 6 cores and 48G ram. Yes, if scan speed is the bottleneck you still win with the small systems, but for most other uses the large system would win easily. And in any case it's not the open-and-shut case that you keep presenting it as. David Lang
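David's cost argument can be made concrete by normalizing price against scan rate. This sketch only uses the dollar figures quoted in the thread (the $160K license total follows David's stated MPP pricing); the per-MB/s framing itself is an editorial illustration, not a claim from either poster:

```python
def dollars_per_mb_per_s(total_price_usd, scan_rate_mb_per_s):
    """Price paid per MB/s of sequential scan capacity."""
    return total_price_usd / scan_rate_mb_per_s

# Figures from the thread: $71K SMP box at 200MB/s vs. an 8-node
# cluster at $48K + $1.5K switch scanning at 3,200MB/s, with or
# without the quoted $160K MPP software cost.
smp     = dollars_per_mb_per_s(71_000, 200)                      # 355.0
cluster = dollars_per_mb_per_s(48_000 + 1_500, 3_200)            # ~15.5
mpp     = dollars_per_mb_per_s(48_000 + 1_500 + 160_000, 3_200)  # ~65.5
```

Even with the MPP license included, the cluster's cost per unit of scan rate is a fraction of the SMP box's; whether scan rate is the right metric for the workload is exactly the point under dispute.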
Re: [PERFORM] Hardware/OS recommendations for large databases (
For data warehousing it's pretty well open and shut. To use all CPUs and I/O channels on each query you will need MPP. Has anyone done the math on the original post? 5TB takes how long to scan once? If you want to wait less than a couple of days just for a seq scan, you'd better be in the multi-GB per second range. - Luke -- Sent from my BlackBerry Wireless Device -Original Message- From: [EMAIL PROTECTED] <[EMAIL PROTECTED]> To: pgsql-performance@postgresql.org Sent: Sat Nov 26 13:51:18 2005 Subject: Re: [PERFORM] Hardware/OS recommendations for large databases ( >Another thought - I priced out a maxed out machine with 16 cores and >128GB of RAM and 1.5TB of usable disk - $71,000. > >You could instead buy 8 machines that total 16 cores, 128GB RAM and 28TB >of disk for $48,000, and it would be 16 times faster in scan rate, which >is the most important factor for large databases. The size would be 16 >rack units instead of 5, and you'd have to add a GigE switch for $1500. > >Scan rate for above SMP: 200MB/s > >Scan rate for above cluster: 3,200MB/s > >You could even go dual core and double the memory on the cluster and >you'd about match the price of the "god box". > >- Luke Luke, I assume you are talking about using the Greenplum MPP for this (otherwise I don't know how you are combining all the different systems). If you are, then you are overlooking one very significant factor: the cost of the MPP software. At $10K/CPU the cluster has an extra $160K in software costs, which is more than triple the hardware cost. If money is no object then go for it, but if it is, then your comparison would be (ignoring software maintenance costs) the 16-core 128G ram system vs ~3 small systems totaling 6 cores and 48G ram. Yes, if scan speed is the bottleneck you still win with the small systems, but for most other uses the large system would win easily. And in any case it's not the open-and-shut case that you keep presenting it as.
David Lang
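Luke's "how long to scan 5TB" question is straightforward arithmetic. A sketch using the scan rates quoted earlier in the thread (real rates depend on the validated I/O configuration):

```python
def scan_hours(terabytes, mb_per_s):
    """Hours needed to sequentially scan the given volume once."""
    return terabytes * 1024 * 1024 / mb_per_s / 3600.0

smp_hours     = scan_hours(5, 200)    # ~7.3 hours at 200MB/s
cluster_hours = scan_hours(5, 3200)   # ~27 minutes at 3,200MB/s
```

Note that a two-day scan of 5TB corresponds to roughly 30MB/s, close to the 25MB/s misconfigured-system figure Luke cited earlier, so "a couple of days" assumes a badly tuned I/O path rather than the quoted rates.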
Re: [PERFORM] Hardware/OS recommendations for large databases (
On Sun, 27 Nov 2005, Luke Lonergan wrote: > For data warehousing it's pretty well open and shut. To use all CPUs and I/O channels on each query you will need MPP. Has anyone done the math on the original post? 5TB takes how long to scan once? If you want to wait less than a couple of days just for a seq scan, you'd better be in the multi-GB per second range. If you truly need to scan the entire database then you are right; however, indexes should be able to cut the amount you need to scan drastically. David Lang