Re: [PERFORM] Postgresql in a Virtual Machine
On 11/25/2013 09:01 PM, Lee Nguyen wrote:

> Hi, Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go. [...]

Hello

This was true some years ago. In our experience, it is no longer true unless you are running a system so demanding that it would be a challenge even on bare metal. It should work well for most use cases if your infrastructure is configured correctly.

This year we have moved all our PostgreSQL servers (45+) to a VMware cluster running vSphere 5.1. We have also almost finished moving all our Oracle databases to this cluster. More than 100 virtual servers and some thousands of databases are running without problems in our VM environment.

In our experience, VMware vSphere 5.1 makes a huge difference in IO performance compared to older versions. Our tests last year against a storage solution connected to both VM servers and bare-metal servers did not show any particular difference in performance between them.

Some tips:

* We use a SAN via Fibre Channel to store our data. Be sure to have enough active FC channels for your load. Do not even think of using NFS to connect your physical nodes to your SAN.
* We are using 10GigE to interconnect the physical nodes in our cluster. This helps a lot when moving VM servers between nodes.
* Don't use the snapshot functionality of VM clusters in production.
* Don't overprovision resources, especially memory.
* Use paravirtualized drivers.
* As usual, your storage solution will define the performance limits of your VM cluster.

We have gained a lot in flexibility and manageability without losing performance. The benefits in these areas are many when you administer many servers/databases.

regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Postgresql in a Virtual Machine
On 2013-11-25 21:19, Heikki Linnakangas wrote:

> On 25.11.2013 22:01, Lee Nguyen wrote:
>
> > Hi, Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go. Here at work we were entertaining the idea of running our Postgres database on our VM farm alongside our application VMs. We are planning to run a few Postgres synchronous replication nodes. Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone have any metrics or benchmarks with the latest Postgres?
>
> I've also heard people say that they've seen PostgreSQL perform worse in a VM. In the performance testing that we've done in VMware, though, we haven't seen any big impact. So I guess the answer is that it depends on the specific configuration of CPU, memory, disks and the software.

We at Cybertec tested some configurations about 2 months ago. The performance drop comes from the disk given to the VM guest. When a dedicated (pass-through) disk is given to the VM guest, PostgreSQL runs at around 98% of bare-metal speed. When the virtual disk is a disk file on the host machine, we've measured 20% or lower.

The host used Fedora 19/x86_64 with, IIRC, a 3.10.x Linux kernel and an EXT4 filesystem (the latter is certain, not IIRC). The effect was observed under both Qemu/KVM and Xen.

The virtual disk was not pre-allocated, since that was the default setting, i.e. space savings preferred over speed. The figure might be better with a pre-allocated disk, but the filesystem journalling done twice (both in the host and the guest) will still have an effect.

PostgreSQL server versions 9.2.x and 9.3beta were tested with pgbench, standalone, without replication.

Best regards,
Zoltán Böszörményi

> Synchronous replication is likely going to be the biggest bottleneck by far, unless it's mostly read-only. I don't know if virtualization will have a measurable impact on network latency, which is what matters for synchronous replication.
>
> So, I'd suggest that you try it yourself, and see how it performs. And please report back to the list, I'd also love to see some numbers!
>
> - Heikki

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/

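The comparison Zoltán describes (VM throughput expressed as a fraction of bare-metal throughput) can be computed from pgbench's summary output. What follows is only a minimal Python sketch, assuming the output of each run has been saved as text. The function names are made up for illustration, and the sample outputs and TPS figures are hypothetical; they are not the Cybertec measurements.

```python
import re

# pgbench prints summary lines such as
#   "tps = 1234.56 (including connections establishing)";
# the wording in parentheses differs between PostgreSQL versions,
# so only the leading "tps = <number>" part is matched.
TPS_PATTERN = re.compile(r"^tps = ([0-9.]+)", re.MULTILINE)


def parse_tps(pgbench_output: str) -> float:
    """Return the first tps figure found in pgbench's summary output."""
    match = TPS_PATTERN.search(pgbench_output)
    if match is None:
        raise ValueError("no 'tps = ...' line found in pgbench output")
    return float(match.group(1))


def relative_speed(vm_tps: float, bare_metal_tps: float) -> float:
    """VM throughput as a percentage of bare-metal throughput."""
    return 100.0 * vm_tps / bare_metal_tps


if __name__ == "__main__":
    # Hypothetical outputs, for illustration only.
    bare_metal = "tps = 1000.0 (including connections establishing)\n"
    vm_file_backed = "tps = 200.0 (including connections establishing)\n"
    ratio = relative_speed(parse_tps(vm_file_backed), parse_tps(bare_metal))
    print(f"VM runs at {ratio:.0f}% of bare metal")
```

For the ratio to mean anything, both runs would need the same pgbench scale factor, client count and duration, and ideally several repetitions each.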
Re: [PERFORM] Postgresql in a Virtual Machine
Zoltan,

* Boszormenyi Zoltan (z...@cybertec.at) wrote:
> When the virtual disk is a disk file on the host machine, we've measured 20% or lower. The host used Fedora 19/x86_64 with, IIRC, a 3.10.x Linux kernel and an EXT4 filesystem (the latter is certain, not IIRC). The effect was observed under both Qemu/KVM and Xen.

Interesting- that's far worse than I would have expected. Was this test done with paravirtualized drivers? If not, I can certainly understand the terrible performance.

Independently of that, I'll add my own 2c that DB people tend to be pretty paranoid and the current round of VM technologies out there has caused more than one person to lose data because fsync wasn't honored all the way down to the disk. This is especially true of 'home-grown' setups, imv, but I'm sure you could configure the commercial offerings to lie to the guest OS too.

Of course, there are similar concerns about a SAN or even local RAID cards, but there's a lot more general familiarity and history around those which reduces the risk there (or at least, that's the thought).

Thanks,

Stephen

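One crude way to probe the fsync concern raised above is to time synchronous writes from inside the guest. PostgreSQL ships the `pg_test_fsync` utility for this purpose; the Python sketch below only illustrates the idea and is not a replacement for it. The function name and the default of 200 writes are assumptions made for the example.

```python
import os
import tempfile
import time

BLOCK_SIZE = 8192  # PostgreSQL's default page size in bytes


def fsyncs_per_second(directory: str, count: int = 200) -> float:
    """Write `count` blocks, calling fsync after each, and return the rate."""
    block = b"\0" * BLOCK_SIZE
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    # Point this at the directory that will hold the database files.
    rate = fsyncs_per_second(tempfile.gettempdir())
    print(f"{rate:.0f} fsyncs/second")
```

As a rule of thumb, a single 7200 rpm disk without a write cache cannot complete much more than about 120 fsyncs per second (one per rotation). A far higher rate on such storage suggests that some layer is caching writes, although it cannot show whether that cache is battery-backed or otherwise safe. Only a power-loss test can confirm durability.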
Re: [PERFORM] Postgresql in a Virtual Machine
On 11/26/2013 09:26 AM, Craig James wrote:

> On 25.11.2013 22:01, Lee Nguyen wrote:
>
> > Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone have any metrics or benchmarks with the latest Postgres?
>
> For those of us with small installations (a few to a dozen servers), we'd like to get out of server maintenance completely. Can anyone with experience on a cloud VM solution comment? Do the VM solutions provided by the major hosting companies have the same good performance as the VMs that several have described here?
>
> Obviously there's Amazon's new Postgres solution available. What else is out there in the way of instant-on solutions with Linux/Postgres/Apache preconfigured systems? Has anyone used them in production?

If you want a full stack including Postgres, Heroku might be your best bet. Depends a bit on your application and your workload. And yes, I've used it. Full disclosure: I have done work paid for by Heroku.

cheers

andrew

Re: [PERFORM] Postgresql in a Virtual Machine
On 11/26/2013 08:51 AM, Boszormenyi Zoltan wrote:

> On 2013-11-25 21:19, Heikki Linnakangas wrote:
>
> > On 25.11.2013 22:01, Lee Nguyen wrote:
> >
> > > Hi, Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go. Here at work we were entertaining the idea of running our Postgres database on our VM farm alongside our application VMs. We are planning to run a few Postgres synchronous replication nodes. Why shouldn't we run Postgres in a VM? What are the downsides? Does anyone have any metrics or benchmarks with the latest Postgres?
> >
> > I've also heard people say that they've seen PostgreSQL perform worse in a VM. In the performance testing that we've done in VMware, though, we haven't seen any big impact. So I guess the answer is that it depends on the specific configuration of CPU, memory, disks and the software.
>
> We at Cybertec tested some configurations about 2 months ago. The performance drop comes from the disk given to the VM guest. When a dedicated (pass-through) disk is given to the VM guest, PostgreSQL runs at around 98% of bare-metal speed. When the virtual disk is a disk file on the host machine, we've measured 20% or lower.
>
> The host used Fedora 19/x86_64 with, IIRC, a 3.10.x Linux kernel and an EXT4 filesystem (the latter is certain, not IIRC). The effect was observed under both Qemu/KVM and Xen.
>
> The virtual disk was not pre-allocated, since that was the default setting, i.e. space savings preferred over speed. The figure might be better with a pre-allocated disk, but the filesystem journalling done twice (both in the host and the guest) will still have an effect.

A disk-file-backed virtual disk that is not pre-allocated is just about the worst case in my experience. Try pre-allocated VirtIO disks on an LVM volume group - you should get much better performance.

cheers

andrew

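Before benchmarking, it can be worth confirming from inside a Linux guest which kind of disk devices it actually sees. The sketch below is a name-based heuristic only: virtio-blk disks normally appear as `vdX` and Xen paravirtual disks as `xvdX`, while emulated IDE/SATA/SCSI disks usually appear as `sdX` or `hdX`. Disks attached through virtio-scsi or VMware's PVSCSI also show up as `sdX`, so an entry under "other" does not prove the device is emulated. The function names are hypothetical.

```python
import os


def classify_block_devices(names):
    """Group Linux block device names by likely driver type, judged by prefix."""
    groups = {"virtio": [], "xen": [], "other": []}
    for name in sorted(names):
        if name.startswith("vd"):
            groups["virtio"].append(name)   # virtio-blk, e.g. vda
        elif name.startswith("xvd"):
            groups["xen"].append(name)      # Xen paravirtual disk, e.g. xvda
        else:
            groups["other"].append(name)    # sdX, hdX, loop devices, ...
    return groups


def list_block_devices(sys_block="/sys/block"):
    """Return the block device names the kernel exposes, or [] if unavailable."""
    return os.listdir(sys_block) if os.path.isdir(sys_block) else []


if __name__ == "__main__":
    print(classify_block_devices(list_block_devices()))
```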
Re: [PERFORM] Postgresql in a Virtual Machine
On 11/26/2013 7:26 AM, Craig James wrote:

> For those of us with small installations (a few to a dozen servers), we'd like to get out of server maintenance completely. Can anyone with experience on a cloud VM solution comment? Do the VM solutions provided by the major hosting companies have the same good performance as the VMs that several have described here?
>
> Obviously there's Amazon's new Postgres solution available. What else is out there in the way of instant-on solutions with Linux/Postgres/Apache preconfigured systems? Has anyone used them in production?

I've done some work with Heroku and the MySQL flavor of AWS service. They work, and are convenient, but there are a couple of issues:

1. Random odd (and bad) things can happen from a performance perspective that you just need to cope with. For example, I/O will become vastly slower for periods of tens of seconds, once or twice a day. If you don't like the idea of phenomena like this in your system, beware.

2. Your inability to connect to the bare metal may turn out to be a significant hassle when trying to understand some performance issue in the future. Tricks that we're used to, such as looking at iostat (or even top) output, are no longer usable because the hosting company will not give you a login on the host VM. This limitation extends to many techniques that have been commonly used in the past, and can become a major headache, to the point where you need to reproduce the system on physical hardware just to understand what's going on with it (been there, done that...)

For the reasons above I would caution against deploying a production service (today) on a SaaS database service like Heroku or Amazon RDS. Running your own database inside a stock VM might be better, but it can be hard to get the right kind of I/O for that deployment scenario. In the case of self-hosted VMware or KVM obviously you have much more control and observability.

Heroku had (at least when I last used it, a year ago or so) an additional issue in that they host on AWS VMs, so if something goes wrong you are talking to one company that is using another company's virtual machine service. Not a recipe for clarity, good service and hair retention...

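When host-level tools such as iostat are out of reach, one fallback is to record latency from inside the guest or from a client (for instance by timing a trivial query or a small write once per second) and then look for sustained slow periods like the ones described above. The Python sketch below covers only the analysis step and leaves the sampling out. The function name, the 10x-median threshold and the 10-second minimum duration are all assumptions chosen for illustration, and the sample data is synthetic.

```python
from statistics import median


def find_stalls(samples, factor=10.0, min_duration=10.0):
    """Return (start, end) time ranges where latency stays above factor * median.

    `samples` is a list of (timestamp_seconds, latency_ms) pairs sorted by time.
    Only ranges lasting at least `min_duration` seconds are reported.
    """
    if not samples:
        return []
    threshold = factor * median(latency for _, latency in samples)
    stalls = []
    start = last = None
    for timestamp, latency in samples:
        if latency > threshold:
            if start is None:
                start = timestamp      # a slow period begins here
            last = timestamp
        elif start is not None:
            if last - start >= min_duration:
                stalls.append((start, last))
            start = last = None        # back to normal latency
    # Handle a slow period that runs to the end of the data.
    if start is not None and last - start >= min_duration:
        stalls.append((start, last))
    return stalls


if __name__ == "__main__":
    # Synthetic data: one probe per second, 2 ms baseline, one 30-second slow period.
    samples = [(t, 200.0 if 100 <= t < 130 else 2.0) for t in range(300)]
    print(find_stalls(samples))
```

Using the median as the baseline keeps a few slow periods per day from shifting the threshold, which an average would not.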
Re: [PERFORM] Postgresql in a Virtual Machine
On Tue, Nov 26, 2013 at 8:29 AM, David Boreham david_l...@boreham.org wrote:

> On 11/26/2013 7:26 AM, Craig James wrote:
>
> > For those of us with small installations (a few to a dozen servers), we'd like to get out of server maintenance completely. Can anyone with experience on a cloud VM solution comment?
>
> ...
>
> I've done some work with Heroku and the MySQL flavor of AWS service.

Thanks, I'll check Heroku out.

> For the reasons above I would caution against deploying a production service (today) on a SaaS database service like Heroku or Amazon RDS. Running your own database inside a stock VM might be better, but it can be hard to get the right kind of I/O for that deployment scenario. In the case of self-hosted VMware or KVM obviously you have much more control and observability.

Well, the whole point of switching to a cloud provider is to get out of the business of buying hardware and hauling it down to the co-lo facility. Adding VMware or KVM is just one more thing we'd have to add to our sysadmin skills. We'd rather focus on our core technology, the stuff we're better at than anyone else.

So far I'm impressed by what I've read about Amazon's Postgres instances. Maybe the reality will be disappointing, but (for example) the idea of setting up streaming replication with one click is pretty appealing.

Craig

Re: [PERFORM] Postgresql in a Virtual Machine
On 11/25/2013 12:01 PM, Lee Nguyen wrote:

> Hi, Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go.

This is pretty dated advice. Early VMs had horrible performance under load, which is mostly where this thinking comes from. It's not true anymore.

It *is* true that getting good performance in a virtualized environment requires more tuning than bare metal, because you have to tune the VM system as well.

> Here at work we were entertaining the idea of running our Postgres database on our VM farm alongside our application VMs. We are planning to run a few Postgres synchronous replication nodes.

Biggest pitfall here is IO performance configuration. I can't give you specific advice without knowing the platform and the desired workload.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: [PERFORM] Postgresql in a Virtual Machine
On Tue, Nov 26, 2013 at 11:31 AM, Josh Berkus j...@agliodbs.com wrote:

> On 11/25/2013 12:01 PM, Lee Nguyen wrote:
>
> > Hi, Having attended a few PGCons, I've always heard the remark from a few presenters and attendees that Postgres shouldn't be run inside a VM. That bare metal is the only way to go.
>
> This is pretty dated advice. Early VMs had horrible performance under load, which is mostly where this thinking comes from. It's not true anymore.
>
> It *is* true that getting good performance in a virtualized environment requires more tuning than bare metal, because you have to tune the VM system as well.
>
> > Here at work we were entertaining the idea of running our Postgres database on our VM farm alongside our application VMs. We are planning to run a few Postgres synchronous replication nodes.
>
> Biggest pitfall here is IO performance configuration. I can't give you specific advice without knowing the platform and the desired workload.

Yeah. Seeing things like provisioned IOPS in the cloud services is a pretty big deal. I do think it's still fairly expensive for what you get, but SSDs and competition are going to force prices down quickly over time. For in-house virtualized setups, you can get pretty far with SSDs using any number of options (direct attached to the host, iSCSI, SAN, etc.). For I/O-constrained systems, I don't consider any spindle-based systems, in particular SANs, to be a good investment.

Curious: I just read your article on iSCSI (http://it.toolbox.com/blogs/database-soup/the-problem-with-iscsi-30602). Do you still consider iSCSI to be imperformant?

merlin

Re: [PERFORM] Postgresql in a Virtual Machine
On Tue, Nov 26, 2013 at 10:40 AM, Ben Chobot be...@silentmedia.com wrote:

> On Nov 26, 2013, at 9:24 AM, Craig James wrote:
>
> > So far I'm impressed by what I've read about Amazon's Postgres instances. Maybe the reality will be disappointing, but (for example) the idea of setting up streaming replication with one click is pretty appealing.
>
> Where did you hear this was an option? When we talked to AWS about their Postgres RDS offering, they were pretty clear that (currently) replication is hardware-based, the slave is not live, and you don't get access to the WALs that they use internally for PITR. Changing that is something they want to address, but isn't there today.

I was guessing from the description of their High Availability option ... but maybe it uses something like pg-pool, or as you say, maybe they do it at the hardware level.

http://aws.amazon.com/rds/postgresql/#High-Availability

Multi-AZ Deployments – This deployment option for your production DB Instances enhances database availability while protecting your latest database updates against unplanned outages. When you create or modify your DB Instance to run as a Multi-AZ deployment, Amazon RDS will automatically provision and manage a “standby” replica in a different Availability Zone (independent infrastructure in a physically separate location). Database updates are made concurrently on the primary and standby resources to prevent replication lag. In the event of planned database maintenance, DB Instance failure, or an Availability Zone failure, Amazon RDS will automatically failover to the up-to-date standby so that database operations can resume quickly without administrative intervention. Prior to failover you cannot directly access the standby, and it cannot be used to serve read traffic.

Either way, if a cold standby is all you need, it's still a one-click option, lots simpler than setting it up yourself.

Craig

Re: [PERFORM] Postgresql in a Virtual Machine
On Tue, Nov 26, 2013 at 11:18:41AM -0800, Craig James wrote:
- On Tue, Nov 26, 2013 at 10:40 AM, Ben Chobot be...@silentmedia.com wrote:
-
- On Nov 26, 2013, at 9:24 AM, Craig James wrote:
-
- So far I'm impressed by what I've read about Amazon's Postgres instances.
- Maybe the reality will be disappointing, but (for example) the idea of
- setting up streaming replication with one click is pretty appealing.
-
- Where did you hear this was an option? When we talked to AWS about their
- Postgres RDS offering, they were pretty clear that (currently) replication
- is hardware-based, the slave is not live, and you don't get access to the
- WALs that they use internally for PITR. Changing that is something they
- want to address, but isn't there today.
-
- I was guessing from the description of their High Availability option ...
- but maybe it uses something like pg-pool, or as you say, maybe they do it
- at the hardware level.
-
- http://aws.amazon.com/rds/postgresql/#High-Availability
-
- Multi-AZ Deployments – This deployment option for your production DB
- Instances enhances database availability while protecting your latest
- database updates against unplanned outages. When you create or modify your
- DB Instance to run as a Multi-AZ deployment, Amazon RDS will automatically
- provision and manage a standby replica in a different Availability Zone
- (independent infrastructure in a physically separate location). Database
- updates are made concurrently on the primary and standby resources to
- prevent replication lag. In the event of planned database maintenance, DB
- Instance failure, or an Availability Zone failure, Amazon RDS will
- automatically failover to the up-to-date standby so that database
- operations can resume quickly without administrative intervention. Prior to
- failover you cannot directly access the standby, and it cannot be used to
- serve read traffic.
-
- Either way, if a cold standby is all you need, it's still a one-click
- option, lots simpler than setting it up yourself.
-
- Craig

The Multi-AZ deployments don't expose the replica to you unless there is a failover (in which case it picks one and promotes it).

There is an option for "Create Read Replica", but it's currently not available, so we can assume that will eventually be an option.