Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Rafael Martinez

On 11/25/2013 09:01 PM, Lee Nguyen wrote:
 Hi,
 
 Having attended a few PGCons, I've always heard the remark from a 
 few presenters and attendees that Postgres shouldn't be run inside 
 a VM. That bare metal is the only way to go.
 
[...]

Hello

This was true some years ago. In our experience it no longer holds,
unless you are running a very demanding system that would be a
challenge even on bare metal. It should work well for most use
cases if your infrastructure is configured correctly.

This year we have moved all our PostgreSQL servers (45+) to a VMware
cluster running vSphere 5.1. We have also almost finished moving all
our Oracle databases to this cluster. More than 100 virtual
servers and several thousand databases are running without problems in
our VM environment.

In our experience, VMware vSphere 5.1 makes a huge difference in I/O
performance compared to older versions. Our tests last year against a
storage solution connected to both VM servers and bare-metal servers did
not show any particular difference in performance between them. Some tips:

* We use a SAN via Fibre Channel to store our data. Be sure to have
enough active FC channels for your load. Do not even think of using NFS
to connect your physical nodes to your SAN.

* We are using 10GigE to interconnect the physical nodes in our
cluster. This helps a lot when moving VM servers between nodes.

* Don't use the snapshot functionality of VM clusters in production.

* Don't overprovision resources, especially memory.

* Use paravirtualized drivers.

* As usual, your storage solution will define the limits in
performance of your VM cluster.

We have gained a lot in flexibility and manageability without losing
performance. The benefits in these areas are many when you
administer many servers/databases.

regards,
-- 
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Boszormenyi Zoltan

On 2013-11-25 21:19, Heikki Linnakangas wrote:

On 25.11.2013 22:01, Lee Nguyen wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM. That
bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres database
on our VM farm alongside our application vm's.  We are planning to run a
few Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides? Does anyone
have any metrics or benchmarks with the latest Postgres?


I've also heard people say that they've seen PostgreSQL perform worse in a VM. In the
performance testing that we've done in VMware, though, we haven't seen any big impact. 
So I guess the answer is that it depends on the specific configuration of CPU, memory, 
disks and the software.


We at Cybertec tested some configurations about 2 months ago.
The performance drop is coming from the disk given to the VM guest.

When there is a dedicated disk (pass through) given to the VM guest,
PostgreSQL runs at a speed of around 98% of the bare metal.

When the virtual disk is a disk file on the host machine, we've measured
20% of bare-metal speed or lower. The host used Fedora 19/x86_64 with IIRC a
3.10.x Linux kernel and an EXT4 filesystem (the filesystem I am sure of, not
IIRC). The effect was observed under both Qemu/KVM and Xen.

The virtual disk was not pre-allocated, since that was the default setting,
i.e. space savings preferred over speed. The figure might be better with
a pre-allocated disk, but the filesystem journalling done twice (both in the
host and the guest) will still have an effect.

The PostgreSQL server versions 9.2.x, 9.3beta were tested with pgbench,
standalone, without replication.
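
For anyone repeating a comparison like this, here is a minimal sketch of turning two pgbench runs into a "percent of bare metal" figure. It assumes the pgbench output contains a summary line starting with "tps = "; the helper names parse_tps and percent_of_bare_metal, and the sample numbers, are invented for illustration and are not the measurements discussed above.

```python
import re

# Matches the "tps = 1234.56 ..." summary line that pgbench prints.
_TPS_RE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def parse_tps(pgbench_output: str) -> float:
    """Return the first tps figure found in pgbench's text output."""
    match = _TPS_RE.search(pgbench_output)
    if match is None:
        raise ValueError("no 'tps = ...' line found in pgbench output")
    return float(match.group(1))


def percent_of_bare_metal(vm_tps: float, metal_tps: float) -> float:
    """Express VM throughput as a percentage of bare-metal throughput."""
    if metal_tps <= 0:
        raise ValueError("bare-metal tps must be positive")
    return 100.0 * vm_tps / metal_tps


if __name__ == "__main__":
    # Hypothetical outputs, invented for illustration only.
    metal = "tps = 1000.0 (including connections establishing)\n"
    guest = "tps = 980.0 (including connections establishing)\n"
    ratio = percent_of_bare_metal(parse_tps(guest), parse_tps(metal))
    print(f"{ratio:.1f}% of bare metal")
```

The same pgbench scale factor, client count and duration should be used on both sides, otherwise the ratio says little.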

Best regards,
Zoltán Böszörményi

Synchronous replication is likely going to be the biggest bottleneck by far, unless it's 
mostly read-only. I don't know if virtualization will have a measurable impact on 
network latency, which is what matters for synchronous replication.
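
To see why latency matters here, a back-of-the-envelope sketch may help. It assumes, as a simplification, that one session commits serially and that every synchronous commit waits at least one network round trip for the standby's acknowledgement; disk flush time on either side is ignored. The function name and the sample round-trip times are made up for illustration.

```python
def max_serial_sync_commits_per_second(round_trip_ms: float) -> float:
    """Upper bound on commits/second for ONE session committing serially,
    assuming each commit waits for at least one network round trip."""
    if round_trip_ms <= 0:
        raise ValueError("round trip time must be positive")
    return 1000.0 / round_trip_ms


if __name__ == "__main__":
    # Hypothetical round-trip times in milliseconds, for illustration only.
    for rtt in (0.1, 0.5, 2.0):
        bound = max_serial_sync_commits_per_second(rtt)
        print(f"RTT {rtt} ms -> at most {bound:.0f} commits/s per session")
```

Many concurrent sessions can commit in parallel, so total throughput can be higher than this per-session bound.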


So, I'd suggest that you try it yourself, and see how it performs. And please report 
back to the list, I'd also love to see some numbers!


- Heikki





--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Stephen Frost
Zoltan,

* Boszormenyi Zoltan (z...@cybertec.at) wrote:
 When the virtual disk is a disk file on the host machine, we've measured
 20% of bare-metal speed or lower. The host used Fedora 19/x86_64 with IIRC a
 3.10.x Linux kernel and an EXT4 filesystem (the filesystem I am sure of, not
 IIRC). The effect was observed under both Qemu/KVM and Xen.

Interesting- that's far worse than I would have expected.  Was this test
done with paravirtualized drivers?  If not, I can certainly understand
the terrible performance.

Independently of that, I'll add my own 2c that DB people tend to be
pretty paranoid and the current round of VM technologies out there have
caused more than one person to lose data because fsync wasn't honored
all the way down to the disk.  This is especially true of 'home-grown'
setups, imv, but I'm sure you could configure the commercial offerings
to lie to the guest OS too.  Of course, there are similar concerns about
a SAN or even local RAID cards, but there's a lot more general
familiarity and history around those which reduces the risk there (or at
least, that's the thought).
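
One rough way to sanity-check this from inside the guest is to time fsync calls. The following is a minimal sketch, not a replacement for pg_test_fsync (which ships with PostgreSQL and is far more thorough). The function name time_fsyncs, the 8 kB block size and the sample count are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def time_fsyncs(directory: str, count: int = 200, block: bytes = b"x" * 8192) -> float:
    """Write `count` blocks, calling fsync after each one, and return the
    observed number of fsyncs per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Point this at a directory on the volume that will hold the database.
    rate = time_fsyncs(tempfile.gettempdir())
    print(f"{rate:.0f} fsyncs/second")
```

As a rule of thumb, a single 7200 rpm disk without a write cache cannot complete much more than about 120 fsyncs per second (one per rotation). A far higher figure on that kind of storage suggests that some layer is acknowledging writes before they are durable; a battery-backed cache or an SSD can legitimately give much higher numbers.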

Thanks,

Stephen




Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Andrew Dunstan


On 11/26/2013 09:26 AM, Craig James wrote:


On 25.11.2013 22:01, Lee Nguyen wrote:


Why shouldn't we run Postgres in a VM?  What are the downsides? Does
anyone have any metrics or benchmarks with the latest Postgres?


For those of us with small installations (a few to a dozen servers), we'd
like to get out of server maintenance completely. Can anyone with experience
on a cloud VM solution comment?  Do the VM solutions provided by the
major hosting companies have the same good performance as the VMs
that several have described here?


Obviously there's Amazon's new Postgres solution available.  What else
is out there in the way of instant-on solutions with
Linux/Postgres/Apache preconfigured systems?  Has anyone used them in
production?





If you want a full stack including Postgres, Heroku might be your best 
bet. Depends a bit on your application and your workload. And yes, I've 
used it. Full disclosure: I have done work paid for by Heroku.


cheers

andrew





Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Andrew Dunstan


On 11/26/2013 08:51 AM, Boszormenyi Zoltan wrote:

On 2013-11-25 21:19, Heikki Linnakangas wrote:

On 25.11.2013 22:01, Lee Nguyen wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM. That
bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres database
on our VM farm alongside our application vm's.  We are planning to run a
few Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides? Does anyone
have any metrics or benchmarks with the latest Postgres?


I've also heard people say that they've seen PostgreSQL perform
worse in a VM. In the performance testing that we've done in VMware, 
though, we haven't seen any big impact. So I guess the answer is that 
it depends on the specific configuration of CPU, memory, disks and 
the software.


We at Cybertec tested some configurations about 2 months ago.
The performance drop is coming from the disk given to the VM guest.

When there is a dedicated disk (pass through) given to the VM guest,
PostgreSQL runs at a speed of around 98% of the bare metal.

When the virtual disk is a disk file on the host machine, we've measured
20% of bare-metal speed or lower. The host used Fedora 19/x86_64 with IIRC a
3.10.x Linux kernel and an EXT4 filesystem (the filesystem I am sure of, not
IIRC). The effect was observed under both Qemu/KVM and Xen.

The virtual disk was not pre-allocated, since that was the default setting,
i.e. space savings preferred over speed. The figure might be better with
a pre-allocated disk, but the filesystem journalling done twice (both in the
host and the guest) will still have an effect.



A virtual disk backed by a disk file that is not pre-allocated is just
about the worst case in my experience.


Try pre-allocated VirtIO disks on an LVM volume group - you should get 
much better performance.
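
To make the pre-allocation point concrete, here is a small sketch of the difference at the file level between a sparse file and a pre-allocated one. It only illustrates the concept on an ordinary Unix filesystem; it is not how Qemu/KVM or LVM create disks, and the function names are made up for illustration.

```python
import os
import tempfile


def create_sparse(path: str, size: int) -> None:
    """Create a file of `size` bytes whose blocks are allocated lazily,
    i.e. only when something is first written to them."""
    with open(path, "wb") as f:
        f.truncate(size)


def create_preallocated(path: str, size: int) -> None:
    """Create a file of `size` bytes and ask the filesystem to reserve
    all of its blocks up front."""
    with open(path, "wb") as f:
        os.posix_fallocate(f.fileno(), 0, size)


def allocated_bytes(path: str) -> int:
    """Bytes actually backed by disk blocks (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    size = 16 * 1024 * 1024
    with tempfile.TemporaryDirectory() as d:
        sparse = os.path.join(d, "sparse.img")
        full = os.path.join(d, "prealloc.img")
        create_sparse(sparse, size)
        create_preallocated(full, size)
        print("sparse:      ", allocated_bytes(sparse), "bytes allocated")
        print("preallocated:", allocated_bytes(full), "bytes allocated")
```

With a sparse image, the host filesystem has to allocate blocks (and journal that allocation) while the guest is writing, which is one plausible reason such images benchmark so poorly.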


cheers

andrew





Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread David Boreham

On 11/26/2013 7:26 AM, Craig James wrote:


For those of us with small installations (a few to a dozen servers), we'd
like to get out of server maintenance completely. Can anyone with experience
on a cloud VM solution comment?  Do the VM solutions provided by the
major hosting companies have the same good performance as the VMs
that several have described here?


Obviously there's Amazon's new Postgres solution available.  What else
is out there in the way of instant-on solutions with
Linux/Postgres/Apache preconfigured systems?  Has anyone used them in
production?


I've done some work with Heroku and the MySQL flavor of the AWS service.
They work, and are convenient, but there are a couple of issues:

1. Random odd (and bad) things can happen from a performance perspective
that you just need to cope with, e.g. I/O will become vastly slower for
periods of tens of seconds, once or twice a day. If you don't like the
idea of phenomena like this in your system, beware.


2. Your inability to connect with the bare metal may turn out to be a 
significant hassle when trying to understand some performance issue in 
the future. Tricks that we're used to using such as looking at iostat 
(or even top) output are no longer usable because the hosting company 
will not give you a login on the host VM. This limitation extends to 
many many techniques that have been commonly used in the past and can 
become a major headache to the point where you need to reproduce the 
system on physical hardware just to understand what's going on with it 
(been there, done that...)
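
Since the host is not observable, one thing that can still be done from inside the guest is to record I/O latency samples over time and flag the outliers. The sketch below assumes the samples already exist (for example from a periodic timed write plus fsync); the function name find_stalls and the factor of 10 are arbitrary choices for illustration.

```python
from statistics import median
from typing import List, Sequence


def find_stalls(latencies_ms: Sequence[float], factor: float = 10.0) -> List[int]:
    """Return the indexes of samples slower than `factor` times the median."""
    if not latencies_ms:
        return []
    threshold = factor * median(latencies_ms)
    return [i for i, value in enumerate(latencies_ms) if value > threshold]


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds, for illustration only.
    samples = [1.0, 1.2, 0.9, 1.1, 50.0, 1.0]
    print(find_stalls(samples))
```

This will not explain why a stall happened, but it gives timestamps to take to the provider's support.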


For the reasons above I would caution against deploying a production service
(today) on a SaaS database service like Heroku or Amazon RDS.
Running your own database inside a stock VM might be better, but it can
be hard to get the right kind of I/O for that deployment scenario.
In the case of self-hosted VMware or KVM obviously you have much more
control and observability.


Heroku had (at least when I last used it, a year ago or so) an 
additional issue in that they host on AWS VMs so if something goes wrong 
you are talking to one company that is using another company's virtual 
machine service. Not a recipe for clarity, good service and hair 
retention...










Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Craig James
On Tue, Nov 26, 2013 at 8:29 AM, David Boreham david_l...@boreham.org wrote:

 On 11/26/2013 7:26 AM, Craig James wrote:


 For those of us with small installations (a few to a dozen servers), we'd like to get
 out of server maintenance completely. Can anyone with experience on a cloud
 VM solution comment?  ...


 I've done some work with Heroku and the MySQL flavor of AWS service.


Thanks, I'll check Heroku out.


 For the reasons above I would caution against deploying a production service
 (today) on a SaaS database service like Heroku or Amazon RDS.
 Running your own database inside a stock VM might be better, but it can be
 hard to get the right kind of I/O for that deployment scenario.
 In the case of self-hosted VMware or KVM obviously you have much more
 control and observability.


Well, the whole point of switching to a cloud provider is to get out of the
business of buying hardware and hauling it down to the co-lo facility.
Adding VMware or KVM is just one more thing we'd have to add to our
sysadmin skills.  We'd rather focus on our core technology, the stuff we're
better at than anyone else.

So far I'm impressed by what I've read about Amazon's Postgres instances.
Maybe the reality will be disappointing, but (for example) the idea of
setting up streaming replication with one click is pretty appealing.

Craig


Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Josh Berkus
On 11/25/2013 12:01 PM, Lee Nguyen wrote:
 Hi,
 
 Having attended a few PGCons, I've always heard the remark from a few
 presenters and attendees that Postgres shouldn't be run inside a VM. That
 bare metal is the only way to go.

This is pretty dated advice.  Early VMs had horrible performance under
load, which is mostly where this thinking comes from.  It's not true
anymore.

It *is* true that getting good performance in a virtualized environment
requires more tuning than bare metal, because you have to tune the VM
system as well.

 Here at work we were entertaining the idea of running our Postgres database
 on our VM farm alongside our application vm's.  We are planning to run a
 few Postgres synchronous replication nodes.

Biggest pitfall here is IO performance configuration.  I can't give you
specific advice without knowing the platform and the desired workload.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Merlin Moncure
On Tue, Nov 26, 2013 at 11:31 AM, Josh Berkus j...@agliodbs.com wrote:
 On 11/25/2013 12:01 PM, Lee Nguyen wrote:
 Hi,

 Having attended a few PGCons, I've always heard the remark from a few
 presenters and attendees that Postgres shouldn't be run inside a VM. That
 bare metal is the only way to go.

 This is pretty dated advice.  Early VMs had horrible performance under
 load, which is mostly where this thinking comes from.  It's not true
 anymore.

 It *is* true that getting good performance in a virtualized environment
 requires more tuning than bare metal, because you have to tune the VM
 system as well.

 Here at work we were entertaining the idea of running our Postgres database
 on our VM farm alongside our application vm's.  We are planning to run a
 few Postgres synchronous replication nodes.

 Biggest pitfall here is IO performance configuration.  I can't give you
 specific advice without knowing the platform and the desired workload.

Yeah.  Seeing things like provisioned IOPS in the cloud services is a
pretty big deal.  I do think it's still fairly expensive for what you
get, but SSDs and competition are going to force prices down quickly
over time. For in-house virtualized setups, you can get pretty far
with SSDs using any number of options (direct-attached to the host,
iSCSI, SAN, etc.).

For I/O-constrained systems, I don't consider any spindle-based
systems, in particular SANs, to be a good investment.   Curious: I
just read your article on iSCSI
(http://it.toolbox.com/blogs/database-soup/the-problem-with-iscsi-30602).
 Do you still consider iSCSI to perform poorly?

merlin




Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Craig James
On Tue, Nov 26, 2013 at 10:40 AM, Ben Chobot be...@silentmedia.com wrote:

 On Nov 26, 2013, at 9:24 AM, Craig James wrote:

 So far I'm impressed by what I've read about Amazon's Postgres instances.
 Maybe the reality will be disappointing, but (for example) the idea of
 setting up streaming replication with one click is pretty appealing.


 Where did you hear this was an option? When we talked to AWS about their
 Postgres RDS offering, they were pretty clear that (currently) replication
 is hardware-based, the slave is not live, and you don't get access to the
 WALs that they use internally for PITR. Changing that is something they
 want to address, but isn't there today.


I was guessing from the description of their High Availability option ...
but maybe it uses something like pg-pool, or as you say, maybe they do it
at the hardware level.

http://aws.amazon.com/rds/postgresql/#High-Availability


Multi-AZ Deployments – This deployment option for your production DB
Instances enhances database availability while protecting your latest
database updates against unplanned outages. When you create or modify your
DB Instance to run as a Multi-AZ deployment, Amazon RDS will automatically
provision and manage a “standby” replica in a different Availability Zone
(independent infrastructure in a physically separate location). Database
updates are made concurrently on the primary and standby resources to
prevent replication lag. In the event of planned database maintenance, DB
Instance failure, or an Availability Zone failure, Amazon RDS will
automatically failover to the up-to-date standby so that database
operations can resume quickly without administrative intervention. Prior to
failover you cannot directly access the standby, and it cannot be used to
serve read traffic.

Either way, if a cold standby is all you need, it's still a one-click
option, lots simpler than setting it up yourself.

Craig


Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread David Kerr
On Tue, Nov 26, 2013 at 11:18:41AM -0800, Craig James wrote:
- On Tue, Nov 26, 2013 at 10:40 AM, Ben Chobot be...@silentmedia.com wrote:
- 
-  On Nov 26, 2013, at 9:24 AM, Craig James wrote:
- 
-  So far I'm impressed by what I've read about Amazon's Postgres instances.
-  Maybe the reality will be disappointing, but (for example) the idea of
-  setting up streaming replication with one click is pretty appealing.
- 
- 
-  Where did you hear this was an option? When we talked to AWS about their
-  Postgres RDS offering, they were pretty clear that (currently) replication
-  is hardware-based, the slave is not live, and you don't get access to the
-  WALs that they use internally for PITR. Changing that is something they
-  want to address, but isn't there today.
- 
- 
- I was guessing from the description of their High Availability option ...
- but maybe it uses something like pg-pool, or as you say, maybe they do it
- at the hardware level.
- 
- http://aws.amazon.com/rds/postgresql/#High-Availability
- 
- 
- Multi-AZ Deployments – This deployment option for your production DB
- Instances enhances database availability while protecting your latest
- database updates against unplanned outages. When you create or modify your
- DB Instance to run as a Multi-AZ deployment, Amazon RDS will automatically
- provision and manage a “standby” replica in a different Availability Zone
- (independent infrastructure in a physically separate location). Database
- updates are made concurrently on the primary and standby resources to
- prevent replication lag. In the event of planned database maintenance, DB
- Instance failure, or an Availability Zone failure, Amazon RDS will
- automatically failover to the up-to-date standby so that database
- operations can resume quickly without administrative intervention. Prior to
- failover you cannot directly access the standby, and it cannot be used to
- serve read traffic.
- 
- Either way, if a cold standby is all you need, it's still a one-click
- option, lots simpler than setting it up yourself.
- 
- Craig

The Multi-AZ deployments don't expose the replica to you unless there is a
failover (in which case it picks one and promotes it).

There is a "Create Read Replica" option, but it's currently not available, so
we can assume it will eventually be offered.

