Re: [PERFORM] amazon ec2

2011-05-05 Thread Dave Page
On Thu, May 5, 2011 at 1:02 AM, Josh Berkus j...@agliodbs.com wrote:

 FWIW, EnterpriseDB's InfiniCache provides the same caching benefit. The 
 way that works is when PG goes to evict a page from shared buffers that page 
 gets compressed and stuffed into a memcache cluster. When PG determines that 
 a given page isn't in shared buffers it will then check that memcache 
 cluster before reading the page from disk. This allows you to cache amounts 
 of data that far exceed the amount of memory you could put in a physical 
 server.

 So memcached basically replaces the filesystem?

No, it sits in between shared buffers and the filesystem, effectively
providing an additional layer of extremely large, compressed cache.
Even on a single server there can be benefits over larger shared
buffers due to the compression.
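
To make the read path concrete, here's a rough sketch of the idea in Python
(purely illustrative - this is not the actual InfiniCache implementation; the
memcached client, key scheme, and page-I/O hooks are all assumptions):

import zlib
from pymemcache.client.base import Client

page_cache = Client(("memcache-host", 11211))   # assumed memcached node

def read_page(rel, blockno, shared_buffers, read_from_disk):
    # Tier 1: PostgreSQL shared buffers (modelled here as a dict).
    key = "%s:%d" % (rel, blockno)
    page = shared_buffers.get(key)
    if page is not None:
        return page

    # Tier 2: compressed copy kept in memcached when the page was evicted.
    compressed = page_cache.get(key)
    if compressed is not None:
        return zlib.decompress(compressed)

    # Tier 3: fall back to the filesystem.
    return read_from_disk(rel, blockno)

def evict_page(rel, blockno, page_bytes):
    # On eviction from shared buffers, compress and push to memcached.
    page_cache.set("%s:%d" % (rel, blockno), zlib.compress(page_bytes))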

 That sounds cool, but I'm wondering if it's actually a performance
 speedup.  Seems like it would only be a benefit for single-row lookups;
 any large reads would be a mess.

It depends on the database and the workload - if you can fit your entire
100GB database in cache and your workload is read-intensive, then the
speedups are potentially huge (I've seen benchmarks showing 20x+).
Write-intensive workloads benefit less, as do cases where the working
set is far larger than your cache size.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] amazon ec2

2011-05-04 Thread Shaun Thomas

On 05/03/2011 01:48 PM, Joel Reymont wrote:


What are the best practices for setting up PG 9.x on Amazon EC2 to
get the best performance?


Use EC2 and other Amazon hosting for cloud-based client access only. 
Their shared disk services are universally despised by basically 
everyone who has tried to use them for database hosting.


The recommended pattern is to have the scalable cloud clients access 
(and memcache) a remote DB at a colo or managed services host. EBS is a 
nice idea, and probably fine for things like image or video hosting, but 
using it for database storage, especially for OLTP databases, will just 
result in wailing and gnashing of teeth.


Just ask anyone who got bit by the recent EBS failure that spanned 
*several* availability zones. For all those clients who thought they 
were safe by deploying across multiple AZs, it was a rather rude awakening.


http://aws.amazon.com/message/65648/

The consensus seems to be that Amazon's cloud is fine... so long as you 
stay far, far away from EBS. Apparently that needs a little more work.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com




Re: [PERFORM] amazon ec2

2011-05-04 Thread Josh Berkus

 FWIW, EnterpriseDB's InfiniCache provides the same caching benefit. The way 
 that works is when PG goes to evict a page from shared buffers that page gets 
 compressed and stuffed into a memcache cluster. When PG determines that a 
 given page isn't in shared buffers it will then check that memcache cluster 
 before reading the page from disk. This allows you to cache amounts of data 
 that far exceed the amount of memory you could put in a physical server.

So memcached basically replaces the filesystem?

That sounds cool, but I'm wondering if it's actually a performance
speedup.  Seems like it would only be a benefit for single-row lookups;
any large reads would be a mess.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] amazon ec2

2011-05-04 Thread david

On Wed, 4 May 2011, Josh Berkus wrote:





FWIW, EnterpriseDB's InfiniCache provides the same caching benefit. The way 
that works is when PG goes to evict a page from shared buffers that page gets compressed 
and stuffed into a memcache cluster. When PG determines that a given page isn't in shared 
buffers it will then check that memcache cluster before reading the page from disk. This 
allows you to cache amounts of data that far exceed the amount of memory you could put in 
a physical server.


So memcached basically replaces the filesystem?

That sounds cool, but I'm wondering if it's actually a performance
speedup.  Seems like it would only be a benefit for single-row lookups;
any large reads would be a mess.


I think it would depend a lot on how well the page compresses.

if the network I/O plus decompression time is faster than the seek and 
read from disk, it should be a win.
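
as a rough back-of-the-envelope check (the numbers below are assumed typical
figures, not measurements - substitute your own):

# assumed latencies, in seconds
disk_random_read    = 8e-3     # ~8 ms random 8 KB read from spinning media
memcache_round_trip = 0.3e-3   # ~0.3 ms get() over a local network
page_decompress     = 0.05e-3  # ~0.05 ms to zlib-decompress an 8 KB page

speedup = disk_random_read / (memcache_round_trip + page_decompress)
print(round(speedup, 1))       # roughly 20x in favour of the compressed cache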


I don't see why the benefit would be limited to single-row lookups; 
anything within that page should get the same benefit.


for multi-page actions, the disk would have less of a disadvantage, as 
readahead may be able to hide some of the work, while the memcache approach 
would need a separate round trip for each page.


this does bring up an interesting variation for hierarchical storage, 
using compressed pages in memcache rather than dedicated resources on a 
local system. thanks for the thought and I'll keep it in mind. I can think 
of lots of cases where the database stores a relatively small set of values 
that would compress well.


David Lang



Re: [PERFORM] amazon ec2

2011-05-04 Thread Denis de Bernardy
- Original Message -

 From: Josh Berkus j...@agliodbs.com
 To: postgres performance list pgsql-performance@postgresql.org
 Cc: 
 Sent: Thursday, May 5, 2011 2:02 AM
 Subject: Re: [PERFORM] amazon ec2
 So memcached basically replaces the filesystem?
 
 That sounds cool, but I'm wondering if it's actually a performance
 speedup.  Seems like it would only be a benefit for single-row lookups;
 any large reads would be a mess.


I've never tested with pgsql, but with mysql it makes a *huge* difference when 
you're pulling data repeatedly. Multi-row lookups can be cached too:

$key  = md5($query . '--' . serialize($args));
$rows = $cache->get($key);              // e.g. a Memcached client

if (!$rows) {
    // cache miss: run the query, then cache the result for a few hours
    $rows = run_query($query, $args);   // run_query() is a placeholder for your own DB call
    $cache->set($key, $rows, 4 * 3600); // TTL in seconds
}

This is true even with mysql's caching features turned on. You spare the DB 
from running identical queries that get repeated over and over. Memcache lets 
you pull those results straight from memory, leaving the DB server free to 
handle new queries.




Re: [PERFORM] amazon ec2

2011-05-03 Thread Alan Hodgson
On May 3, 2011 11:48:35 am Joel Reymont wrote:
 What are the best practices for setting up PG 9.x on Amazon EC2 to get the
 best performance?


I am also interested in tips for this. EBS seems to suck pretty bad.
 



Re: [PERFORM] amazon ec2

2011-05-03 Thread Joel Reymont

On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:

 I am also interested in tips for this. EBS seems to suck pretty bad.

Alan, can you elaborate? Are you using PG on top of EBS?

--
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
-++---
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
-++---






Re: [PERFORM] amazon ec2

2011-05-03 Thread Alan Hodgson
On May 3, 2011 12:43:13 pm you wrote:
 On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
  I am also interested in tips for this. EBS seems to suck pretty bad.
 
 Alan, can you elaborate? Are you using PG on top of EBS?
 

Trying to, yes.

Let's see ...

EBS volumes seem to vary in speed. Some are relatively fast. Some are really 
slow. Some fast ones become slow randomly. Some are fast attached to one 
instance, but really slow attached to another.

Fast being a relative term, though. The fast ones seem to be able to do maybe 
400 random IOPS. And of course you can only get about 80MB/sec sequential 
access to them on a good day.

Which is why I'm interested in how other people are doing it. So far EC2 
doesn't seem well suited to running databases at all.



Re: [PERFORM] amazon ec2

2011-05-03 Thread Greg Spiegelberg
On Tue, May 3, 2011 at 2:09 PM, Alan Hodgson ahodg...@simkin.ca wrote:

 On May 3, 2011 12:43:13 pm you wrote:
  On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
   I am also interested in tips for this. EBS seems to suck pretty bad.
 
  Alan, can you elaborate? Are you using PG on top of EBS?
 

 Trying to, yes.

 Let's see ...

 EBS volumes seem to vary in speed. Some are relatively fast. Some are
 really
 slow. Some fast ones become slow randomly. Some are fast attached to one
 instance, but really slow attached to another.


I ran pgbench tests late last year comparing EC2, GoGrid, a 5-year-old lab
server and a new server.  Whether I used a stock postgresql.conf or a tweaked
one, the then-current 8.4 or 9.0, or varied the EC2 instance size, EC2 was
always at the bottom, ranging from 409.834 to 693.100 tps.  GoGrid's pgbench
TPS numbers in similar tests were, on average, 3X those of EC2 (1,399.550 to
1,631.887 tps).  The tests I conducted were small, with 10 connections and
5,000 total transactions.  The single variable that helped pgbench tests in
EC2 was selecting an instance size where the number of cores was equal to or
greater than the number of connections I used in the tests; however, this
only improved things slightly (715.931 tps).

For comparison purposes, I ran the same tests on a 24-way X5650 with 12 GB
and SAS RAID 10.  This server typically ranged from 2,188.348 to 2,216.377
tps.

I attributed GoGrid's superior performance over EC2 to EC2 simply
being over-allocated, but that's just speculation on my part.  To test my
theory, I had wanted to put the database on a ramdisk, or similar device, in
EC2 and GoGrid, but never got around to it.



 Fast being a relative term, though. The fast ones seem to be able to do
 maybe
 400 random IOPS. And of course you can only get about 80MB/sec sequential
 access to them on a good day.

 Which is why I'm interested in how other people are doing it. So far EC2
 doesn't seem well suited to running databases at all.


I was doing this partly to convince management to give me some time to
validate our software (PG-backed) on some of the cloud providers, but with
those abysmal numbers I didn't even bother at the time.  I may revisit at
some point because I know Amazon, at least, has been making architecture
adjustments and updates.

Greg


Re: [PERFORM] amazon ec2

2011-05-03 Thread Mark Rostron
iowait is a problem on any platform that relies on spinning media, compared
to RAM.
no matter how fast a disk is, and no matter how intelligent the controller
is, you are still dealing with an access speed differential on the order of
10^5-10^6 (random disk access time compared to memory access time).
i have had good results by avoiding it.
if you can do this, ec2 is not too shabby, but beware - it doesn't come
free.
this is achievable under the following circumstances (and maybe there are
other ways to do this).
i use a technique of pro-actively querying enough of my anticipated result
set with a daemon procedure.
as long as your query daemon runs more frequently than the competing
processes (eg ETL and other update activity), AND a substantial part of the
result set will fit in available RAM, then the result set will be served
from the filesystem cache at the time you want it.
i have found that it doesn't take much to get this to happen, once you have
identified your critical result set.
like - you can get away with running it once/hour, and i'm still reducing
the frequency and getting good results.
this approach basically assumes a 90/10 rule - at any point in time, you
only want to access 10% of your data. if you can work out what the 10% is,
and it will fit into RAM, then you can set it up to cache it.
it also imposes little additional cost in ec2, because Amazon bills per
instance-hour rather than for CPU activity, although the large-RAM machines
do cost more. it depends on how big your critical result set is, and how
much speed you need.

don't know if this helps - the success/failure of it depends on your typical
query activity, the size of your critical result set, and whether you are
able to get enough RAM to make this work.

as i said it doesn't come for free, but you can make it work.
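
a minimal sketch of that kind of warming daemon (python/psycopg2 purely for
illustration - the query, interval, and connection string are assumptions
you'd replace with your own critical result set):

import time
import psycopg2

# assumed stand-in for your "critical result set" query
WARM_QUERY = "SELECT * FROM orders WHERE created_at > now() - interval '7 days'"

def warm_cache(dsn, interval_seconds=3600):
    # re-read the hot result set periodically so it stays in the filesystem
    # cache ahead of competing ETL / update activity
    while True:
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                cur.execute(WARM_QUERY)
                cur.fetchall()   # we only care about the read I/O, not the rows
        finally:
            conn.close()
        time.sleep(interval_seconds)

# warm_cache("dbname=mydb host=db.example.com user=app")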

as a further point, try also checking out greenplum - it is an excellent
postgres derivative with a very powerful free version. the reason why i
bring it up is because it offers block-level compression (with caveats - it
also doesn't come for free, so do due diligence and rtfm carefully). The
compression enabled me to improve the cache hit rate, and so you further
reduce the iowait problem.
greenplum is also a better parallel machine than postgres, so combining the
cache technique above with greenplum compression and parallel query, i have
been able to get 20:1 reduction in response times for some of our queries.
obviously introducing new database technology is a big deal, but we needed
the speed, and it kinda worked.

mr


On Tue, May 3, 2011 at 1:09 PM, Alan Hodgson ahodg...@simkin.ca wrote:

 On May 3, 2011 12:43:13 pm you wrote:
  On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
   I am also interested in tips for this. EBS seems to suck pretty bad.
 
  Alan, can you elaborate? Are you using PG on top of EBS?
 

 Trying to, yes.

 Let's see ...

 EBS volumes seem to vary in speed. Some are relatively fast. Some are
 really
 slow. Some fast ones become slow randomly. Some are fast attached to one
 instance, but really slow attached to another.

 Fast being a relative term, though. The fast ones seem to be able to do
 maybe
 400 random IOPS. And of course you can only get about 80MB/sec sequential
 access to them on a good day.

 Which is why I'm interested in how other people are doing it. So far EC2
 doesn't seem well suited to running databases at all.




Re: [PERFORM] amazon ec2

2011-05-03 Thread david
phoronix did some benchmarks of the ec2 machines and they show pretty poor 
numbers, especially on the I/O side of things


http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_round1&num=1 
http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_micro&num=1


David Lang


On Tue, 3 May 2011, Alan Hodgson wrote:



On May 3, 2011 12:43:13 pm you wrote:

On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:

I am also interested in tips for this. EBS seems to suck pretty bad.


Alan, can you elaborate? Are you using PG on top of EBS?



Trying to, yes.

Let's see ...

EBS volumes seem to vary in speed. Some are relatively fast. Some are really
slow. Some fast ones become slow randomly. Some are fast attached to one
instance, but really slow attached to another.

Fast being a relative term, though. The fast ones seem to be able to do maybe
400 random IOPS. And of course you can only get about 80MB/sec sequential
access to them on a good day.

Which is why I'm interested in how other people are doing it. So far EC2
doesn't seem well suited to running databases at all.






Re: [PERFORM] amazon ec2

2011-05-03 Thread Josh Berkus
On 5/3/11 11:48 AM, Joel Reymont wrote:
 What are the best practices for setting up PG 9.x on Amazon EC2 to get the 
 best performance?

Yes.  Don't use EC2.

There is no best performance on EC2.  There's not even good
performance.  Basically, EC2 is the platform for when performance
doesn't matter.

Use a dedicated server, or use a better cloud host.

http://it.toolbox.com/blogs/database-soup/how-to-make-your-database-perform-well-on-amazon-ec2-45725

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PERFORM] amazon ec2

2011-05-03 Thread Greg Smith

Mark Rostron wrote:
the success/failure of it depends on your typical query activity, the 
size of your critical result set, and whether you are able to get 
enough RAM to make this work.


Basically, it all comes down to "does the working set of data I access 
frequently fit in RAM?"  If it does, it's possible to get reasonable 
performance out of an EC2 instance.  The EBS disks are so slow, both on 
average and particularly in cases where you have contention with other 
users slowing you down, that any situation where you have to use them is 
never going to work well.  If most of the data fits in RAM, and the CPU 
resources available to your instance are sufficient to service your 
queries, you might see acceptable performance.


greenplum is also a better parallel machine than postgres, so 
combining the cache technique above with greenplum compression and 
parallel query, i have been able to get 20:1 reduction in response 
times for some of our queries.


I've also seen over a 20:1 speedup over PostgreSQL by using Greenplum's 
free Community Edition server, in situations where its column store + 
compression features work well on the data set.  That's easiest with an 
append-only workload, and the data set needs to fit within the 
constraints where indexes on compressed data are useful.  But if you fit 
the use profile it's good at, you end up with considerable ability to 
trade-off using more CPU resources to speed up queries.  It effectively 
increases the amount of data that can be cached in RAM by a large 
multiple, and in the EC2 context (where any access to disk is very slow) 
it can be quite valuable.  My colleague Gabrielle wrote something about 
setting this up on an earlier version of Greenplum's software at 
http://blog.2ndquadrant.com/en/2010/03/installing-greenplum-sne-ec2.html 
that gives an idea of how that was set up.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] amazon ec2

2011-05-03 Thread Greg Smith

Greg Spiegelberg wrote:
I ran pgbench tests late last year comparing EC2, GoGrid, a 5-year-old 
lab server and a new server.  Whether I used a stock postgresql.conf 
or a tweaked one, the then-current 8.4 or 9.0, or varied the EC2 instance 
size, EC2 was always at the bottom, ranging from 409.834 to 693.100 tps. 
GoGrid's pgbench TPS numbers in similar tests were, on average, 3X 
those of EC2 (1,399.550 to 1,631.887 tps).  The tests I conducted were 
small, with 10 connections and 5,000 total transactions.  The single 
variable that helped pgbench tests in EC2 was selecting an instance 
size where the number of cores was equal to or greater than the number 
of connections I used in the tests; however, this only improved things 
slightly (715.931 tps).


The standard pgbench test is extremely sensitive to how fast 
transactions can be committed to disk.  That doesn't reflect what 
performance looks like on most real-world workloads, which tend toward 
more reads.  The fact that GoGrid is much faster at doing commits than 
EC2 is interesting, but that's only one of many parameters that impact 
performance on more normal workloads.


The one parameter that can change how the test runs is turning off 
synchronous_commit, which pulls the commit time out of the results to 
some extent.  And if you'd switched pgbench to a more read-oriented 
test, you'd discover it becomes extremely sensitive to the size of the 
database, as set by pgbench's scale parameter during setup.
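
For reference, a read-mostly run along those lines might look like this
(a sketch only - the scale factor, client count, and duration are arbitrary
examples, and you'd size the scale factor relative to the instance's RAM):

import subprocess

# Initialize a test database at scale factor 100 (roughly 1.5 GB of data),
# then run a select-only test so the results reflect read performance
# rather than commit latency.
subprocess.run(["pgbench", "-i", "-s", "100", "pgbench_test"], check=True)
subprocess.run(["pgbench", "-S", "-c", "10", "-T", "300", "pgbench_test"], check=True)

# (Turning off synchronous_commit on the server, as mentioned above, mostly
# matters for the default read/write test, not for the -S select-only run.)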


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

