iowait is a problem on any platform that relies on spinning media rather than
RAM.
No matter how fast a disk is, and no matter how intelligent the controller is,
you are still dealing with an access-latency differential on the order of 10^5
to 10^6 - a random disk access takes milliseconds, while a memory access takes
tens to hundreds of nanoseconds.
I have had good results by avoiding it.
If you can do this, EC2 is not too shabby - but beware, it doesn't come free.
It is achievable under the following circumstances (and maybe there are other
ways to do it).
I use a technique of proactively querying enough of my anticipated result set
with a daemon procedure.
As long as the query daemon runs more frequently than the competing processes
(e.g. ETL and other update activity), AND a substantial part of the result set
fits in available RAM, the result set will be served from the filesystem cache
at the time you want it.
I have found that it doesn't take much to make this happen once you have
identified your critical result set - you can get away with running it once an
hour, and I'm still reducing the frequency and getting good results.
This approach basically assumes a 90/10 rule: at any point in time you only
want to access 10% of your data. If you can work out what that 10% is, and it
fits into RAM, then you can keep it cached.
It also imposes no additional cost on EC2, because Amazon doesn't bill you for
CPU activity, although the large-RAM instance types do cost more. It depends on
how big your critical result set is and how much speed you need.
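
To make that concrete, here is a minimal sketch of the kind of warming daemon I
mean, in Python with psycopg2. The connection string, table names and queries
below are invented for illustration - in practice you would run your actual
critical queries (or anything that touches the same blocks), not these.

import time
import psycopg2

# All names below are placeholders - substitute your own connection details
# and the queries that define your critical result set.
DSN = "dbname=reports host=10.0.0.5 user=warmup"
WARM_QUERIES = [
    # Each query forces the server to scan the hot tables/indexes,
    # pulling their blocks into the OS filesystem cache.
    "SELECT count(*) FROM daily_sales WHERE sale_date > current_date - 30",
    "SELECT count(*) FROM customer_summary",
]
INTERVAL_SECS = 3600  # once an hour, as above; lower it if updates evict you faster

def warm_once():
    conn = psycopg2.connect(DSN)
    try:
        cur = conn.cursor()
        for sql in WARM_QUERIES:
            cur.execute(sql)  # the scan itself is what warms the cache
            cur.fetchall()
        cur.close()
    finally:
        conn.close()

if __name__ == "__main__":
    while True:
        warm_once()
        time.sleep(INTERVAL_SECS)

A cron job calling psql does the same thing; the only point is that something
touches the hot blocks more often than the update activity evicts them. To
sanity-check whether the critical set can fit in RAM at all, compare
pg_total_relation_size() of the hot tables and indexes against the memory of
the instance you're paying for.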

Don't know if this helps - its success or failure depends on your typical query
activity, the size of your critical result set, and whether you are able to get
enough RAM to make it work.

As I said, it doesn't come for free, but you can make it work.

As a further point, try also checking out Greenplum - it is an excellent
Postgres derivative with a very powerful free version. The reason I bring it up
is that it offers block-level compression (with caveats - it also doesn't come
for free, so do your due diligence and RTFM carefully). The compression enabled
me to improve the cache hit rate, which further reduces the iowait problem.
Greenplum is also a better parallel machine than Postgres, so by combining the
caching technique above with Greenplum's compression and parallel query, I have
been able to get a 20:1 reduction in response times for some of our queries.
Obviously introducing new database technology is a big deal, but we needed the
speed, and it kinda worked.
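
For reference, the compression is set per table through Greenplum's append-only
storage options. A rough sketch of what that looks like - the table, columns,
compression level and distribution key here are made up, and compressed
append-only tables come with real restrictions, so check the docs for your
version first. It's driven from psycopg2 again, which talks to Greenplum just
like it does to Postgres:

import psycopg2

# Illustrative only: a compressed, append-only fact table in Greenplum.
conn = psycopg2.connect("dbname=analytics host=gp-master user=etl")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE sales_history (
        sale_id     bigint,
        customer_id bigint,
        sale_date   date,
        amount      numeric(12,2)
    )
    WITH (appendonly=true, compresstype=zlib, compresslevel=5)
    DISTRIBUTED BY (sale_id)
""")
conn.commit()
cur.close()
conn.close()

Smaller blocks on disk means more of the critical result set fits in the same
amount of filesystem cache, which is where the cache-hit improvement comes from.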

mr


On Tue, May 3, 2011 at 1:09 PM, Alan Hodgson <ahodg...@simkin.ca> wrote:

> On May 3, 2011 12:43:13 pm you wrote:
> > On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
> > > I am also interested in tips for this. EBS seems to suck pretty bad.
> >
> > Alan, can you elaborate? Are you using PG on top of EBS?
> >
>
> Trying to, yes.
>
> Let's see ...
>
> EBS volumes seem to vary in speed. Some are relatively fast. Some are
> really
> slow. Some fast ones become slow randomly. Some are fast attached to one
> instance, but really slow attached to another.
>
> Fast being a relative term, though. The fast ones seem to be able to do
> maybe
> 400 random IOPS. And of course you can only get about 80MB/sec sequential
> access to them on a good day.
>
> Which is why I'm interested in how other people are doing it. So far EC2
> doesn't seem well suited to running databases at all.
