Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Aidan Van Dyk
On Thu, Apr 2, 2015 at 6:33 AM, Pietro Pugni pietro.pu...@gmail.com wrote:


 *T420*
 work_mem = 512MB



 *MacMini*
 work_mem = 32MB


So that is why the T420 does memory sorts and the mini does disk sorts.

I'd start looking at why memory sorts on the T420 are so slow.   Check your
NUMA settings, etc. (as already mentioned).

For a drastic test, disable the 2nd socket on the Dell and just use one
(eliminating any NUMA/QPI costs), then see how it compares to the non-NUMA
MacMini.

If you want to see how bad the NUMA/QPI penalty is, play with STREAM to
benchmark memory performance.
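
A rough sketch of building and running it (assuming you've grabbed stream.c
from McCalpin's site; the thread counts here are just guesses for a 2-socket
box):

    gcc -O3 -fopenmp stream.c -o stream

    # all cores -- traffic crosses the QPI link
    OMP_NUM_THREADS=16 ./stream

    # pinned to socket 0 only, for comparison
    OMP_NUM_THREADS=8 numactl --cpunodebind=0 --membind=0 ./stream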

a.


Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Aidan Van Dyk
On Thu, Apr 2, 2015 at 9:23 AM, Pietro Pugni pietro.pu...@gmail.com wrote:


 the command
 dmesg | grep -i numa
 doesn't display anything. I think the T420 doesn't have NUMA on it. Is there a
 way to enable it from Ubuntu? I don't have immediate access to the BIOS (the
 server is in another location).


NUMA stands for Non-Uniform Memory Access.  It's basically the label
for systems which have memory attached to different CPU sockets, such that
accessing memory from a particular CPU thread has different
costs based on where the actual memory is located (i.e. on some other
socket, or on the local socket).


 For QPI I don’t know what to do. Please, can you give me more details?


QPI is the Intel QuickPath Interconnect. It's the communication path
between CPU sockets.   Memory read by one CPU thread that has to come from
another CPU socket's memory controller goes through QPI.

Google has lots of info on these, and how they impact performance, etc.

 If you want to see how bad the NUMA/QPI is, play with stream to benchmark
 memory performance.


 With stream do you refer to this:
 https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Do you suggest
 some way to do this kind of test?


Yeah, that's the one.  I don't have specific tests in mind.

A simpler overview might be numactl --hardware.
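
For example (a sketch; the node numbers and data directory path are
assumptions about your box):

    # show nodes, per-node memory sizes and inter-node distances
    numactl --hardware

    # software-only version of the "one socket" test: run the server
    # with CPU and memory pinned to node 0
    numactl --cpunodebind=0 --membind=0 pg_ctl -D /path/to/data start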

a.


Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-01 Thread Aidan Van Dyk
On Wed, Apr 1, 2015 at 9:56 AM, Pietro Pugni pietro.pu...@gmail.com wrote:


 *Now let’s propose some query profiling times.*

 The B type set are transactions, so it's impossible for me to post EXPLAIN
 ANALYZE results. I've extracted two queries from a single transaction and
 executed both on each system. Here are the results:

 *T420*

 Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM

 Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06

 *MacMini*

 Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx

 Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk


Looking at the two B_2 queries (since they are so drastically different), the
in-memory quicksorts stand out on the Dell as being *drastically* slower
than the disk-based sorts on your MacMini.


Re: [PERFORM] Tablespaces on a raid configuration

2012-03-31 Thread Aidan Van Dyk
On Fri, Mar 30, 2012 at 9:32 PM, Tomas Vondra t...@fuzzy.cz wrote:

 And it's not just about fsync operations - WAL is written in a sequential
 manner. By placing it on the same device as the data files you're
 effectively forcing it to be written randomly, because the database
 has to write a WAL record, seek somewhere else to read something, etc.

Or, if you put WAL on a journalled FS, even if it's on dedicated spindles ;-)

a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] Response time increases over time

2011-12-07 Thread Aidan Van Dyk
On Wed, Dec 7, 2011 at 5:13 PM, Havasvölgyi Ottó
havasvolgyi.o...@gmail.com wrote:

 So there seems to be something on this Debian machine that hinders
 PostgreSQL from performing better. With 8.4 I logged slow queries (with 9.1 not
 yet), and almost all were COMMIT, taking 10-30 ms or even more. But at
 the same time the fsync rate can be quite high according to pg_test_fsync, so
 fsync is probably not what makes it slow. Performance seems to degrade
 drastically as I increase concurrency; mainly concurrent commits have
 problems, as far as I can see.

 Does anybody have any idea, based on this info, about what can cause such
 behaviour, or what I could check or try?

Let me guess: Debian squeeze, with data and xlog both on a single
ext3 filesystem, and the fsync done by your commit (xlog) is flushing
all the dirty data of the entire filesystem (including PG data writes)
out before it can return...
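
A quick way to check whether that's your setup (the paths here are just the
Debian defaults, adjust for your cluster):

    # which filesystems are ext3, and with what options?
    mount | grep ext3

    # do the data directory and pg_xlog live on the same filesystem?
    df /var/lib/postgresql/9.1/main /var/lib/postgresql/9.1/main/pg_xlog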

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] query uses index but takes too much time?

2011-11-24 Thread Aidan Van Dyk
On Thu, Nov 24, 2011 at 12:20 PM, MirrorX mirr...@gmail.com wrote:

 - 32 cores,

 - not CPU bound (the CPU util was about 5% when I performed the tests)

A single query will only use a single CPU.

5% of 32 cores is 100% of 1.6 cores.

Are you sure that the one core doing the one PostgreSQL query wasn't 100% utilized?
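
A per-core view makes that easy to confirm (a sketch, assuming the sysstat
package is installed):

    # 1-second samples per CPU; look for one CPU pegged near 100%
    mpstat -P ALL 1

    # or run top and press '1' to toggle the per-CPU display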

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-17 Thread Aidan Van Dyk
On Thu, Nov 17, 2011 at 12:23 AM, Tory M Blue tmb...@gmail.com wrote:

 What do you mean by nothing? There are 3060 reads/s, servicing each one
 takes 0.33 ms - that means the drive is 100% utilized.

 The problem with the iostat results you've posted earlier is that they
 either use -xd or none of those switches. That means you can't see CPU
 stats and extended I/O stats at the same time - use just -x next time.

 Anyway the results show that %iowait is about 6% - as Scott Marlowe
 pointed out, this means 1 core is waiting for I/O. That's the core running
 your query. Try to execute the query 16x and you'll see the iowait is
 100%.

 Yes, this I understand and it is correct. But I'm wrestling with the idea
 that the disk is completely saturated. I've seen cases where I actually run
 into high I/O wait and see that load climbs as processes stack up.

 I'm not arguing (please know this), I appreciate the help and will try
 almost anything that is offered here, but I think if I just threw
 money at the situation (hardware), I wouldn't get any closer to
 resolution of my issue. I am very interested in other solutions and
 more DB structure changes etc.

But remember, you're doing all that in a single query.  So your disk
subsystem might be able to deliver even more *throughput* if it
were given many more concurrent requests.  A big RAID10 is really good
at handling multiple concurrent requests.  But it's pretty much
impossible to saturate a big raid array with only a single read
stream.

With a single query, the query can only run as fast as the single
stream of requests can be satisfied.  And as the next read is issued
as soon as the previous is done (the kernel readahead/buffering of the
seq scan helps here), your iostat is going to show 100% util, because
there is always the next read in progress, even if the average
queue size is low (1).  If you had a 24-spindle array, you could add
another 20 queries and you could see the queue size go up, but the
util would still only be 100% and latency would stay about the same, even
though your throughput could be 20 times greater.

So, as long as you have a single query scanning that entire 83GB
table, and that table has to come from disk (i.e. it is not cached in
kernel buffers in RAM), you're going to be limited by the amount of time it
takes to read that table in 8K chunks.

Options for improving it are:

1) Making sure your array/controller/kernel are doing the maximum
read-ahead/buffering possible to make reading that 83GB as quick as
possible (a sketch follows this list)
2) Changing the query to not need to scan all 83GB.
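
For #1, a hedged sketch of checking and bumping the block-device read-ahead
(the device name is a placeholder, and your RAID controller may have its own
read-ahead knob on top of this):

    # current read-ahead, in 512-byte sectors
    blockdev --getra /dev/sda

    # try something much larger for big sequential scans, e.g. 8MB
    blockdev --setra 16384 /dev/sda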

#2 is where you're going to see orders-of-magnitude differences in
performance, and there are lots of options there.  But because there
are so many options, and so many variables in what type of other
queries, inserts, updates, and deletes are done on the data, no one of
them is necessarily the best for everyone.

But if you have the ability to alter queries/schema slightly, you've
got lots of avenues to explore ;-)  And folks here are more than
willing to offer advice and options that may be *very* fruitful.

1) Multicolumn index (depending on insert/update/delete patterns)
2) partition by date (depending on query types)
3) rollup views of history (depending on query types)
4) trigger based mat-view style rollups (depending on
insert/update/delete patterns coupled with query types)


a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] What's the state of postgresql on ext4 now?

2011-11-15 Thread Aidan Van Dyk
On Tue, Nov 15, 2011 at 10:48 PM, Greg Smith g...@2ndquadrant.com wrote:

     In just about every other way
 but commit performance, ext4 is faster than most other filesystems.

As someone who is looked at as an expert and knowledgeable by many of
us, are you getting to the point of migrating large XFS filesystems to
ext4 for production databases yet?  Or at least using ext4 in new
large-scale filesystems for PG?

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++

2011-09-12 Thread Aidan Van Dyk
On Mon, Sep 12, 2011 at 6:57 PM,  da...@lang.hm wrote:

 The barrier is the linux fs/block way of saying these writes need
 to be on persistent media before I can depend on them.  On typical
 spinning media disks, that means out of the disk cache (which is not
 persistent) and on platters.  The way it assures that the writes are
 on persistant media is with a flush cache type of command.  The
 flush cache is a close approximation to make sure it's persistent.

 If your cache is battery backed, it is now persistent, and there is no
 need to flush cache, hence the nobarrier option if you believe your
 cache is persistent.

 Now, make sure that even though your raid cache is persistent, your
 disks have cache in write-through mode, cause it would suck for your
 raid cache to work, but believe the data is safely on disk and only
 find out that it was in the disks' (small) cache, and your raid is
 out of sync after an outage because of that...  I believe most raid
 cards will handle that correctly for you automatically.

 if you don't have barriers enabled, the data may not get written out of main
 memory to the battery backed memory on the card as the OS has no reason to
 do the write out of the OS buffers now rather than later.

It's not quite so simple.  The sync calls (pick your flavour) are
what tell the OS buffers they have to go out.  The syscall (on a
working FS) won't return until the write and its data have reached the
device safely, and are considered persistent.

But in Linux, a barrier is actually a synchronization point, not
just a flush cache...  It's a guarantee that everything up to now is
persistent, I'm going to start counting on it.  But depending on your
card, drivers and yes, kernel version, that barrier is sometimes a
drain/block I/O queue, issue cache flush, wait, write specific data,
flush, wait, open I/O queue.  The double flush is because it needs to
guarantee everything previous is good before it writes the critical
piece, and then needs to guarantee that too.

Now, on good raid hardware it's not usually that bad.

And then, just to confuse people more, LVM up until 2.6.29 (so that
includes all those RHEL5/CentOS5 installs out there which default to
using LVM) didn't handle barriers, it just sort of threw them out as
it came across them, meaning that you got the performance of
nobarrier, even if you thought you were using barriers on poor raid
hardware.

 Every raid card I have seen has ignored the 'flush cache' type of command if
 it has a battery and that battery is good, so you leave the barriers enabled
 and the card still gives you great performance.

The XFS FAQ goes over much of it, starting at Q24:
   
http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

So, for pure performance, on a battery-backed controller, nobarrier is
the recommended *performance* setting.

But, to throw a wrench into the plan, what happens when during normal
battery tests, your raid controller decides the battery is failing...
of course, it's going to start screaming and send all your monitoring
alarms off (you're monitoring that, right?), but have you thought to
make sure that your FS is remounted with barriers at the first sign of
battery trouble?
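
Something along these lines in that alarm handler, for example (a sketch;
the mount point is a placeholder, the option spelling depends on the
filesystem, and some kernels only honour a barrier change at a full mount):

    # ext3/ext4 spelling
    mount -o remount,barrier=1 /pgdata

    # XFS spelling
    mount -o remount,barrier /pgdata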

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++

2011-09-12 Thread Aidan Van Dyk
On Mon, Sep 12, 2011 at 8:47 PM,  da...@lang.hm wrote:

 XFS FAQ  goes over much of it, starting at Q24:

  http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

 So, for pure performance, on a battery-backed controller, nobarrier is
 the recommended *performance* setting.

 But, to throw a wrench into the plan, what happens when during normal
 battery tests, your raid controller decides the battery is failing...
 of course, it's going to start screaming and send all your monitoring
 alarms off (you're monitoring that, right?), but have you thought to
 make sure that your FS is remounted with barriers at the first sign of
 battery trouble?

 yep.

 on a good raid card with battery backed cache, the performance difference
 between barriers being on and barriers being off should be minimal. If it's
 not, I think that you have something else going on.

The performance boost you'll get is that you don't have the temporary
stall in parallelization that the barriers cause.  With barriers, even
if the controller cache doesn't really flush, you still have the
can't send more writes to the device until the barrier'ed write is
done behaviour, so at all those points you have only a single write command in
flight.  The performance penalty of barriers on good cards comes about
because barriers are designed to prevent the devices from reordering
write persistence, and they do that by waiting for a write to be
persistent before allowing more to be queued to the device.

With nobarrier, you operate under the assumption that the block device
writes are persisted in the order commands are issued to the devices,
so you never have to drain the queue, as you do in the normal
barrier implementation, and can (in theory) always have more requests
that the raid card can be working on processing, reordering, and
dispatching to platters for the maximum theoretical throughput...

Of course, linux has completely re-written/changed the
sync/barrier/flush methods over the past few years, and there is no
guarantee they don't keep changing the implementation details in the
future, so keep up on the filesystem details of whatever you're
using...

So keep doing burn-ins, with real pull-the-cord tests... They can't
prove it's 100% safe, but they can quickly prove when it's not ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++

2011-08-18 Thread Aidan Van Dyk
On Thu, Aug 18, 2011 at 1:35 AM, Craig Ringer ring...@ringerc.id.au wrote:
 On 18/08/2011 11:48 AM, Ogden wrote:

 Isn't this very dangerous? I have the Dell PERC H700 card - I see that it
 has a 512MB cache. Is this the same thing, and good enough to switch to
 nobarrier? I'm just worried that with a sudden power shutdown, data can be lost
 with this option.


 Yeah, I'm confused by that too. Shouldn't a write barrier flush data to
 persistent storage - in this case, the RAID card's battery backed cache? Why
 would it force a RAID controller cache flush to disk, too?

The barrier is the linux fs/block way of saying these writes need
to be on persistent media before I can depend on them.  On typical
spinning media disks, that means out of the disk cache (which is not
persistent) and on platters.  The way it assures that the writes are
on persistant media is with a flush cache type of command.  The
flush cache is a close approximation to make sure it's persistent.

If your cache is battery backed, it is now persistent, and there is no
need to flush cache, hence the nobarrier option if you believe your
cache is persistent.

Now, make sure that even though your raid cache is persistent, your
disks have cache in write-through mode, cause it would suck for your
raid cache to work, but believe the data is safely on disk and only
find out that it was in the disks' (small) cache, and your raid is
out of sync after an outage because of that...  I believe most raid
cards will handle that correctly for you automatically.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] DBT-5 Postgres 9.0.3

2011-08-17 Thread Aidan Van Dyk
On Wed, Aug 17, 2011 at 4:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 It sounds to me like bobbyw might have two separate installations of
 postgres (or at least two copies of psql), one compiled with /tmp as the
 default socket location and one compiled with /var/run/postgresql as the
 default.  /tmp is the out-of-the-box default but I think Debian likes to
 build it with /var/run/postgresql as the default.

It looked like the actual DBT-5 harness is built with system
libraries (libpqxx, linked to the system libpq, with Debian's
/var/run/postgresql), but the scaffolding around it uses a local
postgres (server and psql) using the source default of /tmp?
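
An easy way to check which copy talks to which socket directory (a sketch;
the two paths are just the defaults Tom mentioned):

    # connect explicitly through each socket directory
    psql -h /tmp -l
    psql -h /var/run/postgresql -l

    # and see how many psql binaries are actually on the PATH
    type -a psql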

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Aidan Van Dyk
On Fri, Oct 29, 2010 at 11:43 AM, Robert Haas robertmh...@gmail.com wrote:

 Well, we COULD keep the data in shared buffers, and then copy it into
 an mmap()'d region rather than calling write(), but I'm not sure
 there's any advantage to it.  Managing address space mappings is a
 pain in the butt.

I could see this being a *theoretical* benefit in the case that the
background writer gains the ability to write out all blocks associated
with a file in order.  In that case, you might get a win because you
could get a single mmap of the entire file, and just wholesale memcpy
blocks across, then sync/unmap it.

This, of course, assumes a few things that must hold for it to be performant:
0) A list of blocks to be written, grouped by file, is readily available.
1) The pages you write to must be in the page cache, or your memcpy is
going to fault them in.  With a plain write, you don't need the
over-written page in the cache.
2) Now, instead of the torn-page problem being FS block/sector sized,
you can actually have a possibly arbitrary amount of the
block memory written when the kernel writes out the page.  You
*really* need full-page-writes.
3) The mmap overhead required for the kernel to set up the mappings must be
less than the repeated syscalls of a simple write().

All those things seem like something that somebody could synthetically
benchmark to prove the value before even trying to bolt it into PostgreSQL.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [PERFORM] SSD + RAID

2010-02-23 Thread Aidan Van Dyk
* da...@lang.hm da...@lang.hm [100223 15:05]:

 However, one thing that you do not get protection against with software  
 raid is the potential for the writes to hit some drives but not others. 
 If this happens the software raid cannot know what the correct contents 
 of the raid stripe are, and so you could lose everything in that stripe
 (including contents of other files that are not being modified that  
 happened to be in the wrong place on the array)

That's for stripe-based raid.  Mirror sets like raid-1 should give you
either the old data, or the new data, both acceptable responses since
the fsync/barrier hasn't completed.

Or have I missed another subtle interaction?

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-11 Thread Aidan Van Dyk
* Alvaro Herrera alvhe...@commandprompt.com [100211 12:58]:
 Hmm, so maybe the performance benefit is not from it being on a separate
 array, but from it being RAID1 instead of RAID5?

Or the cumulative effects of:
1) Dedicated spindles/RAID1
2) More BBU cache available (I can't imagine the OS pair writing much)
3) Not being queued behind data writes before getting to the controller
4) Not waiting for BBU cache to be available (which is shared with all data
   writes), which requires the RAID5 writes to complete...

Really, there are *lots* of variables here.  The basics being that WAL on
the same FS as data, on a RAID5, even with BBU, is worse than WAL on a
dedicated set of RAID1 spindles with its own BBU.

Wow!

;-)
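
For anyone wanting to try it, relocating WAL onto the dedicated pair is just
a symlink dance (a sketch; $PGDATA and the /wal mount point are placeholders,
and the server must be stopped first):

    pg_ctl -D "$PGDATA" stop
    mv "$PGDATA/pg_xlog" /wal/pg_xlog
    ln -s /wal/pg_xlog "$PGDATA/pg_xlog"
    pg_ctl -D "$PGDATA" start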


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] ext4 finally doing the right thing

2010-01-21 Thread Aidan Van Dyk
* Greg Smith g...@2ndquadrant.com [100121 00:58]:
 Greg Stark wrote:

 That doesn't sound right. The kernel having 10% of memory dirty  
 doesn't mean there's a queue you have to jump at all. You don't get  
 into any queue until the kernel initiates write-out which will be  
 based on the usage counters -- basically an LRU. fsync and cousins like  
 sync_file_range and posix_fadvise(DONT_NEED) initiate write-out  
 right away.


 Most safe ways ext3 knows how to initiate a write-out on something that  
 must go (because it's gotten an fsync on data there) require flushing  
 every outstanding write to that filesystem along with it.  So as soon as  
 a single WAL write shows up, bam!  The whole cache is emptied (or at  
 least everything associated with that filesystem), and the caller who  
 asked for that little write is stuck waiting for everything to clear  
 before their fsync returns success.

Sure, if your WAL is on the same FS as your data, you're going to get
hit, and *especially* on ext3...

But, I think that's one of the reasons people usually recommend putting
WAL separate.  Even if it's just another partition on the same (set of)
disk(s), you get the benefit of not having to wait for all the dirty
ext3 pages from your whole database FS to be flushed before the WAL write
can complete on its own FS.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] ext4 finally doing the right thing

2010-01-21 Thread Aidan Van Dyk
* Greg Smith g...@2ndquadrant.com [100121 09:49]:
 Aidan Van Dyk wrote:
 Sure, if your WAL is on the same FS as your data, you're going to get
 hit, and *especially* on ext3...

 But, I think that's one of the reasons people usually recommend putting
 WAL separate.

 Separate disks can actually concentrate the problem.  The writes to the  
 data disk by checkpoints will also have fsync behind them eventually, so  
 splitting out the WAL means you just push the big write backlog to a  
 later point.  So performance dives happen less frequently, but they are sometimes  
 bigger.  All of the systems I was mentioning seeing 10 second pauses on  
 had a RAID-1 pair of WAL disks split from the main array.

That's right, so with the WAL split off on its own disk, you don't wait
on WAL for your checkpoint/data syncs, but you can build up a huge
wait in the queue for main data (which can even block reads).

Having WAL on the main disk means that (for most ext3), you sometimes
have WAL writes taking longer, but the WAL fsyncs are keeping the
backlog down in the main data area too.

Now, with ext4 moving to full barrier/fsync support, we could get to the
point where WAL in the main data FS can mimic the state where WAL is
separate, namely that WAL writes can jump the queue and be written
without waiting for the data pages to be flushed down to disk, but also
that you'll get the big backlog of data pages to flush when
the first fsyncs on big data files start coming from checkpoints...

a.
-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Aidan Van Dyk
* fka...@googlemail.com fka...@googlemail.com [100114 09:29]:
 
 This takes about 50s, so, 800MB/50s = 16MB/s.
 
 However the hard disk (SATA) could write 43 MB/s in the worst
 case! Why is write performance limited to 16 MB/s?
 
 I altered the binary column to STORAGE EXTERNAL.
 
 Some experiments with postgresql.conf (fsync off,
 shared_buffers=1000MB, checkpoint_segments=256) did not
 change the 50s much (sometimes 60s, sometimes a little less).
 
 4-core CPU at 3 GHz, WinXP, 1 TB SATA disk.
 
 
 Do you have any further idea why 16MB/s seems to be the
 limit here?

So, your SATA disk can do 43MB/s of sequential writes, but your example
is doing:
1) Sequential writes to WAL
2) Random writes to your index
3) Sequential writes to table heap
4) Sequential writes to the table's toast heap
5) Any other OS-based FS overhead

Now, writes #2, 3 and 4 don't happen completely concurrently with your
WAL; some of them are still in postgres buffers, but there is easily enough
to interrupt the stream of WAL to certainly make it believable that,
with everything going on on the disk, you can only write WAL at a
*sustained* 16 MB/s.

If you're running a whole system on a single SATA disk which can stream
43MB/s, remember that for *every* other read/write sent to the disk, you
lose up to 1MB/s of streaming (a 12ms seek, the read/write, and the seek
back).  And among those other operations you have FS metadata updates, any
other file writes the FS flushes, etc...  20 additional blocks that are
either read or written to disk are going to completely chop your 43MB/s rate...

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Best suiting OS - now off topic

2009-10-02 Thread Aidan Van Dyk
* Mark Mielke m...@mark.mielke.cc [091002 11:41]:

 ... until you move on and leave the company with some hacked up Debian  
 installs that nobody knows how to manage.

Could be worse, they could leave a Redhat/CentOS box that *can't* be
managed...

emacs, anyone?

/duck and run, promising not to post on this again

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-25 Thread Aidan Van Dyk
* Dan Sugalski d...@sidhe.org [090925 06:06]:

 I'll have to go check, but I think it does. This box hasn't actually hit 
 swap since it started --  a good chunk of that RAM is used as  
 semi-permanent disk cache, but unfortunately, with the regular day-to-day use of 
 this box (they won't let me have it as a dedicated DB-only machine. Go 
 figure :) doing other stuff, the cache tends to turn over pretty quickly.

All the more reason to find a way to use it all as shared buffers and
lock it into ram...

Oh, sorry, you expect the DB to play nice with everything else?

;-)

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-25 Thread Aidan Van Dyk
* Scott Carey sc...@richrelevance.com [090925 11:57]:
 That won't work well anyway because the postgres shared_buffers does not cache 
 things that are sequentially scanned (it uses a ring buffer for each scan).  
 So, for any data that is only accessed by sequential scan, you're relying on 
 the OS and the disks.  If you access a table via index scan though, all its 
 pages will go through shared_buffers.

In older versions too, or only since synchronized scans got in?

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] session servers in ram

2009-09-22 Thread Aidan Van Dyk
* Scott Marlowe scott.marl...@gmail.com [090921 19:39]:
 I'm looking at running session servers in ram.  All the data is
 throw-away data, so my plan is to have a copy of the empty db on the
 hard drive ready to go, and have a script that just copies it into ram
 and starts the db there.  We're currently IO write bound with
 fsync=off using a 15k5 seagate SAS drive, so I'm hoping that moving
 the db into /dev/shm will help quite a bit here.
 
 Does anybody have any real-world experience here or any words of sage
 advice before I go off and start testing this?

*If* fsync=off really means that there are no sync commands
happening on your pg partitions (and nothing else, like syslog, is
causing syncs on them), and your kernel is tuned to allow the maximum
dirty buffers/lifetime, then I'm not sure that's going to gain you
anything...  If your pg processes are blocked writing, with no syncs,
then they are blocked because the kernel has no more buffers available
for buffering the writes...
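
The knobs I mean there are the VM dirty-writeback sysctls (a sketch; the
right values depend entirely on your RAM size and how much data you can
afford to lose):

    # how much dirty data the kernel will hold before forcing writeback
    sysctl vm.dirty_ratio vm.dirty_background_ratio

    # how long dirty pages can sit around before being flushed (centisecs)
    sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs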

Moving your backing store from a disk-based FS to disk-based swap is only
going to shift the route of being forced to hit the disk...

Of course, details matter, and results trump theory, so test it ;-)

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-31 Thread Aidan Van Dyk
* Merlin Moncure mmonc...@gmail.com [090831 10:38]:
 
 I agree, that's good analysis.  The main point I was making was that
 if you have say a 10 disk raid 5, you don't involve 10 disks, only
 two...a very common misconception.  I made another mistake that you
 didn't catch: you need to read *both* the data drive and the parity
 drive before writing, not just the parity drive.
 
 I wonder if flash SSDs are a better fit for raid 5 since the reads are
 much cheaper than writes and there is no rotational latency.  (also,
 $/gb is different, and so are the failure cases).

The other thing that scares me about raid-5 is the write hole, and the
possible delayed inconsistency that it brings...

Again, hopefully mitigated by a dependable controller w/ BBU...

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Aidan Van Dyk
* Joshua D. Drake j...@commandprompt.com [090512 19:27]:
 
 Apache solved this problem back when it was still called NCSA HTTPd. Why
 aren't we preforking again?

Of course, preforking and connection pooling are totally different
beasts...

But what does preforking really give us?  A 2 or 3% improvement?  The
forking isn't the expensive part; the per-database setup that happens is
the expensive part...  All pre-forking would save us is a tiny part of
the initial setup, and in turn make our robust postmaster controller no
longer have control.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Any better plan for this query?..

2009-05-11 Thread Aidan Van Dyk
* Dimitri dimitrik...@gmail.com [090511 11:18]:
 Folks, it's completely crazy, but here is what I found:
 
 - if the HISTORY table is analyzed with target 1000, my SELECT response
 time jumps to 3ms, and the max throughput is limited to 6.000 TPS
 (that's what happened with 8.3.7)
 
 - if the HISTORY table is analyzed with target 5, my SELECT response time
 decreases to 1.2ms (!)  and then my max TPS level is ~12.000 !
 and CPU is used up to 95% even by 8.3.7 :-)  and 8.4 performed better
 just because I left its analyze target at the default value of 100.
 
 Can anyone explain to me why the analyze target can have such a huge negative
 secondary effect?..

It's actually pretty straightforward.

The PostgreSQL query planner is a smart planner.  It takes into
consideration all the statistics available on the columns/tables,
expected outputs based on inputs, etc, to choose what it thinks will be
the best plan.  The more data you have in statistics (the larger the
statistics target you have), the more CPU time and the longer it's going to
take to plan your queries.  The tradeoff is hopefully better plans.

But, in your scenario, where you are hitting the database in the
absolute worst possible way to use PostgreSQL (small, repeated,
simple queries), you're not getting the advantage of better plans.  In
your case, you're throwing absolutely simple queries at PG as fast as
you can, and for each query, PostgreSQL has to:

1) Parse the given query string
2) Given the statistics available, plan the query and pick the best one
3) Actually run the query.

Part 2 is going to dominate the CPU time in your tests, more so the more
statistics it has to evaluate, and, unless the data has to come from the
disks (i.e. it is not in shared buffers or the cache), it is thus going to
dominate the time before you get your results.  More statistics means more
time needed to do the planning/picking of the query.

If you were to use prepared statements, the cost of #1 and #2 is paid
once, and then every time you throw a new execution of the query at
PostgreSQL, you get to do just #3, the easy quick part, especially for
small simple queries where all the data is in shared buffers or the cache.
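
If your client driver can't do protocol-level prepare, even doing it in
plain SQL from psql shows the effect (a sketch; the database, table and
column names are made up):

psql mydb <<'SQL'
-- steps 1 and 2 (parse and plan) are paid once, here
PREPARE hist_by_id (int) AS
  SELECT * FROM history WHERE id = $1;
-- every EXECUTE after that is only step 3
EXECUTE hist_by_id(42);
EXECUTE hist_by_id(43);
SQL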

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-10 Thread Aidan Van Dyk
* Mario Weilguni [EMAIL PROTECTED] [081210 07:31]:

 Why not? I know it's not performing as well as RAID-10, but it does not  
 waste 50% of the disk space. RAID-6 is not an option, because the performance is  
 even worse. And, on another system with RAID-5 + spare and SAS drives,  
 the same controller is working very well.

Like Scott said, it's all about trade-offs.

With raid5, you get abysmal write performance, makes-me-not-sleep-at-night
inconsistent parity issues, and a degraded mode that will be a
nightmare  ...

... and as a trade-off you save a little money, and get good read only
performance ...

... as long as you don't ever have a disk or system crash ...

... or can afford to rebuild if you do ...

... etc ...

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Aidan Van Dyk
* Joshua D. Drake [EMAIL PROTECTED] [081209 11:01]:
 
 Yes the SmartArray series is quite common and actually know to perform
 reasonably well, in RAID 10. You still appear to be trying RAID 5.

*boggle* 

Are people *still* using raid5?

/me gives up!

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Aidan Van Dyk
* Peter Eisentraut [EMAIL PROTECTED] [081209 11:28]:

 What do you suggest when there is not enough room for a RAID 10?

More disks ;-)

But if you've given up on performance and reliability in favour of
cheaper storage, I guess raid5 is ok.  But then I'm not sure what the
point of asking about its poor performance is...

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] Create and drop temp table in 8.3.4

2008-11-11 Thread Aidan Van Dyk
Seems like this didn't make it through to the list the first time...

* Aidan Van Dyk [EMAIL PROTECTED] [081106 22:19]:
 * David Rees [EMAIL PROTECTED] [081106 21:22]:
  
  2. A write barrier instructs the lower level hardware that commands
  issued before the barrier must be written to disk before commands
  issued after the barrier. Write barriers are used to ensure that data
  written to disk is written in such a way as to maintain filesystem
  consistency, without losing all the benefits of a write cache.
  
  3. A fsync call forces data to be synced to the controller.
  
  This means that whenever you call fsync, at the very minimum, the data
  will have made it to the controller. How much further down the line
  will depend on whether or not the controller is in WriteBack or
  WriteThrough mode and whether or not the disk is also caching writes.
  
  So in your example, if the OS is caching some writes and fsync is
  called, it won't be returned until at a minimum the controller has
  accepted all the data, regardless of whether or not write barriers are
  enabled.
  
  In theory, it should be safe to disable write barriers if you have a
  BBU because the BBU should guarantee that all writes will eventually
  make it to disk (or at least reduce the risk of that not happening to
  an acceptable level).
 
 All that's correct, but note that fsync doesn't guarantee that a *coherent*
 filesystem state has made it to the controller.  And fsync *can* carry later
 writes to the controller.
 
 I believe the particular case that prompted the write-barriers becoming the default
 was ext3 + journals, where in certain (rare) cases, upon recovery, things were
 out of sync.  What was happening was that ext3 was syncing the journal, but
 extra writes were getting carried to the controller during the sync
 operation, and if something crashed at the right time, new data was on the
 disk where the old journal (because the new journal hadn't finished making
 it to the controller) didn't expect it.
 
 The write barriers give the FS the semantics to say all previously queued
 writes [BARRIER] flush to controller [BARRIER] any new writes, and thus
 guarantee the ordering of certain operations to disk, and guarantee coherency
 of the FS at all times.
 
 Of course, that guaranteed FS consistency comes at a cost.  As to its
 necessity with the way PG uses the FS w/ WAL, or its necessity with
 xfs...
 
 a.
 
 -- 
 Aidan Van Dyk Create like a god,
 [EMAIL PROTECTED]   command like a king,
 http://www.highrise.ca/   work like a slave.



-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-05 Thread Aidan Van Dyk
* Heikki Linnakangas [EMAIL PROTECTED] [070305 09:46]:

 If that is the case, why would anyone use the vacuum full approach if they 
 could use the cluster command on a table/database that will regen these 
 files for you? It almost seems like the vacuum full approach would, or 
 could, be obsoleted by the cluster command, especially if the timings in 
 their respective runs are that different (in our case the vacuum full took 
 15 minutes in our worst case, and the cluster command took under 1 second 
 for the same table and scenario).
 
 In fact, getting rid of vacuum full, or changing it to work like 
 cluster, has been proposed in the past. The use case really is pretty 
 narrow; cluster is a lot faster if there's a lot of unused space in the 
 table, and if there's not, vacuum full isn't going to do much so there's 
 not much point running it in the first place. The reason it exists is 
 largely historical, there hasn't been a pressing reason to remove it either.

I've never used CLUSTER, because I've always heard murmurings of it not
being completely MVCC safe.  From the TODO:
* CLUSTER
o Make CLUSTER preserve recently-dead tuples per MVCC
  requirements
But the documentation doesn't mention anything about CLUSTER being unsafe.

AFAIK, VACUUM FULL doesn't suffer from the same MVCC issues that CLUSTER
does.  Is this correct?

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.

