Re: [PERFORM] Possible to find disk IOs for a Query?

2016-08-31 Thread Ben Chobot
On Aug 31, 2016, at 3:01 PM, Bobby Mozumder  wrote:
> 
> Is it possible to find the number of disk IOs performed for a query?  EXPLAIN 
> ANALYZE looks like it shows number of sequential rows scanned, but not number 
> of IOs.  

Postgres knows the number of rows it will need to pull to do your query, but it 
has no way of knowing if a block not in its own cache can be satisfied via 
filesystem cache, or if it will fall through to a disk read. If you are on Linux, 
you might be able to tell the effectiveness of your filesystem cache via 
something like 
http://www.brendangregg.com/blog/2014-12-31/linux-page-cache-hit-ratio.html 


…but that's hardly going to show you something as granular as a per-query cost.
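
As a partial workaround - and this is a sketch, with hypothetical table and 
column names - EXPLAIN (ANALYZE, BUFFERS), available since 9.0, will at least 
show how many blocks a query found in shared_buffers versus had to request 
from the OS:

explain (analyze, buffers) select * from my_table where id = 42;
-- look for lines like:
--   Buffers: shared hit=120 read=35
-- "hit" blocks came from shared_buffers; "read" blocks came from the OS,
-- which may still mean the filesystem cache rather than actual disk.

If you also turn on track_io_timing (9.2+), the plan includes I/O time, which 
helps distinguish cheap cache reads from real disk reads.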

Re: [PERFORM] VACUUM VERBOSE ANALYZE taking long time to process.

2015-09-15 Thread Ben Chobot
On Sep 9, 2015, at 3:43 AM, anil7385  wrote:
> 
> Hi, 
> We have a table which consists of 3 million records, and when we try to
> delete them and run VACUUM VERBOSE ANALYZE on it in the production environment,
> it takes 6-7 hours to process.

You make it sound like you are deleting all records. If that's true, why not 
TRUNCATE? It'll be a lot faster, and also leave you with no need to vacuum.
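
For example (table name hypothetical):

-- instead of:
delete from big_table;
vacuum verbose analyze big_table;
-- just:
truncate table big_table;
analyze big_table;

TRUNCATE reclaims the space immediately instead of leaving millions of dead 
rows for vacuum to grind through. The caveats: it takes an exclusive lock and 
isn't MVCC-safe for concurrent readers, so it only fits if nothing else needs 
the old rows.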



Re: [PERFORM] Segment best size

2013-04-12 Thread Ben Chobot
On Apr 12, 2013, at 9:45 AM, Rodrigo Barboza wrote:

 Hi guys.
 I compiled my postgres server (9.1.4) with the default segment size (16MB).
 Should it be enough? Should I increase this size in compilation?

Unlike some default values in the configuration file, the compiled-in defaults 
work well for most people.



Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Ben Chobot
On Jan 29, 2013, at 6:24 AM, Alex Vinnik wrote:

 random_page_cost=1 might not be what you really want. 
 It would mean that random reads are as fast as sequential reads, which 
 is probably true only for SSDs.
 What random_page_cost would be more appropriate for an EC2 EBS Provisioned 
 IOPS volume that can handle 2,000 IOPS? 

For EC2 Provisioned IOPS volumes - not standard EBS - random_page_cost=1 is 
exactly what you want.
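
If the volume backs a dedicated tablespace, you can scope the setting there 
rather than changing it globally (tablespace name hypothetical; per-tablespace 
costs require 9.0+):

alter tablespace fast_ebs set (random_page_cost = 1);
-- or globally, in postgresql.conf:
--   random_page_cost = 1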



[PERFORM] shared_buffers on ubuntu precise

2012-11-30 Thread Ben Chobot
On Nov 30, 2012, at 8:06 AM, Shaun Thomas wrote:

 I say that because you mentioned you're using Ubuntu 12.04, and we were
 having some problems with PG on that platform. With shared_buffers over
 4GB, it starts doing really weird things to the memory subsystem.
 Whatever it does causes the kernel to purge cache rather aggressively.
 We saw a 60% reduction in read IO by reducing shared_buffers to 4GB.
 Without as many reads, your writes should be much less disruptive.

Hm, this sounds like something we should look into. Before we start digging, do 
you have more to share, or did you leave it with the "huh, that's weird; this 
seems to fix it" solution?

[PERFORM] index usage for min() vs. order by asc limit 1

2011-11-17 Thread Ben Chobot
I have two queries in PG 9.1. One uses an index like I would like, the other 
does not. Is this expected behavior? If so, is there any way around it? 


postgres=# explain analyze select min(id) from delayed_jobs where 
strand='sis_batch:account:15' group by strand;
QUERY PLAN
--
 GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
time=226.759..226.760 rows=1 loops=1)
   ->  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
(actual time=0.014..169.941 rows=72268 loops=1)
 Filter: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 226.817 ms
(4 rows)

postgres=# explain analyze select id from delayed_jobs where 
strand='sis_batch:account:15' order by id limit 1;
                                  QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
loops=1)
   ->  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
(cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
loops=1)
 Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 0.129 ms
(4 rows)


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage for min() vs. order by asc limit 1

2011-11-17 Thread Ben Chobot
On Nov 17, 2011, at 5:20 PM, Steve Atkins wrote:
 
 I don't think you want the group by in that first query.

Heh, I tried to simplify the example, but in reality that "=" becomes an IN clause 
of multiple values. So the group by is needed.
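
For the archives: since the grouped min() can't use the index here the way the 
ORDER BY ... LIMIT 1 form does, one workaround on 9.1 (which lacks LATERAL) is 
a correlated subquery per value. A sketch - the array contents are just 
examples:

select s.strand,
       (select id from delayed_jobs dj
         where dj.strand = s.strand
         order by id limit 1) as min_id
  from unnest(array['sis_batch:account:15',
                    'sis_batch:account:16']) as s(strand);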


 
 
 postgres=# explain analyze select min(id) from delayed_jobs where 
 strand='sis_batch:account:15' group by strand;
   QUERY PLAN
 --
 GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
 time=226.759..226.760 rows=1 loops=1)
  ->  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
 (actual time=0.014..169.941 rows=72268 loops=1)
Filter: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 226.817 ms
 (4 rows)
 
 postgres=# explain analyze select id from delayed_jobs where 
 strand='sis_batch:account:15' order by id limit 1;
                                   QUERY PLAN
 --------------------------------------------------------------------------
 Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
 loops=1)
  ->  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
 (cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
 loops=1)
Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 0.129 ms
 (4 rows)
 
 



Re: [PERFORM] Large number of short lived connections - could a connection pool help?

2011-11-14 Thread Ben Chobot
On Nov 14, 2011, at 4:42 PM, Cody Caughlan wrote:

 We have anywhere from 60-80 background worker processes connecting to
 Postgres, performing a short task and then disconnecting. The lifetime
 of these tasks averages 1-3 seconds.

[snip]

 Is this something that I should look into or is it not much of an
 issue? What's the best way to determine if I could benefit from using a
 connection pool?

Yes, this is precisely the kind of situation a connection pooler will help with. 
Not only with the connection setup/teardown overhead, but also by using 
resources on your server better: you probably don't actually have 60-80 
cores on your server, so reducing that number down to just a few that are 
actually working will let Postgres finish them faster and move on to the others. 
Basically, the queueing happens off the postgres server, letting postgres use 
the box with fewer interruptions. 

Now, is it a problem to not use a pooler? That depends on if it's causing you 
grief or not. But if you think you'll get more connection churn or larger 
numbers of workers, then a connection pooler will only help more.
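
One way to see the churn, and how many of those connections are doing work at 
any given instant, is to sample pg_stat_activity (a sketch; before 9.2 there 
is no state column, and idle connections show current_query = '<IDLE>'):

-- 9.2 and later:
select state, count(*) from pg_stat_activity group by state;
-- earlier versions:
select (current_query = '<IDLE>') as idle, count(*)
  from pg_stat_activity group by 1;

If most connections are idle at any given moment, a pooler like pgbouncer will 
likely buy you a lot.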


Re: [PERFORM] How can i get record by data block not by sql?

2011-10-03 Thread Ben Chobot
On Oct 3, 2011, at 6:52 AM, 姜头 wrote:

 How can I get records by data block, not by SQL?
  
 I want to read and write lots of data by data block, writing records to an 
 appointed data block and reading them back,
 so I can form a disk-resident tree by recording the block addresses. But I 
 don't know how to implement this in PostgreSQL.
 Is there a system function that can do this? 
 Can someone help me? Thank you very much!

It sounds like you should look into the COPY command, or, if you're 
adventurous, the pg_bulkload project. They might get you the speed you're 
after, if not quite the implementation. But if what you're really after is to 
manipulate the table files directly - and I'm not sure why that would be a goal 
in itself - then perhaps SQL isn't for you.
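
A minimal sketch of the COPY approach (table and file names hypothetical):

copy my_table (id, payload) from '/tmp/data.csv' with csv;
-- or from the client side with psql, which reads a file local to the client:
-- \copy my_table (id, payload) from 'data.csv' with csv

COPY pushes many rows through a single command, which is where most of its 
speed advantage over row-at-a-time INSERTs comes from.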


Re: [PERFORM] array_except -- Find elements that are not common to both arrays

2011-09-30 Thread Ben Chobot

On Sep 30, 2011, at 12:07 PM, bricklen wrote:

 I've been informed that this type of operation is called symmetric
 difference[1], and can be represented by A ∆ B.  A couple of
 alternative names were proposed, array_symmetric_difference and
 array_xor.
 Does anyone have a preference for the name? I assume that this
 function might potentially be used by others now that it is in the pg
 lists, so might as well give it an appropriate name now.
 Is this something that could be written in C to make it faster? (I don't know 
 C.)

FWIW, speaking as somebody who has no need of this function, array_xor is a 
pretty clear name that indicates what's going to happen.
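
For reference, a minimal SQL-level sketch of such a function - set-based, so 
duplicate elements collapse, and not the faster C version being asked about:

create or replace function array_xor(anyarray, anyarray)
returns anyarray language sql immutable as $$
  select array_agg(x) from (
    (select unnest($1) except select unnest($2))
    union all
    (select unnest($2) except select unnest($1))
  ) as t(x);
$$;

-- select array_xor(array[1,2,3], array[2,3,4]);
-- => {1,4} (element order not guaranteed; returns NULL for equal arrays)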


Re: [PERFORM] postgres constraint triggers

2011-09-29 Thread Ben Chobot
On Sep 27, 2011, at 6:37 PM, Craig Ringer wrote:

 On 09/27/2011 12:54 PM, Ben Chobot wrote:
 
 My memory is fuzzy but as I recall, a possible downside to using
 deferred constraints was increased memory usage
 
 That's right. PostgreSQL doesn't currently support spilling of pending 
 constraint information to disk; it has to keep it in RAM, and with 
 sufficiently huge deferred updates/inserts/deletes it's possible for the 
 backend to run out of RAM to use.
 
 though I cannot see how at the moment.
 
 A list of which triggers to run, and on which tuples, must be maintained 
 until those triggers are fired. That list has to be kept somewhere.

Well when you put it like that, it's so obvious. :)


Re: [PERFORM] postgres constraint triggers

2011-09-26 Thread Ben Chobot
On Sep 26, 2011, at 10:52 AM, Maria L. Wilson wrote:

 Our first try to solve this problem has been to convert these triggers into a 
 constraint trigger which allows for DEFERRABLE INITIALLY DEFERRED flags.  
 This, we are finding, is forcing the trigger function to run after the 
 triggering transaction is completed.  We believe this will fix our locking 
 problem and hopefully speed up our inserts again.
 
 Any comments or past experiences would certainly be helpful!

My memory is fuzzy but as I recall, a possible downside to using deferred 
constraints was increased memory usage, though I cannot see how at the moment. 
Regardless, I think the upshot is that they aren't without their cost, but as 
long as you aren't doing massive transactions, that cost is probably one that 
you can afford to pay without much worry. 
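
For reference, the shape being discussed looks like this (names hypothetical):

create constraint trigger enforce_thing
  after insert or update on my_table
  deferrable initially deferred
  for each row execute procedure check_thing();

Each deferred firing gets queued in backend memory until commit, which is the 
memory cost described above.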

Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Ben Chobot
On May 9, 2011, at 1:32 PM, Chris Hoover wrote:

 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)

Be careful here. What if the entire card hiccups, instead of just a device on 
it? (We've had that happen to us before.) Depending on how you've done your 
raid 10, either all your parity is gone or your data is.


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Ben Chobot
On Feb 3, 2011, at 1:50 PM, Mladen Gogala wrote:

 So, I will have to go back on my decision to use Postgres and re-consider 
 MySQL? I will rather throw away the effort invested in studying Postgres than 
 to risk an unfixable application downtime.  I am not sure about the world 
 domination thing, though. Optimizer hints are a big feature that everybody 
 else has and Postgres does not have because of religious reasons.

As always, you should use the tool you consider best for the job. If you think 
MySQL as both a product and a community has a better chance of giving you what 
you want, then you should use MySQL.


Re: [PERFORM] Question: BlockSize 8192 with FusionIO

2011-01-04 Thread Ben Chobot

On Jan 4, 2011, at 8:48 AM, Merlin Moncure wrote:

 
 most flash drives, especially mlc flash, use huge blocks anyways on
 physical level.  the numbers claimed here
 (http://www.fusionio.com/products/iodrive/)  (141k write iops) are
 simply not believable without write buffering.  i didn't see any note
 of how fault tolerance is maintained through the buffer (anyone
 know?).

FusionIO buffers. They have capacitors onboard to protect against crashing and 
power failure. They passed our crash attempts to corrupt writes to them before 
we put them into production, for whatever that's worth, but they do take a long 
time to come back online after an unclean shutdown.


Re: [PERFORM] concurrent IO in postgres?

2010-12-23 Thread Ben Chobot
On Dec 23, 2010, at 11:58 AM, Andy wrote:

 
 Somewhat tangential to the current topics, I've heard that FusionIO uses 
 internal cache and hence is not crash-safe, and if the cache is turned off 
 performance will take a big hit. Is that your experience?

It does use an internal cache, but it also has onboard battery power. The 
driver needs to put its house in order when restarting after an unclean 
shutdown, however, and that can take up to 30 minutes per card.


Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Ben Chobot
On Nov 12, 2010, at 8:14 AM, Kyriacos Kyriacou wrote:

 We are still using PostgreSQL 8.2.4. We are running a 24x7 system and 
 database size is over 200Gb so upgrade is not an easy decision!

This is why we have Slony: so you can slowly upgrade your 200GB while you're 
live and then only suffer a minute or so of downtime while you switch over. Even 
if you only install Slony for the sake of the upgrade and then uninstall it 
after you're done, that seems well worth it to me rather than running on 8.2.4 
for a while.

Note there were some changes between 8.2 and 8.3 with regard to casting that 
might make you revisit your application.

Re: [PERFORM] BBU Cache vs. spindles

2010-10-08 Thread Ben Chobot
On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:

 I'm weighing options for a new server. In addition to PostgreSQL, this 
 machine will handle some modest Samba and Rsync load.
 
 I will have enough RAM so the virtually all disk-read activity will be 
 cached. The average PostgreSQL read activity will be modest - a mix of 
 single-record and fairly large (reporting) result-sets. Writes will be modest 
 as well but will come in brief (1-5 second) bursts of individual inserts. The 
 rate of insert requests will hit 100-200/second for those brief bursts.
 
 So...
 
 Am I likely to be better off putting $$$ toward battery-backup on the RAID or 
 toward adding a second RAID-set and splitting off the WAL traffic? Or 
 something else?

A BBU is, what, $100 or so? Adding one seems a no-brainer to me. Dedicated WAL 
spindles are nice and all, but they're still spinning media. Raid card cache is 
way faster, and while it's best at bursty writes, it sounds like bursty 
writes are precisely what you have.





Re: [PERFORM] Testing Sandforce SSD

2010-07-24 Thread Ben Chobot
On Jul 24, 2010, at 12:20 AM, Yeb Havinga wrote:

 The problem in this scenario is that even when the SSD showed no data 
 loss and the rotating disk did a few times, a dozen tests without 
 failure isn't actually proof that the drive can write its complete buffer to 
 disk after power failure.

Yes, this is always going to be the case with testing like this - you'll never 
be able to prove that it will always be safe. 


Re: [PERFORM] performance on new linux box

2010-07-16 Thread Ben Chobot
On Jul 15, 2010, at 2:40 PM, Ryan Wexler wrote:

 On Thu, Jul 15, 2010 at 12:35 PM, Ben Chobot be...@silentmedia.com wrote:
 On Jul 15, 2010, at 9:30 AM, Scott Carey wrote:
 
  Many raid controllers are smart enough to always turn off write caching on 
  the drives, and also disable the feature on their own buffer without a 
  BBU. Add a BBU, and the cache on the controller starts getting used, but 
  *not* the cache on the drives.
 
  This does not make sense.
  Write caching on all hard drives in the last decade are safe because they 
  support a write cache flush command properly.  If the card is smart it 
  would issue the drive's write cache flush command to fulfill an fsync() or 
  barrier request with no BBU.
 
 You're missing the point. If the power dies suddenly, there's no time to 
 flush any cache anywhere. That's the entire point of the BBU - it keeps the 
 RAM powered up on the raid card. It doesn't keep the disks spinning long 
 enough to flush caches.
 
 So you are saying write caching is a dangerous proposition on a raid card 
 with or without BBU?


Er, no, sorry, I am not being very clear, it seems. 


Using a cache for write caching is dangerous, unless you protect it with a 
battery. Caches on a raid card can be protected by a BBU, so, when you use a 
BBU, write caching on the raid card is safe. (Just don't read the firmware 
changelog for your raid card or you will always be paranoid.) If you don't have 
a BBU, many raid cards default to disabling caching. You can still enable it, 
but the card will often tell you it's a bad idea.

There are also caches on all your disk drives. Write caching there is always 
dangerous, which is why almost all raid cards always disable the hard drive 
write caching, with or without a BBU. I'm not even sure how many raid cards let 
you enable the write cache on a drive... hopefully, not many.

Re: [PERFORM] performance on new linux box

2010-07-16 Thread Ben Chobot
On Jul 15, 2010, at 8:16 PM, Scott Carey wrote:

 On Jul 15, 2010, at 12:35 PM, Ben Chobot wrote:
 
 On Jul 15, 2010, at 9:30 AM, Scott Carey wrote:
 
 Many raid controllers are smart enough to always turn off write caching on 
 the drives, and also disable the feature on their own buffer without a 
 BBU. Add a BBU, and the cache on the controller starts getting used, but 
 *not* the cache on the drives.
 
 This does not make sense.
 Write caching on all hard drives in the last decade are safe because they 
 support a write cache flush command properly.  If the card is smart it 
 would issue the drive's write cache flush command to fulfill an fsync() or 
 barrier request with no BBU.
 
 You're missing the point. If the power dies suddenly, there's no time to 
 flush any cache anywhere. That's the entire point of the BBU - it keeps the 
 RAM powered up on the raid card. It doesn't keep the disks spinning long 
 enough to flush caches.
 
 If the power dies suddenly, then the data that is in the OS RAM will also be 
 lost.  What about that? 
 
 Well it doesn't matter because the DB is only relying on data being persisted 
 to disk that it thinks has been persisted to disk via fsync().

Right, we agree that only what has been fsync()'d has a chance to be safe...

 The data in the disk cache is the same thing as RAM.  As long as fsync() 
 works _properly_ which is true for any file system + disk combination worth a 
 damn (not HFS+ on OSX, not FAT, not a few other things), then it will tell 
 the drive to flush its cache _before_ fsync() returns.  There is NO REASON 
 for a raid card to turn off a drive cache unless it does not trust the drive 
 cache.  In write-through mode, it should not return to the OS with a fsync, 
 direct write, or other "the OS thinks this data is persisted now" call until 
 it has flushed the disk cache.  That does not mean it has to turn off the 
 disk cache.

...and here you are also right in that a write-through write cache is safe, 
with or without a battery. A write-through cache is a win for things that don't 
often fsync, but my understanding is that with a database, you end up fsyncing 
all the time, which makes a write-through cache not worth very much. The only 
good way to get good *database* performance out of spinning media is with a 
write-back cache, and the only way to make that safe is to hook up a BBU.




Re: [PERFORM] performance on new linux box

2010-07-15 Thread Ben Chobot
On Jul 14, 2010, at 6:57 PM, Scott Carey wrote:

 But none of this explains why a 4-disk raid 10 is slower than a 1 disk 
 system.  If there is no write-back caching on the RAID, it should still be 
 similar to the one disk setup.

Many raid controllers are smart enough to always turn off write caching on the 
drives, and also disable the feature on their own buffer without a BBU. Add a 
BBU, and the cache on the controller starts getting used, but *not* the cache 
on the drives.

Take away the controller, and most OS's by default enable the write cache on 
the drive. You can turn it off if you want, but if you know how to do that, 
then you're probably also the same kind of person that would have purchased a 
raid card with a BBU.


Re: [PERFORM] performance on new linux box

2010-07-15 Thread Ben Chobot

On Jul 15, 2010, at 12:40 PM, Ryan Wexler wrote:

 On Wed, Jul 14, 2010 at 7:50 PM, Ben Chobot be...@silentmedia.com wrote:
 On Jul 14, 2010, at 6:57 PM, Scott Carey wrote:
 
  But none of this explains why a 4-disk raid 10 is slower than a 1 disk 
  system.  If there is no write-back caching on the RAID, it should still be 
  similar to the one disk setup.
 
 Many raid controllers are smart enough to always turn off write caching on 
 the drives, and also disable the feature on their own buffer without a BBU. 
 Add a BBU, and the cache on the controller starts getting used, but *not* the 
 cache on the drives.
 
 Take away the controller, and most OS's by default enable the write cache on 
 the drive. You can turn it off if you want, but if you know how to do that, 
 then you're probably also the same kind of person that would have purchased a 
 raid card with a BBU.
 
 Ben, I don't quite follow your message. Could you spell it out a little 
 more clearly for me?
 thanks
 -ryan


Most (all?) hard drives have cache built into them. Many raid cards have cache 
built into them. When the power dies, all the data in any cache is lost, which 
is why it's dangerous to use it for write caching. For that reason, you can 
attach a BBU to a raid card which keeps the cache alive until the power is 
restored (hopefully). But no hard drive I am aware of lets you attach a 
battery, so using a hard drive's cache for write caching will always be 
dangerous.

That's why many raid cards will always disable write caching on the hard drives 
themselves, and only enable write caching using their own memory when a BBU is 
installed. 

Does that make more sense?



Re: [PERFORM] performance on new linux box

2010-07-15 Thread Ben Chobot
On Jul 15, 2010, at 9:30 AM, Scott Carey wrote:

 Many raid controllers are smart enough to always turn off write caching on 
 the drives, and also disable the feature on their own buffer without a BBU. 
 Add a BBU, and the cache on the controller starts getting used, but *not* 
 the cache on the drives.
 
 This does not make sense.
 Write caching on all hard drives in the last decade are safe because they 
 support a write cache flush command properly.  If the card is smart it 
 would issue the drive's write cache flush command to fulfill an fsync() or 
 barrier request with no BBU.

You're missing the point. If the power dies suddenly, there's no time to flush 
any cache anywhere. That's the entire point of the BBU - it keeps the RAM 
powered up on the raid card. It doesn't keep the disks spinning long enough to 
flush caches.


Re: [PERFORM] performance on new linux box

2010-07-08 Thread Ben Chobot
On Jul 8, 2010, at 12:37 PM, Ryan Wexler wrote:

 One thing I don't understand is why BBU will result in a huge performance 
 gain.  I thought BBU was all about power failures?

When you have a working BBU, the raid card can safely do write caching. Without 
it, many raid cards are good about turning off write caching on the disks and 
refusing to do it themselves. (Safety over performance.)


Re: [PERFORM] Architecting a database

2010-06-30 Thread Ben Chobot
On Jun 30, 2010, at 11:12 AM, t...@exquisiteimages.com wrote:

  I read a post
 earlier today that mentioned in passing that it was better to have a
 faster processor than more cores.

This really depends on your workload and how much you value latency vs. 
throughput. If you tend to have a lot of very simple queries, more cores = 
more throughput, and it may not matter much if your queries take 20ms or 30ms 
if you can be doing a dozen or two more of them concurrently on an AMD system 
than on an Intel one. On the other hand, if you have fewer clients, or more 
latency-sensitive clients, then fewer-but-faster cores is usually a win.

Either way, the amount of power you can get for your money is pretty impressive.


[SPAM] Re: [PERFORM] shared_buffers advice

2010-05-24 Thread Ben Chobot
On May 24, 2010, at 4:25 AM, Konrad Garus wrote:

 Do shared_buffers duplicate contents of OS page cache? If so, how do I
 know if 25% RAM is the right value for me? Actually it would not seem
 to be true - the less redundancy the better.

You can look into the pg_buffercache contrib module. 

 Another question - is there a tool or built-in statistic that tells
 when/how often/how much a table is read from disk? I mean physical
 read, not poll from OS cache to shared_buffers.

Well, the pg_stat_* tables tell you how much logical IO is going on, but 
postgres has no way of knowing how effective the OS or disk controller caches 
are.
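
For the table-level physical-read side of that question, the closest built-in 
view is pg_statio_user_tables, with the same caveat that "read" only means 
"not in shared_buffers":

select relname, heap_blks_read, heap_blks_hit,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) as hit_ratio
  from pg_statio_user_tables
 order by heap_blks_read desc limit 10;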


[PERFORM] function performs differently with different values

2010-04-10 Thread Ben Chobot
I've got a SQL-language function which does a fairly simple select from a 
table. If I give it one value, it performs quickly (half a ms). If I give it 
another value, it does not (1.1 seconds). When I run the equivalent select 
outside of the function, both values perform roughly the same (even though one 
value returns 140k more rows, as expected). 

My understanding is that this generally happens because the plan should be 
different for the different values, but the first time the function is run it 
caches the plan for one of the values and will never use the appropriate plan 
for the second value. However, when I do an explain analyze of the raw sql for 
both values, I get the same plan. So my understanding must be wrong?

I suppose the other possibility is that the slower value is slower in a 
function because it's returning 140k more rows and the function has to deal 
with that additional data. But that seems far-fetched, given that each row 
is just an int.
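
One way to see the parameterized plan the function is likely getting - a 
sketch with hypothetical names, since a prepared statement only approximates 
how function plans are cached - is to compare EXPLAIN of a prepared query 
against EXPLAIN of the literal form:

prepare probe(int) as select id from my_table where val = $1;
explain execute probe(42);                       -- generic, parameterized plan
explain select id from my_table where val = 42;  -- plan for the literal value
deallocate probe;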


Re: [PERFORM] Testing FusionIO

2010-03-17 Thread Ben Chobot
On Mar 17, 2010, at 7:41 AM, Brad Nicholson wrote:

 As an aside, some folks in our Systems Engineering department here did
 do some testing of FusionIO, and they found that the helper daemons were
 inefficient and placed a fair amount of load on the server.  That might
 be something to watch out for, for those that are testing them.

As another anecdote, we have 4 of the 160GB cards in a 24-core Istanbul server. 
I don't know how efficient the helper daemons are, but they do take up about 
half of one core's cycles, regardless of how busy the box actually is. So that 
sounds bad until you take into account how much that one core costs, and 
compare it to how much it would cost to have the same amount of IOPS in a 
different form. 


Re: [PERFORM] Postgres DB maintainenance - vacuum and reindex

2010-03-16 Thread Ben Chobot
Autovacuum is your friend for minimal downtime. It is configurable to let you 
adjust how invasive it will be, and you can have different settings per table 
if you wish.
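
For example, to make autovacuum visit one hot table more often and more gently 
than the default (8.4+ storage-parameter syntax; the table name and values are 
just illustrations):

alter table busy_table set (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_vacuum_cost_delay = 10
);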

As for the reindex, why do you think you will be reindexing regularly?

On Mar 15, 2010, at 10:30 PM, Meena_Ramkumar wrote:

 
 How do I run vacuumdb and reindex for a Postgres DB on a non-stop server? Can it
 be done without shutting down the server? If so, what will the performance
 degradation percentage be?


Re: [PERFORM] Building multiple indexes concurrently

2010-03-16 Thread Ben Chobot
On Mar 16, 2010, at 6:04 PM, Rob Wultsch wrote:

 Let's say I have a large table bigTable to which I would like to add
 two btree indexes. Is there a more efficient way to create indexes
 than:
 CREATE INDEX idx_foo on bigTable (foo);
 CREATE INDEX idx_baz on bigTable (baz);
 Or
 CREATE INDEX CONCURRENTLY idx_foo on bigTable (foo);
 CREATE INDEX CONCURRENTLY idx_baz on bigTable (baz);
 
 Are there any particular performance optimizations that would be in
 play in such a scenario?
 
 At a minimum I assume that if both of the commands were started at
 about the same time they would each scan the table in the same
 direction and whichever creation was slower would benefit from most of
 the table data it needed being prepopulated in shared buffers. Is this
 the case?

That sounds reasonable to me. You might also look at upping your 
maintenance_work_mem for your session.
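
For example (the memory figure is just an illustration, and 
maintenance_work_mem is per-session, so set it in each session doing a build):

set maintenance_work_mem = '1GB';
create index concurrently idx_foo on bigTable (foo);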


Re: [PERFORM] shared_buffers advice

2010-03-10 Thread Ben Chobot
On Mar 10, 2010, at 6:22 PM, Paul McGarry wrote:

 Hi there,
 
 I'm after a little bit of advice on the shared_buffers setting (I have
 read the various docs on/linked from the performance tuning wiki page,
 some very helpful stuff there so thanks to those people).
 
 I am setting up a 64bit Linux server running Postgresql 8.3, the
 server has 64gigs of memory and Postgres is the only major application
 running on it. (This server is to go alongside some existing 8.3
 servers, we will look at 8.4/9 migration later)
 
 I'm basically wondering how the postgresql cache (ie shared_buffers)
 and the OS page_cache interact. The general advice seems to be to
 assign 1/4 of RAM to shared buffers.
 
 I don't have a good knowledge of the internals but I'm wondering if
 this will effectively mean that roughly the same amount of RAM being
 used for the OS page cache will be used for redundantly caching
 something the Postgres is caching as well?
 
 IE when Postgres reads something from disk it will go into both the OS
 page cache and the Postgresql shared_buffers and the OS page cache
 copy is unlikely to be useful for anything.
 
 If that is the case what are the downsides to having less overlap
 between the caches, IE heavily favouring one or the other, such as
 allocating shared_buffers to a much larger percentage (such as 90-95%
 of expected 'free' memory).

Cache isn't all you have to worry about. There's also work_mem and the number 
of concurrent queries that you expect, and those may end up leaving you less 
than 25% of ram for shared_buffers - though probably not in your case. Also, 
I've read that 10GB is the upper end of where shared_buffers becomes useful, 
though I'm not entirely sure why. I think that rule of thumb has its roots in 
some heuristics around the double buffering effects you're asking about.

I *can* say a 10GB shared_buffers value is working well with my 128GB of 
RAM. Whether or not it's optimal, I couldn't say without a lot of 
experimentation I can't afford to do right now. You might have a look at the 
pg_buffercache contrib module. It can tell you how utilized your shared buffers 
are.
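
For example, once the module is installed, something like this shows which 
relations are occupying shared buffers (a sketch; it matches on relfilenode, 
which covers the common case but isn't strictly universal):

select c.relname, count(*) as buffers
  from pg_buffercache b
  join pg_class c on b.relfilenode = c.relfilenode
 group by c.relname
 order by buffers desc
 limit 10;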


Re: [PERFORM] Testing FusionIO

2010-03-08 Thread Ben Chobot
We've enjoyed our FusionIO drives very much. They can do 100k iops without 
breaking a sweat. Just make sure you shut them down cleanly - it can take up to 
30 minutes per card to recover from a crash/plug pull test. 

I also have serious questions about their longevity and failure mode when the 
flash finally burns out. Our hardware guys claim they have overbuilt the amount 
of flash on the card to be able to do their heavy writes for 5 years, but I 
remain skeptical. 

On Mar 8, 2010, at 6:41 AM, Devrim GÜNDÜZ wrote:

 Hi,
 
 I have a FusionIO drive to test for a few days. I already ran iozone and
 bonnie++ against it. Does anyone have more suggestions for it?
 
 It is a single drive (unfortunately).
 
 Regards,
 -- 
 Devrim GÜNDÜZ
 PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
 PostgreSQL RPM Repository: http://yum.pgrpms.org
 Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
 http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [PERFORM] Testing FusionIO

2010-03-08 Thread Ben Chobot
On Mar 8, 2010, at 12:50 PM, Greg Smith wrote:

 Ben Chobot wrote:
 We've enjoyed our FusionIO drives very much. They can do 100k iops without 
 breaking a sweat. Just make sure you shut them down cleanly - it can take up 
 to 30 minutes per card to recover from a crash/plug pull test.   
 
 Yeah...I got into an argument with Kenny Gorman over my concerns with how 
 they were handling durability issues on his blog, the reading I did about 
 them never left me satisfied Fusion was being completely straight with 
 everyone about this area:  http://www.kennygorman.com/wordpress/?p=398
 
 If it takes 30 minutes to recover, but it does recover, I guess that's better 
 than I feared was the case with them.  Thanks for reporting the plug pull 
 tests--I don't trust any report from anyone about new storage hardware that 
 doesn't include that little detail as part of the testing.  You're just 
 asking to have your data get lost without that basic due diligence, and I'm 
 sure not going to even buy eval hardware from a vendor that appears evasive 
 about it.  There's a reason I don't personally own any SSD hardware yet.

Of course, the plug pull test can never be conclusive, but we never lost any 
data the handful of times we did it. Normally we'd do it more, but with such a 
long reboot cycle...

But from everything we can tell, FusionIO does do reliability right.


Re: [PERFORM] How to troubleshoot high mem usage by postgres?

2010-02-27 Thread Ben Chobot
On Feb 27, 2010, at 2:29 PM, Chris wrote:

 Hi, I'm having an issue where a postgres process is taking too much
 memory when performing many consecutive inserts and updates from a PHP

[snip]

In your postgresql.conf file, what are the settings for work_mem and 
shared_buffers?




Re: [PERFORM] bgwriter, checkpoints, curious (seeing delays)

2010-02-26 Thread Ben Chobot
On Feb 26, 2010, at 11:23 AM, Tory M Blue wrote:

 On Fri, Feb 26, 2010 at 5:09 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Tory M Blue  wrote:
 
 2010-02-25 22:53:13 PST LOG: checkpoint starting: time
 2010-02-25 22:53:17 PST postgres postgres [local] LOG: unexpected
 EOF on client connection
 2010-02-25 22:55:43 PST LOG: checkpoint complete: wrote 34155
 buffers (17.8%); 0 transaction log file(s) added, 0 removed, 15
 recycled; write=150.045 s, sync=0.000 s, total=150.046 s
 
 Did that unexpected EOF correspond to a connection attempt that gave
 up based on time?
 
 -Kevin
 
 Kevin
 
 Good question, I'm unclear what that was. I mean, it's a LOG, so not a
 client connection; that really kind of confused me. I don't normally
 see "EOF on client", and an "EOF on client" from [local] is really,
 really weird.

We see that from our monitoring software testing port 5432.



Re: [PERFORM] moving pg_xlog -- yeah, it's worth it!

2010-02-23 Thread Ben Chobot
On Feb 23, 2010, at 2:23 PM, Kevin Grittner wrote:

 
 Here are the values from our two largest and busiest systems (where
 we found the pg_xlog placement to matter so much).  It looks to me
 like a more aggressive bgwriter would help, yes?
 
 cir= select * from pg_stat_bgwriter ;
 -[ RECORD 1 ]--+
 checkpoints_timed  | 125996
 checkpoints_req| 16932
 buffers_checkpoint | 342972024
 buffers_clean  | 343634920
 maxwritten_clean   | 9928
 buffers_backend| 575589056
 buffers_alloc  | 52397855471
 
 cir= select * from pg_stat_bgwriter ;
 -[ RECORD 1 ]--+
 checkpoints_timed  | 125992
 checkpoints_req| 16840
 buffers_checkpoint | 260358442
 buffers_clean  | 474768152
 maxwritten_clean   | 9778
 buffers_backend| 565837397
 buffers_alloc  | 71463873477
 
 Current settings:
 
 bgwriter_delay = '200ms'
 bgwriter_lru_maxpages = 1000
 bgwriter_lru_multiplier = 4
 
 Any suggestions on how far to push it?

I don't know how far to push it, but you could start by reducing the delay time 
and observe how that affects performance.
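
A sketch of that experiment (the halved delay is a starting point, not a 
recommendation):

select name, setting, unit from pg_settings where name like 'bgwriter%';
-- then in postgresql.conf, e.g.:
--   bgwriter_delay = 100ms
-- reload, and watch pg_stat_bgwriter to see whether buffers_clean grows
-- faster relative to buffers_backend (writes the backends had to do
-- themselves).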

Re: [PERFORM] disk space usage unexpected

2010-02-17 Thread Ben Chobot
On Feb 15, 2010, at 1:25 PM, Rose Zhou wrote:

 Thanks Ben:
  
 I will adjust the auto vacuum parameters. It is on now, maybe not running 
 frequently enough.
 How do I get the disk space back to the OS? Will a VACUUM FULL VERBOSE get the 
 disk space back?
  
  

Yes, but it might bloat your indexes. Do you actually need to get your disk 
space back? If you did, would the database just eat it up again after more 
activity?
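
If you do decide the space is worth reclaiming, a sketch (table name 
hypothetical; pre-9.0 VACUUM FULL compacts in place and tends to bloat 
indexes, hence the REINDEX):

vacuum full verbose my_table;
reindex table my_table;

-- or, if the table has a suitable index, CLUSTER rewrites the table and
-- its indexes together (8.4+ syntax):
-- cluster my_table using my_table_pkey;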

[PERFORM] another 8.1-8.4 regression

2010-02-16 Thread Ben Chobot
I'm having problems with another one of my queries after moving from 8.1.19 to 
8.4.2. On 8.1.19, the plan looked like this:

http://wood.silentmedia.com/bench/8119

That runs pretty well. On 8.4.2, the same query looks like this:

http://wood.silentmedia.com/bench/842_bad

If I turn off mergejoin and hashjoin, I can get 8.4.2 to spit out this:

http://wood.silentmedia.com/bench/842_better

...which it thinks is going to suck but which does not. 

The query and relevant table definitions are here:

http://wood.silentmedia.com/bench/query_and_definitions


Any suggestions? I'm guessing the problem is with the absurd over-estimation on 
the nested loop under the sort node, but I'm not sure why it's so bad. 

Re: [PERFORM] another 8.1-8.4 regression

2010-02-16 Thread Ben Chobot
On Feb 16, 2010, at 1:29 PM, Ben Chobot wrote:

 I'm having problems with another one of my queries after moving from 8.1.19 
 to 8.4.2. On 8.1.19, the plan looked like this:
 
 http://wood.silentmedia.com/bench/8119
 
 That runs pretty well. On 8.4.2, the same query looks like this:
 
 http://wood.silentmedia.com/bench/842_bad
 
 If I turn off mergejoin and hashjoin, I can get 8.4.2 to spit out this:
 
 http://wood.silentmedia.com/bench/842_better
 
 ...which it thinks is going to suck but which does not. 
 
 The query and relevant table definitions are here:
 
 http://wood.silentmedia.com/bench/query_and_definitions
 
 
 Any suggestions? I'm guessing the problem is with the absurd over-estimation 
 on the nested loop under the sort node, but I'm not sure why it's so bad. 


After looking at this some more, I'm pretty confused by both of 8.4.2's plans. 
They both have a Nested Loop node in them where the expected row count is a bit 
over 2 million, and yet the inner nodes have expected row counts of 1 and 152. 
I was under the impression that a nested loop between R and S would return no 
more than R*S rows?

Re: [PERFORM] 8.1 - 8.4 regression

2010-02-15 Thread Ben Chobot
On Feb 15, 2010, at 7:59 AM, Kevin Grittner wrote:

 Could you show the query, along with table definitions (including
 indexes)?

Oh, yeah, I suppose that would help. :)

http://wood.silentmedia.com/bench/query_and_definitions

(I'd paste them here for posterity, but I suspect the reason my first few 
attempts to ask this question never went through was the size of the 
email.)


Re: [PERFORM] 8.1 - 8.4 regression

2010-02-15 Thread Ben Chobot
Awesome, that did the trick. Thanks Tom! So that I understand better, why is my 
case not the normal, better case?

(I assume the long-term fix is post-9.0, right?)
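
For anyone finding this later, the shape of the fix from the quoted message 
below, with hypothetical names:

select st.id
  from some_table st
 where exists (select 1 from users u
                where u.id = st.user_id
                limit 1);

The LIMIT 1 keeps the planner from flattening the EXISTS into a join, at the 
cost of also preventing a genuinely better join plan when one exists.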

On Feb 15, 2010, at 9:26 AM, Tom Lane wrote:

 Ben Chobot be...@silentmedia.com writes:
 On Feb 15, 2010, at 7:59 AM, Kevin Grittner wrote:
 Could you show the query, along with table definitions (including
 indexes)?
 
 Oh, yeah, I suppose that would help. :)
 
 http://wood.silentmedia.com/bench/query_and_definitions
 
 It looks like the problem is that the EXISTS sub-query is getting
 converted into a join; which is usually a good thing but in this case it
 interferes with letting the users table not be scanned completely.
 The long-term fix for that is to support nestloop inner indexscans where
 the index key comes from more than one join level up, but making that
 happen isn't too easy.
 
 In the meantime, I think you could defeat the optimization by
 inserting LIMIT 1 in the EXISTS sub-query.
 
   regards, tom lane




Re: [PERFORM] disk space usage unexpected

2010-02-15 Thread Ben Chobot
On Feb 15, 2010, at 11:59 AM, Rose Zhou wrote:

 Good day,
 
 I have a PostgreSQL 8.4 database installed on WinXP x64 with very heavy
 writing and updating on a partitioned table. Sometimes within one minute,
 tens of files with size=1,048,576kb (such as
 filenode.1, filenode.2, ... filenode.43) are created in the database subdirectory
 within PGDATA/base. 
 
 This causes the disk space to be quickly used up. Is this expected?

It's expected if you're doing lots of inserts, and/or lots of updates or 
deletes without an appropriate amount of vacuuming.
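
To see whether vacuuming is keeping up, the stats views will show which tables 
are accumulating dead rows:

select relname, n_live_tup, n_dead_tup, last_autovacuum
  from pg_stat_user_tables
 order by n_dead_tup desc
 limit 10;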





[PERFORM] 8.1 - 8.4 regression

2010-02-14 Thread Ben Chobot
(Apologies if this ends up coming through multiple times - my first attempts 
seem to have gotten stuck.)

We recently took the much-needed step of moving from 8.1.19 to 8.4.2. We took 
the downtime opportunity to also massively upgrade our hardware. Overall, this 
has been the major improvement you would expect, but there is at least one 
query which has degraded in performance quite a bit. Here is the plan on 8.4.2:
http://wood.silentmedia.com/bench/842

Here is the very much less compact plan for the same query on 8.1.19:
http://wood.silentmedia.com/bench/8119

I think the problem might be that 8.1.19 likes to use a few indexes which 8.4.2 
doesn't seem to think would be worthwhile. Perhaps that's because on the new 
hardware almost everything fits into ram, but even so, it would be better if 
those indexes were used. The other differences I can think of are 
random_page_cost (2 on the new hardware vs. 2.5 on the old), a ten-fold 
increase in effective_cache_size, doubling work_mem from 8MB to 16MB, and that 
we analyze up to 100 samples per attribute on 8.4.2, while our 8.1.19 install 
does 10 at most. Still, the estimates for both plans seem fairly accurate, at 
least where there are differences in which indexes are getting used.

Everything has been analyzed recently, and given that 8.4.2 already has 10x 
more analysis samples than 8.1.19, I'm not sure what to do to coax it towards 
using those indexes.


[PERFORM] hardware priority for an SSD database?

2009-12-23 Thread Ben Chobot
We're looking to upgrade our database hardware so that it can sustain  
us while we re-architect some of the more fundamental issues with our  
applications. The first thing to spend money on is usually disks, but  
our database currently lives almost entirely on flash storage, so  
that's already nice and fast. My question is: what should we spend  
money on next?


With most data stored in flash, does it still make sense to buy as  
much ram as possible? RAM is still faster than flash, but while it's  
cheap, it isn't free, and our database is a couple hundred GB in size.


We also have several hundred active sessions. Does it makes sense to  
sacrifice some memory speed and go with 4 6-core Istanbul processors?  
Or does it make more sense to limit ourselves to 2 4-core Nehalem  
sockets and get Intel's 1333 MHz DDR3 memory and faster cores?


Our queries are mostly simple, but we have a lot of them, and their  
locality tends to be low. FWIW, about half are selects.


Does anybody have any experience with these kinds of tradeoffs in the  
absence of spinning media? Any insight would be much appreciated. From  
the information I have right now, trying to figuring out how to  
optimally spend our budget feels like a shot in the dark.


Thanks!



Re: [PERFORM] RAID card recommendation

2009-11-24 Thread Ben Chobot

On Nov 24, 2009, at 9:23 AM, Matthew Wakeling wrote:

We're about to purchase a new server to store some of our old  
databases, and I was wondering if someone could advise me on a RAID  
card. We want to make a 6-drive SATA RAID array out of 2TB drives,  
and it will be RAID 5 or 6 because there will be zero write traffic.  
The priority is stuffing as much storage into a small 2U rack as  
possible, with performance less important. We will be running Debian  
Linux.


People have mentioned Areca as making good RAID controllers. We're  
looking at the Areca ARC-1220 PCI-Express x8 SATA II as a  
possibility. Does anyone have an opinion on whether it is a turkey  
or a star?


We've used that card and have been quite happy with it. Looking  
through the release notes for firmware upgrades can be pretty worrying  
("you needed to fix what?!"), but we never experienced any problems  
ourselves, and it's not like 3ware release notes are any different.


But the main benefits of a RAID card are a write cache and easy hot  
swap. It sounds like you don't need a write cache. Can you be happy  
with the kernel's hotswap ability?

Re: [PERFORM] Databases vs Schemas

2009-10-09 Thread Ben Chobot

Scott Otis wrote:


I am seeking advice on what the best setup for the following would be.

 

My company provides a hosted web calendaring solution for school 
districts.  For each school district we have a separate database.  
Each database has 57 tables.

Over the next couple of months we will be creating an instance of our 
solution for each public school district in the US which is around 
18,000. 

Why are you trying to keep all this information on one server? It seems 
like you have such perfectly independent silos of data, why not take the 
opportunity to scale out horizontally? It's usually a lot cheaper to buy 
4 machines of power x than one machine of power (4x).




Re: [PERFORM] performance for high-volume log insertion

2009-04-21 Thread Ben Chobot

On Mon, 20 Apr 2009, da...@lang.hm wrote:

one huge advantage of putting the sql into the configuration is the ability 
to work around other users of the database.


+1 on this. We've always found tools much easier to work with when they 
could be adapted to our schema, as opposed to changing our process for the 
tool.




Re: [PERFORM] scheduling autovacuum at lean hours only.

2009-02-11 Thread Ben Chobot

On Wed, 11 Feb 2009, Scott Carey wrote:


On a large partitioned database, ordinary vacuum is a very very difficult 
option.

Most of the time on such a system, most tables are dormant with respect to 
writes and never need to be vacuumed.  A 'daily vacuum' is likely to take a 
full day to run on larger systems. Since ordinary vacuum can't be run on 
subsets of tables without explicitly naming them one at a time (can't just 
vacuum a schema, tablespace, or use a regex to match table names), good luck 
using it effectively if you have a few thousand tables in partitions.  You'll 
have to have application code or a script with knowledge of all the partition 
names and which are in need of an analyze/vacuum.

Autovacuum is good enough in recent versions to be tuned to have very low 
impact though.  If you have to, rather than stop and start it, just turn the 
delay or cost settings up and down during different parts of the day.  More 
than likely however, it will be able to keep up with a single set of settings.
In particular, rather than making the delay longer, make the appropriate cost 
larger -- page miss or page dirty affect how much I/O it will do, and page hit 
will mostly affect how much CPU it uses.

Perhaps a feature request is to have a version of the manual vacuum command 
that doesn't bother running on tables that autovacuum wouldn't touch due to 
insufficient data churn.  This way, at lean hours one can manually vacuum to 
help an autovacuum that was tuned for very low impact 'catch up'.
Also, if there was some way to make vacuum not stink so badly on tables that 
were just loaded with pg_load, where it causes huge write activity for tables 
that clearly have no bloat (I believe this is a hint bits thing?).


Oh, I agree with everything you say. I'm just pointing out that if you 
really do want control over when things get vacuumed (e.g. you have a 
mostly-read database 20 hours a day and then 4 hours of heavy churn at 
night) then you can still do that if you want.




Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Ben Chobot

On Tue, 9 Dec 2008, Robert Haas wrote:


I don't believe the thesis.  The gap between disk speeds and memory
speeds may narrow over time, but I doubt it's likely to disappear
altogether any time soon, and certainly not for all users.


I think the not for all users is the critical part. In 2 years, we may 
(or may not) start using SSD instead of traditional drives for new 
installs, but we certainly won't be throwing out our existing servers any 
time soon just because something (much) better is now available.




Re: [PERFORM] performance tools

2008-03-17 Thread Ben Chobot
Have you tried the archive search tool for this very mailing list?  
There's a wealth of information imparted all the time, and tuning is  
generally about knowledge of what's happening, not blindly following  
the output of some program.



On Mar 17, 2008, at 8:46 AM, sathiya psql wrote:


 I thought many replies would come... but only one.
 
 Come on guys... it may be helping you tune your database  
 indirectly; post that tool also, and give some information such as:


Tool Name: Schemaspy
Open Source: YES
Database: Postgres
URL: http://sourceforge.net/projects/schemaspy/1
 The following can be taken as optional:
Easily Installable: YES
Performance TUNING tool: Partially YES ( YES / NO / Partially Yes )
ER diagram tool: Yes / No
Query Analysis Tool: Yes / No

Probably other informations also

 Come on, start sharing...