Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3

2007-07-22 Thread Greg Smith

On Sat, 21 Jul 2007, Tom Lane wrote:


With autovac off I see 8.3 as faster than 8.2 in pgbench.


Indeed.  I'm seeing much better pgbench results from HEAD than 8.2 when I 
set the configurations up identically.  I'm hoping to have a comparison 
set to show everyone this week.


and use -t at least 1000 or so (otherwise startup transients are 
significant).


I personally consider any pgbench run that lasts less than several minutes 
noise.  On a system that hits 500 TPS like Pavel's, I'd want to see around 
100,000 transactions before I consider the results significant.  And then 
I'd want a set of 3 at each configuration because even with longer runs, 
you occasionally get really odd results.  Until you have 3 it can be 
unclear which is the weird one.
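
As a concrete sketch of that protocol (the scale factor and client count
here are placeholders, not recommendations):

  pgbench -i -s 10 bench               # initialize once
  for run in 1 2 3; do
      # 8 clients x 12500 transactions each = 100,000 per run
      pgbench -c 8 -t 12500 bench
  done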


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3

2007-07-22 Thread Pavel Stehule

Hello,

I checked my tests again and I have different results. This time I tested
PostgreSQL on a dedicated server, and now 8.3 is about 20% faster. I
didn't see a strong impact from autovacuum. All numbers are approximate
only. I ran pgbench 3x for each of the following configurations
(autovacuum on, autovacuum off, statistics off) and for -t transaction
counts (100, 1000, 4000):

          autovac on    autovac off   stats off
-t 100    629/638/639   630/630/646   581/654/656   -- 8.3
          443/519/519   542/555/566   670/692/694   -- 8.2

-t 1000   622/626/653   630/635/653   631/631/652   -- 8.3
          523/528/541   522/535/550   679/681/690   -- 8.2

-t 4000   632/635/644   248/385/651*  191/401/641*  -- 8.3
          465/472/520   237/336/538*  249/379/702*  -- 8.2

8.2 is faster only if statistics are off.


Note:
* I didn't use the -v parameter (do_vacuum_accounts)
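
Each cell in the table came from roughly this kind of loop, repeated
after editing the autovacuum/statistics settings in postgresql.conf
(a sketch only - the database name is a placeholder, and the exact
stats GUCs differ between 8.2 and 8.3):

  pg_ctl -D $PGDATA restart                        # after toggling GUCs
  for run in 1 2 3; do pgbench -t 1000 bench; done # likewise -t 100/4000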

I am sorry for the noise.

Regards
Pavel Stehule



Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3

2007-07-22 Thread Josh Berkus

Pavel Stehule wrote:

Hello,

I checked my tests again and I have different results. This time I tested
PostgreSQL on a dedicated server, and now 8.3 is about 20% faster. I
didn't see a strong impact from autovacuum. All numbers are approximate
only. I ran pgbench 3x for each of the following configurations
(autovacuum on, autovacuum off, statistics off) and for -t transaction
counts (100, 1000, 4000)


In other news, 8.3 with current HOT is 13% faster than 8.2 at TPC-E in 
the first 1/2 hour.  Performance does not fall off over 5 hours of test 
run, and most of the main tables never have autovacuum triggered at all. 
Unfortunately, we don't yet have a 5-hour 8.2 run to compare 
last-half-hour performance.


--Josh



Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3

2007-07-22 Thread Josh Berkus

Tom,


Note to all: we ***HAVE TO*** settle on some reasonable default
vacuum_cost_delay settings before we can ship 8.3.  With no cost delay
and two or three workers active, 8.3's autovac does indeed send
performance into the tank.


I've been using 20ms for most of my setups.  That's aimed at reducing 
autovac to almost no impact at all, at the cost of each vacuum taking a 
long time.  Maybe 10ms?
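
In postgresql.conf terms, a sketch of what I've been running (these are
just the values I've used, not a tested recommendation):

  autovacuum = on
  autovacuum_vacuum_cost_delay = 20ms  # maybe 10ms as the shipped default?
  #vacuum_cost_delay = 0               # manual VACUUM left unthrottled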


--Josh




[HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Florian G. Pflug

Hi

I'm currently working on correctly flushing the
catalog/relation/smgr caches on a readonly PITR
slave during recovery. These are the things that
currently cause me headaches.

1) It seems that the btree code sends out relcache
   invalidation msgs during normal operation
   (no DDL statements are executed). This makes any
   simple flush-all-caches-if-DDL-was-executed scheme
   fail.

2) When a full page image is written to the WAL, the
   information about which tuple was updated is lost.
   So synthesizing cache invalidation msgs from the
   WAL records would need to reverse-engineer a full
   page image, which seems hard and error-prone.

3) Most cache invalidations seem to be generated by
   heap_insert (via PrepareForTupleInvalidation). Those
   seem to be reconstructable from the WAL quite easily.
   Those sent out via CacheInvalidateRelcache*, however,
   seem to leave no trace in the WAL.

What I'm wondering is how much performance is lost if
I just let the slave flush all its caches whenever it
replays a commit record of a transaction that executed
DDL. To me it looks like that would only seriously harm
performance if a lot of temporary tables are created on
the master. Since there seem to be quite a few people who are
unhappy about the current temptable implementation,
optimizing for that case might prove worthless if 8.4 or
8.5 changes the way that temptables are handled.

If this brute-force approach turns out to perform really
badly, does anyone see an elegant way around (2) and (3)?
(2) seems solvable by writing both logical and physical records
to the WAL - similar to what that xlog compression idea
needs (I, however, lost track of what came out of that
discussion). But (3) seems to be messy...

greetings, Florian Pflug




Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 I'm currently working on correctly flushing the
 catalog/relation/smgr caches on a readonly PITR
 slave during recovery.

I don't believe there is any workable solution to that short of logging
cache-flush operations in WAL.

regards, tom lane



Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3

2007-07-22 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 I'd want a set of 3 at each configuration because even with longer runs, 
 you occasionally get really odd results.  Until you have 3 it can be 
 unclear which is the weird one.

Yeah, pgbench results are notoriously unrepeatable.  One issue is that
the first run after pgbench -i sees conditions a lot different from
subsequent runs (no free space in tables, indexes are more tightly
packed than they will be later, etc).  The recently added option to
initialize the tables with a selected fillfactor might help here,
but personally I've not experimented with it.
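
For anyone who wants to try it: I believe the new initialization switch
is -F; a sketch, with placeholder values:

  pgbench -i -s 10 -F 90 bench   # fillfactor 90 = 10% free space per page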

There also seems to be some of the good-average-but-bad-worst-case
behavior that Josh and others have pointed out in bigger benchmarks.
I've always assumed this was due to checkpointing (and autovac if
enabled).  If your test run isn't long enough to cover one full
checkpoint cycle then the results will be quite variable depending
on whether it included a checkpoint or not.  8.3 might alleviate
this effect to some extent.

regards, tom lane



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

I'm currently working on correctly flushing the
catalog/relation/smgr caches on a readonly PITR
slave during recovery.


I don't believe there is any workable solution to that short of logging
cache-flush operations in WAL.


I still don't fully understand whether, and which, problems are caused
by overly aggressive cache flushing. What puzzles me is that DDL
statements seem to be considered something rare on a production system
by most people on this list, yet the caches seem to be highly optimized
to avoid unnecessary invalidations.


Leaving aside the btree issues, are you worried about performance
problems, or can aggressive cache flushing hurt correctness?

The reason that I dislike WAL-logging of the flush operations so much is
that, since people are concerned about the amount of WAL traffic
postgres generates, such a solution would introduce yet another GUC.

And to make this reasonably foolproof, the slave would need a way to
detect whether that GUC is set correctly on the master. All in all, that
seems to be quite hackish...

greetings, Florian Pflug



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Leaving aside the btree issues, are you worried about performance
 problems, or can aggressive cache flushing hurt correctness?

It shouldn't hurt correctness, but I don't think you entirely grasp
the magnitude of the performance hit you'll take.  The last time I
tried running the regression tests with CLOBBER_CACHE_ALWAYS enabled,
they took about one hundred times longer than normal.  Now you are
evidently hoping not to disable the caches entirely, but I don't
understand how you are going to handle the various random
CacheInvalidateRelcache calls that are here and there in the system
(not only in btree).
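
For reference, one way to repeat that measurement (a sketch, assuming a
configured source tree; CLOBBER_CACHE_ALWAYS is a compile-time define):

  ./configure CPPFLAGS=-DCLOBBER_CACHE_ALWAYS
  make && make check     # expect on the order of 100x the normal runtime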

regards, tom lane



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

Leaving aside the btree issues, are you worried about performance
problems, or can aggressive cache flushing hurt correctness?


It shouldn't hurt correctness, but I don't think you entirely grasp
the magnitude of the performance hit you'll take.  The last time I
tried running the regression tests with CLOBBER_CACHE_ALWAYS enabled,
they took about one hundred times longer than normal.  Now you are
evidently hoping not to disable the caches entirely, but I don't
understand how you are going to handle the various random
CacheInvalidateRelcache calls that are here and there in the system
(not only in btree).


I must be missing something... A quick grep for CacheInvalidateRelcache
turned up these places:
src/backend/rewrite/rewriteDefine.c (EnableDisableRule)
src/backend/rewrite/rewriteSupport.c (SetRelationRuleStatus)
src/backend/access/nbtree/nbtinsert.c (_bt_insertonpg, _bt_newroot)
src/backend/access/nbtree/nbtpage.c (_bt_getroot, _bt_pagedel)
src/backend/commands/trigger.c (renametrig, EnableDisableTrigger)
src/backend/commands/cluster.c (mark_index_clustered)
src/backend/commands/indexcmds.c (DefineIndex)
src/backend/commands/tablecmds.c (setRelhassubclassInRelation)
src/backend/commands/vacuum.c (vac_update_relstats)
src/backend/catalog/heap.c (SetRelationNumChecks)
src/backend/catalog/index.c (index_drop, index_update_stats)

For CacheInvalidateHeapTuple, there is an additional hit in
src/backend/commands/vacuum.c (move_chain_tuple, move_plain_tuple).
Note that move_chain_tuple and move_plain_tuple are only called
in repair_frag, which in turn is only used in full_vacuum_rel.
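
Roughly the search I used, for anyone who wants to double-check me:

  grep -rn CacheInvalidateRelcache  src/backend
  grep -rn CacheInvalidateHeapTuple src/backend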

Now, to me all of these, with the exception of the btree functions,
vac_update_relstats and move_chain/plain_tuple, look like they are only
called during DDL statements.

My basic assumption is that DDL is something quite uncommon on a
production system. This is obviously *totally* *wrong* for the
regression tests, and I don't doubt that my scheme will show quite
bad performance if you use those as a benchmark. But if you, say,
use pgbench for testing, then the picture will be quite different,
I imagine.

My strategy would be the following:
1) Mark the commit record if a transaction generated any invalidation
   events apart from the btree ones. The only other source of inval
   events seems to be VACUUM FULL on a system relation, which won't
   happen on a modestly well-tuned system I think - any VACUUM FULL
   will need special treatment anyway.
2) At replay time, the caches are flushed after that record is
   replayed.

greetings, Florian Pflug



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 My basic assumption is that DDL is something quite uncommon on a
 production system.

I'm not sure I believe that, because of temp tables.  There's also
the problem that plain VACUUM (or ANALYZE) causes a relcache flush
to update the relation-size statistics.

The real problem with the scheme you propose is that it turns a
cache flush on one table into a system-wide cache flush.

We might be able to do something about the temp-table case upstream:
AFAICS there's no reason for backends to broadcast cache flushes for
their own temp tables to other backends.  But that's just a sketch
of a thought at the moment.

Anyway, if you believe that DDL is infrequent, why are you resistant
to the idea of WAL-logging cache flushes?

regards, tom lane



Re: [HACKERS] 8.2 is 30% better in pgbench than 8.3

2007-07-22 Thread Greg Smith

On Sun, 22 Jul 2007, Tom Lane wrote:


There also seems to be some of the good-average-but-bad-worst-case
behavior that Josh and others have pointed out in bigger benchmarks.
I've always assumed this was due to checkpointing


A lot of the variation is from checkpoints, some comes from the index 
fullness/fragmentation, some from vacuuming, a bit is added based on 
what's in the buffer cache from previous runs, and there's a touch of 
randomness thrown on top even if you control all those things.  My tests 
suggest the bad worst-case results in pgbench tests are almost always 
from checkpoints that happen when the buffer cache is filled with almost 
exclusively dirty buffers (which can happen very easily with pgbench).


I am in the process of squashing these issues when running pgbench against 
8.3.  A before/after look at pg_stat_bgwriter lets you measure the 
checkpoint variation.  The ability to directly play with the index 
fillfactor lets you test/control the impact of that.  Wrap all that into a 
scripting framework that runs the tests many times for you in a consistent 
fashion and then summarizes the results, and pgbench becomes a rough but 
completely serviceable tool.  Expect to see a pgbench-tools project that 
does all that from me soon; my copy works, but it's not ready for public 
consumption yet.
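
The before/after check is nothing fancy; a sketch (the database name is
a placeholder):

  psql -c "SELECT * FROM pg_stat_bgwriter" bench > stats.before
  pgbench -c 8 -t 12500 bench
  psql -c "SELECT * FROM pg_stat_bgwriter" bench > stats.after
  diff stats.before stats.after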


I've also got a working backport of all the pg_stat_bgwriter instrumentation 
that applies to 8.2, so people can run all this against the current 
version as well if they're in a position where they can hack a custom 
build.  The hold-up on that is that the code that measures buffers 
allocated and those written by back-ends for 8.3 is wrapped into the 
"Automatic adjustment of bgwriter_lru_maxpages" patch, and I'm not going 
to finalize my unofficial backport until that gets applied.  Once HOT 
wraps up, that loose end should get snipped easily enough.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

My basic assumption is that DDL is something quite uncommon on a
production system.


I'm not sure I believe that, because of temp tables.  There's also
the problem that plain VACUUM (or ANALYZE) causes a relcache flush
to update the relation-size statistics.



The real problem with the scheme you propose is that it turns a
cache flush on one table into a system-wide cache flush.


Yes... It really builds on the idea that those flushes don't happen
too frequently.


We might be able to do something about the temp-table case upstream:
AFAICS there's no reason for backends to broadcast cache flushes for
their own temp tables to other backends.  But that's just a sketch
of a thought at the moment.


I was actually hoping that some day temptables wouldn't be stored in
pg_class and friends at all. I was wondering if it
wouldn't be possible to keep the information about them solely in
the catcache and relcache (making the word "cache" a big lie). I didn't
check if that is feasible at all, though - just an idea I got at
one point.


Anyway, if you believe that DDL is infrequent, why are you resistant
to the idea of WAL-logging cache flushes?

For multiple reasons.

First, cache invalidations are not the only problem caused by replaying
system-table updates. The whole SnapshotNow business doesn't exactly
make things easier, either. So it feels like a lot of added complexity
and code for little gain - unless a *lot* more things (like locking
requests) are logged too.

Second, I'm sure that people would insist on a GUC to turn logging of
those records off if they don't need them in their setup, similar to
the make-WAL-compressible GUC that was proposed a few weeks ago.
And if it's a GUC, the slave should be able to verify that it was
set correctly on the master; otherwise this becomes a huge footgun.

Third, I try to keep the changes necessary on the master to a
minimum - I feel that this will make merging the code at some point
easier, because the risk of breaking something is smaller. Bugs
in the slave code may cause crashes and wrong results, but
at least they won't cause data corruption on the master.

And last but not least, I have only limited time for this project -
so I try to find the simplest workable solution, and maybe tune things
later when practical experience shows where the real bottlenecks are.

greetings, Florian Pflug



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The real problem with the scheme you propose is that it turns a
 cache flush on one table into a system-wide cache flush.

 Yes... It really builds on the idea that those flushes don't happen
 too frequently.

The problem is that flushes may be infrequent for any one table,
but if you use an implementation that converts every per-table
flush to a system-wide flush, it's not so infrequent anymore.

regards, tom lane



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Anyway, if you believe that DDL is infrequent, why are you resistant
 to the idea of WAL-logging cache flushes?

 First, cache invalidations are not the only problem caused by replaying
 system-table updates. The whole SnapshotNow business doesn't exactly
 make things easier, either. So it feels like a lot of added complexity
 and code for little gain - unless a *lot* more things (like locking
 requests) are logged too.

The mention of locking requests brought to mind the following
gedankenexperiment:

1. slave server backend is running some long-running query on table X.

2. WAL-reading process receives and executes DROP TABLE X.

(It doesn't even have to be a DROP; most varieties of ALTER are enough
to create problems for a concurrently-running query.)

It's really hard to see how to defend against that without a fairly
complete simulation of locking on the slave side.

regards, tom lane



Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Simon Riggs
On Sun, 2007-07-22 at 19:58 +0200, Florian G. Pflug wrote:
 Tom Lane wrote:
  Florian G. Pflug [EMAIL PROTECTED] writes:
  I'm currently working on correctly flushing the
  catalog/relation/smgr caches on a readonly PITR
  slave during recovery.
  
  I don't believe there is any workable solution to that short of logging
  cache-flush operations in WAL.

 The reason that I dislike WAL-logging of the flush operations so much is
 that, since people are concerned about the amount of WAL traffic
 postgres generates, such a solution would introduce yet another GUC.
 And to make this reasonably foolproof, the slave would need a way to
 detect whether that GUC is set correctly on the master. All in all, that
 seems to be quite hackish...

Seems like we should WAL log flush operations first. It's fairly
straightforward to do that and we can then measure its effect on the
primary easily enough. Your other suggestions seem much more complex.

I think we have a reasonable tolerance for increases in WAL and as you
said earlier, we may balance that out with other optimisations. Or we
may find a more efficient way of doing it later.

Let's aim to get that first query running, then go back and tune it
later.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Simon Riggs
On Sun, 2007-07-22 at 17:54 +0200, Florian G. Pflug wrote:

 1) It seems that the btree code sends out relcache
 invalidation msgs during normal operation
 (no DDL statements are executed). This makes any
 simple flush-all-caches-if-DDL-was-executed scheme
 fail.

Cache invalidation is uniquely associated with these WAL record types:

XLOG_BTREE_NEWROOT
XLOG_BTREE_INSERT_META
XLOG_BTREE_DELETE_PAGE_META

None of those things happen with any real frequency in most use cases.
We don't increase or reduce the number of levels of an index very
frequently.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Full page images in WAL & Cache Invalidation

2007-07-22 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 2. WAL-reading process receives and executes DROP TABLE X.

 (It doesn't even have to be a DROP; most varieties of ALTER are enough
 to create problems for a concurrently-running query.)

 It's really hard to see how to defend against that without a fairly
 complete simulation of locking on the slave side.

Well, it's not hard to see ways to do it. It may be hard to see *good* ways to
do it. In the limit you can just block WAL replay of such records (I think
vacuum page defragmenting is also in this category) until there are no queries
running. This means a single long-running query can block log replay for a
long time though.

There's an additional problem from commands like CREATE INDEX CONCURRENTLY,
and from vacuum's deleting of dead tuples, which aren't safe to execute if
there are any standby queries running, unless we can prove those queries
are not too old. CLUSTER and most varieties of ALTER suffer from this
problem as well.

It does seem like the shortest-path solution is just to get things working
first with a one-big-lock implementation and then work on refining it after
that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[HACKERS] syslogging oddity

2007-07-22 Thread Andrew Dunstan


Somewhere along the way we seem to have made the syslogger's shutdown 
message go to stderr, even if we have redirected it:


[EMAIL PROTECTED] inst.test.5703]$ bin/pg_ctl -D data/ -w stop
waiting for server to shut downLOG:  logger shutting down
done
server stopped

Not sure if this is something we should worry about. I don't recall it 
happening previously.


cheers

andrew



Re: [HACKERS] syslogging oddity

2007-07-22 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Somewhere along the way we seem to have made the syslogger's shutdown 
 message go to stderr, even if we have redirected it:

I'm pretty sure it has done that all along; at least the design
intention is that messages generated by the syslogger itself should go to
its stderr.  (Else, if the logger is having trouble, you might never get
to find out why at all.)

It might be reasonable to reduce "logger shutting down" to DEBUG1
or so, now that the facility has been around for a while.

regards, tom lane
