Re: [HACKERS] reducing statistics write overhead

2009-01-22 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Martin Pihlak escribió:
 [ patch to fool with stats refresh logic in autovac ]

(1) I still don't understand why we don't just make the launcher request
a new stats file once per naptime cycle, and then allow the workers to
work from that.

(2) The current code in autovacuum.c seems to be redundant with the
logic that now exists in the stats mechanism itself to decide whether a
stats file is too old.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reducing statistics write overhead

2009-01-22 Thread Alvaro Herrera
Tom Lane escribió:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Martin Pihlak escribió:
  [ patch to fool with stats refresh logic in autovac ]
 
 (1) I still don't understand why we don't just make the launcher request
 a new stats file once per naptime cycle, and then allow the workers to
 work from that.

The problem is workers that spend too much time on a single database.
If the stats at the time they both start say that a given table must be
vacuumed (consider for example that the first one spent too much time
vacuuming some other big table), they could end up both vacuuming that
table.  The second vacuum would be a waste.

This could be solved if the workers kept the whole history of tables
that they have vacuumed.  Currently we keep only a single table (the one
being vacuumed right now).  I proposed writing these history files back
when workers were first implemented, but the idea was shot down before
flying very far because it was way too complex (the rest of the patch
was more than complex enough.)  Maybe we can implement this now.

 (2) The current code in autovacuum.c seems to be redundant with the
 logic that now exists in the stats mechanism itself to decide whether a
 stats file is too old.

Hmm, yeah, possibly.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] reducing statistics write overhead

2009-01-22 Thread Euler Taveira de Oliveira
Alvaro Herrera escreveu:
 This could be solved if the workers kept the whole history of tables
 that they have vacuumed.  Currently we keep only a single table (the one
 being vacuumed right now).  I proposed writing these history files back
 when workers were first implemented, but the idea was shot down before
 flying very far because it was way too complex (the rest of the patch
 was more than complex enough.)  Maybe we can implement this now.
 
[I don't remember your proposal...] Isn't it just a matter of adding a circular
linked list to AutoVacuumShmemStruct? Of course, some locking mechanism needs to
exist to guarantee that we don't write at the same time. The size of this linked
list would be scaled by a startup-time GUC or a reasonable fixed value.


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/



Re: [HACKERS] reducing statistics write overhead

2009-01-22 Thread Alvaro Herrera
Euler Taveira de Oliveira escribió:
 Alvaro Herrera escreveu:
  This could be solved if the workers kept the whole history of tables
  that they have vacuumed.  Currently we keep only a single table (the one
  being vacuumed right now).  I proposed writing these history files back
  when workers were first implemented, but the idea was shot down before
  flying very far because it was way too complex (the rest of the patch
  was more than complex enough.)  Maybe we can implement this now.
  
 [I don't remember your proposal...] Isn't it just a matter of adding a circular
 linked list to AutoVacuumShmemStruct? Of course, some locking mechanism needs to
 exist to guarantee that we don't write at the same time. The size of this linked
 list would be scaled by a startup-time GUC or a reasonable fixed value.

Well, the problem is precisely how to size the list.  I don't like the
idea of keeping an arbitrary number in memory; it adds another
mostly-useless tunable that we'll need to answer questions about for all
eternity.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] reducing statistics write overhead

2009-01-22 Thread Euler Taveira de Oliveira
Alvaro Herrera escreveu:
 Euler Taveira de Oliveira escribió:
 Alvaro Herrera escreveu:
 This could be solved if the workers kept the whole history of tables
 that they have vacuumed.  Currently we keep only a single table (the one
 being vacuumed right now).  I proposed writing these history files back
 when workers were first implemented, but the idea was shot down before
 flying very far because it was way too complex (the rest of the patch
 was more than complex enough.)  Maybe we can implement this now.

 [I don't remember your proposal...] Isn't it just a matter of adding a circular
 linked list to AutoVacuumShmemStruct? Of course, some locking mechanism needs to
 exist to guarantee that we don't write at the same time. The size of this linked
 list would be scaled by a startup-time GUC or a reasonable fixed value.
 
 Well, the problem is precisely how to size the list.  I don't like the
 idea of keeping an arbitrary number in memory; it adds another
 mostly-useless tunable that we'll need to answer questions about for all
 eternity.
 
[Poking the code a little...] You're right. We could do that, but it isn't an
elegant solution. What about tracking that information in table_oids?

struct table_oids {
	bool	skipit;		/* initially false */
	Oid		relid;
};


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/



Re: [HACKERS] reducing statistics write overhead

2009-01-22 Thread Tom Lane
Euler Taveira de Oliveira eu...@timbira.com writes:
 Alvaro Herrera escreveu:
 Well, the problem is precisely how to size the list.  I don't like the
 idea of keeping an arbitrary number in memory; it adds another
 mostly-useless tunable that we'll need to answer questions about for all
 eternity.

Is it so hard?  In particular, rather than making it a tunable, what say
we freeze the list size at exactly two, ie each AV worker advertises its
current and most recent target table in shared memory.  Other workers
avoid re-vacuuming those.  Then the most work you can waste by extra
vacuuming is less than the maximum allowed stats file age.  I'd have no
problem whatsoever in letting that run into multiple seconds, as long
as it doesn't get into minutes or hours.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2009-01-21 Thread Alvaro Herrera
Martin Pihlak escribió:
 I wrote:
  I was thinking that the launcher should only request fresh stats at wakeup;
  the workers could then reuse that file. This could be implemented by calling
  pgstat_clear_snapshot only at launcher wakeup and setting the max stats age
  to autovacuum_naptime for the workers.
 
 Attached is a patch that increases the autovacuum stats age tolerance to
 autovacuum_naptime. This is handled by autovac_refresh_stats(), which no longer
 clears the stats snapshot unless the naptime has elapsed or a refresh is
 explicitly forced by an error or SIGHUP.

You missed putting back the BUG comment that used to be there about
this.

In other words I think this is a bad idea, because there is a very wide
window for a table to be vacuumed twice.  Since naptime can be
arbitrarily large, this is an arbitrarily large bug.  I'm sure there are
other ways to fix this, but please propose those before this patch.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] reducing statistics write overhead

2009-01-21 Thread Martin Pihlak
I wrote:
 I was thinking that the launcher should only request fresh stats at wakeup;
 the workers could then reuse that file. This could be implemented by calling
 pgstat_clear_snapshot only at launcher wakeup and setting the max stats age
 to autovacuum_naptime for the workers.
 

Attached is a patch that increases the autovacuum stats age tolerance to
autovacuum_naptime. This is handled by autovac_refresh_stats(), which no longer
clears the stats snapshot unless the naptime has elapsed or a refresh is
explicitly forced by an error or SIGHUP.

For the time being, I left the table vacuum recheck in place. Removing the
table_recheck_autovac function requires some further work. I have started on
this, but decided to defer until it is clear whether the whole approach is
acceptable or not.

regards,
Martin

*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***
*** 44,54 
   * Note that there can be more than one worker in a database concurrently.
   * They will store the table they are currently vacuuming in shared memory, so
   * that other workers avoid being blocked waiting for the vacuum lock for that
!  * table.  They will also reload the pgstats data just before vacuuming each
!  * table, to avoid vacuuming a table that was just finished being vacuumed by
!  * another worker and thus is no longer noted in shared memory.  However,
!  * there is a window (caused by pgstat delay) on which a worker may choose a
!  * table that was already vacuumed; this is a bug in the current design.
   *
   * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
--- 44,53 
   * Note that there can be more than one worker in a database concurrently.
   * They will store the table they are currently vacuuming in shared memory, so
   * that other workers avoid being blocked waiting for the vacuum lock for that
!  * table. There is a possibility that a worker might pick up a table that was
!  * already vacuumed by another process. This isn't really a problem, as the
!  * odds of this happening are low and the revacuum is made cheap by the use of
!  * visibility map.
   *
   * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
***
*** 120,128  int			autovacuum_vac_cost_limit;
  
  int			Log_autovacuum_min_duration = -1;
  
- /* how long to keep pgstat data in the launcher, in milliseconds */
- #define STATS_READ_DELAY 1000
- 
  
  /* Flags to tell if we are in an autovacuum process */
  static bool am_autovacuum_launcher = false;
--- 119,124 
***
*** 298,304  static void avl_sighup_handler(SIGNAL_ARGS);
  static void avl_sigusr1_handler(SIGNAL_ARGS);
  static void avl_sigterm_handler(SIGNAL_ARGS);
  static void avl_quickdie(SIGNAL_ARGS);
! static void autovac_refresh_stats(void);
  
  
  
--- 294,300 
  static void avl_sigusr1_handler(SIGNAL_ARGS);
  static void avl_sigterm_handler(SIGNAL_ARGS);
  static void avl_quickdie(SIGNAL_ARGS);
! static void autovac_refresh_stats(bool force);
  
  
  
***
*** 500,509  AutoVacLauncherMain(int argc, char *argv[])
  		DatabaseList = NULL;
  
  		/*
! 		 * Make sure pgstat also considers our stat data as gone.  Note: we
! 		 * mustn't use autovac_refresh_stats here.
  		 */
! 		pgstat_clear_snapshot();
  
  		/* Now we can allow interrupts again */
  		RESUME_INTERRUPTS();
--- 496,504 
  		DatabaseList = NULL;
  
  		/*
! 		 * Make sure pgstat also considers our stat data as gone.
  		 */
! 		autovac_refresh_stats(true);
  
  		/* Now we can allow interrupts again */
  		RESUME_INTERRUPTS();
***
*** 598,603  AutoVacLauncherMain(int argc, char *argv[])
--- 593,601 
  		if (got_SIGTERM)
  			break;
  
+ 		/* Refresh stats. Force it, if reloaded via SIGHUP */
+ 		autovac_refresh_stats(got_SIGHUP);
+ 
  		if (got_SIGHUP)
  		{
  			got_SIGHUP = false;
***
*** 851,859  rebuild_database_list(Oid newdb)
  	int			nelems;
  	HTAB	   *dbhash;
  
- 	/* use fresh stats */
- 	autovac_refresh_stats();
- 
  	newcxt = AllocSetContextCreate(AutovacMemCxt,
      "AV dblist",
     ALLOCSET_DEFAULT_MINSIZE,
--- 849,854 
***
*** 1078,1086  do_start_worker(void)
     ALLOCSET_DEFAULT_MAXSIZE);
  	oldcxt = MemoryContextSwitchTo(tmpcxt);
  
- 	/* use fresh stats */
- 	autovac_refresh_stats();
- 
  	/* Get a list of databases */
  	dblist = get_database_list();
  
--- 1073,1078 
***
*** 2145,2158  do_autovacuum(void)
  		}
  
  		/*
! 		 * Check whether pgstat data still says we need to vacuum this table.
! 		 * It could have changed if something else processed the table while
! 		 * we weren't looking.
! 		 *
! 		 * Note: we have a special case in pgstat code to ensure that the stats
! 		 * we read are as up-to-date as possible, to avoid the problem that

Re: [HACKERS] reducing statistics write overhead

2009-01-21 Thread Martin Pihlak
Alvaro Herrera wrote:
 You missed putting back the BUG comment that used to be there about
 this.
 

This was deliberate, I did mention the condition in the comment at
the beginning of the file. This actually makes it a feature :)

Seriously though, do you think that this is still a problem? Given
the rare occurrence of the revacuum and the fact that it is made
cheap by visibility map? In my initial testing, I couldn't reproduce
the revacuum. But I'll keep at it.

 In other words I think this is a bad idea, because there is a very wide
 window for a table to be vacuumed twice.  Since naptime can be
 arbitrarily large, this is an arbitrarily large bug.  I'm sure there are
 other ways to fix this, but please propose those before this patch.
 

I was wondering whether the stats subsystem should be used for vacuum
tracking at all. It may be convenient to use, but has several deficiencies
(big file, lossy, no crash safety, etc). Could we move vacuum tracking to
pg_class instead?

regards,
Martin




Re: [HACKERS] reducing statistics write overhead

2009-01-21 Thread Alvaro Herrera
Martin Pihlak escribió:
 Alvaro Herrera wrote:
  You missed putting back the BUG comment that used to be there about
  this.
 
 This was deliberate, I did mention the condition in the comment at
 the beginning of the file. This actually makes it a feature :)
 
 Seriously though, do you think that this is still a problem? Given
 the rare occurrence of the revacuum and the fact that it is made
 cheap by visibility map?

Hmm, maybe it's no longer an issue with the visibility map, yes.

 I was wondering whether the stats subsystem should be used for vacuum
 tracking at all. It may be convenient to use, but has several deficiencies
 (big file, lossy, no crash safety, etc). Could we move vacuum tracking to
 pg_class instead?

I agree that pgstats is not ideal (we've said this from the very
beginning), but I doubt that updating pg_class is the answer; you'd be
generating thousands of dead tuples there.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] reducing statistics write overhead

2009-01-21 Thread Heikki Linnakangas

Alvaro Herrera wrote:
 Martin Pihlak escribió:
  Alvaro Herrera wrote:
   You missed putting back the BUG comment that used to be there about
   this.

  This was deliberate, I did mention the condition in the comment at
  the beginning of the file. This actually makes it a feature :)

  Seriously though, do you think that this is still a problem? Given
  the rare occurrence of the revacuum and the fact that it is made
  cheap by visibility map?

 Hmm, maybe it's no longer an issue with the visibility map, yes.

You still have to scan all indexes, so it's still not free by any means.

(I haven't been paying attention to what kind of a risk we're talking
about..)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] reducing statistics write overhead

2009-01-21 Thread Martin Pihlak
Alvaro Herrera wrote:
 I agree that pgstats is not ideal (we've said this from the very
 beginning), but I doubt that updating pg_class is the answer; you'd be
 generating thousands of dead tuples there.
 

But we already do update pg_class after vacuum -- in vac_update_relstats().
Hmm, that performs a heap_inplace_update() ... I assume that this is cheap,
but have no idea whether it is suitable for the purpose.

regards,
Martin




Re: [HACKERS] reducing statistics write overhead

2009-01-21 Thread Alvaro Herrera
Martin Pihlak escribió:
 Alvaro Herrera wrote:
  I agree that pgstats is not ideal (we've said this from the very
  beginning), but I doubt that updating pg_class is the answer; you'd be
  generating thousands of dead tuples there.
 
 But we already do update pg_class after vacuum -- in vac_update_relstats().
 Hmm, that performs a heap_inplace_update() ... I assume that this is cheap,
 but have no idea whether it is suitable for the purpose.

Oh, sorry, I thought you were suggesting to use pg_class to store number
of tuples dead/alive/etc.

I had a patch to introduce a new type of table, which would only be used
for non-transactional updates.  That would allow what you're proposing.
I think we discussed something similar to what you propose and rejected
it for some reason I can't recall offhand.  Search the archives for
pg_class_nt and pg_ntclass, that might give you some ideas.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] reducing statistics write overhead

2009-01-12 Thread Martin Pihlak
Tom Lane wrote:
 I never understood why autovacuum should need a particularly short fuse
 on the stats file age to start with.  If the launcher is launching
 multiple workers into the same database with only a few milliseconds
 between them, isn't the launcher pretty broken anyhow?  ISTM that stats
 files even several seconds old ought to be plenty good enough.
 

I was thinking that the launcher should only request fresh stats at wakeup;
the workers could then reuse that file. This could be implemented by calling
pgstat_clear_snapshot only at launcher wakeup and setting the max stats age
to autovacuum_naptime for the workers.

This would effectively disable the table vacuum recheck though. If this is
OK, I'll start working on it.

regards,
Martin




Re: [HACKERS] reducing statistics write overhead

2009-01-07 Thread Martin Pihlak
Alvaro Herrera wrote:
 Tom Lane escribió:
 
 (In fact, maybe this patch ought to include some sort of maximum update
 rate tunable?  The worst case behavior could actually be WORSE than now.)
 
 Some sort of if stats were requested in the last 500 ms, just tell the
 requester to read the existing file.
 
 Things that come to mind:
 
 - autovacuum could use a more frequent stats update in certain cases
 

[ digging up an old thread ... ]

I recently tested the on-demand stats write on an installation with over 100
databases on a single cluster (ca 5MB stats file). As I saw no visible
reduction in the stats file updates, I started investigating. It turned out
that autovacuum was configured with a 1-minute naptime, and was constantly
walking the database list and launching workers. Each worker makes several
stats file requests, and often it turns out that the file is older than the
allowed 10ms. Increasing the naptime somewhat alleviates the problem, but
still introduces peaks.

As I understand it, the autovacuum workers need up-to-date stats to minimize
the risk of re-vacuuming a table (in case it was already vacuumed by someone
else). However, with the visibility map based vacuums the cost of re-vacuuming
should be reasonably low. So I'd propose to increase the max allowed stats age
for autovacuum workers. Perhaps even as high as to allow reusing the file
requested by the launcher. Or am I missing something?

regards,
Martin




Re: [HACKERS] reducing statistics write overhead

2009-01-07 Thread Tom Lane
Martin Pihlak martin.pih...@gmail.com writes:
 As I understand the autovacuum workers need up-to-date stats to minimize the
 risk of re-vacuuming a table (in case it was already vacuumed by someone else).

I never understood why autovacuum should need a particularly short fuse
on the stats file age to start with.  If the launcher is launching
multiple workers into the same database with only a few milliseconds
between them, isn't the launcher pretty broken anyhow?  ISTM that stats
files even several seconds old ought to be plenty good enough.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-11-02 Thread Tom Lane
Martin Pihlak [EMAIL PROTECTED] writes:
 Attached is a patch that implements the described signalling. Additionally,
 the following unrelated changes have been made:
 1. fopen/fclose replaced with AllocateFile/FreeFile
 2. pgstat_report_stat() now also checks if there are functions to report
 before returning. Previous behavior was to return immediately if no table
 stats were available for reporting.

Applied with minor corrections.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-09 Thread Martin Pihlak
Alvaro Herrera wrote:
 Attached is a patch that implements the described signalling.
 
 Are we keeping the idea that the reader sends a stat message when it
 needs to read stats?  What about the lossiness of the transport?
 

As the message is resent in the wait loop, the collector should receive it
sooner or later. And initial testing shows that it's not really easy to make
the collector lose messages.

I used a modified version of the patch to run a simple load test on a 4 core
amd64 Linux box:

- patch modified so that pgstat_send_inquiry() is sent only once -- before the
  wait loop, so it times out if the message is lost.
  so it times out if message is lost.
- stats file bloated to ~ 5MB by creating 40k tables.
- 4 pgbench instances running: -c 500 -t 500
- 2 clients constantly pulling stats
- all cores near 100% busy, tx traffic over loopback ~ 200kB/sec.

Most of the stats requests required 1 or 2 file wait iterations (5ms sleep
each). Occasionally 3, but no timeouts (ie. no lost messages). Maybe other
platforms are more sensitive, but I have none available for testing at the
moment.

regards,
Martin



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Magnus Hagander
Martin Pihlak wrote:
 Magnus Hagander wrote:
 I wrote a patch for this some time back, that was actually applied.
 Turns out it didn't work, and I ran out of time to fix it, so it was
 backed out again. And then I forgot about it :-) If you look through the
 cvs history of pgstat you should be able to find it - maybe it can give
 you some further ideas.
 
 Got it - this was 1.126. Looks very familiar indeed :)

:-)


 I had also previously experimented with stat() based polling but ran into
 the same issues - no portable high resolution timestamp on files. I guess
 stat() is unusable unless we can live with 1 second update interval for the
 stats (eg. backend reads the file if it is within 1 second of the request).

If the filesystem has inodes, noticing that the inode changes should
usually be enough, no? Since we always create a new file and rename it
into place, there is no way that inode can be reused right away. But it
does require that the stat info is not cached somewhere in between...


//Magnus




Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Magnus Hagander
Tom Lane wrote:
 Martin Pihlak [EMAIL PROTECTED] writes:
 I had also previously experimented with stat() based polling but ran into
 the same issues - no portable high resolution timestamp on files. I guess
 stat() is unusable unless we can live with 1 second update interval for the
 stats (eg. backend reads the file if it is within 1 second of the request).
 
 One alternative is to include a timestamp in the stats file header - the
 backend can then wait on that -- check the timestamp, sleep, resend the
 request, loop. Not particularly elegant, but easy to implement. Would this
 be acceptable?
 
 Timestamp within the file is certainly better than trying to rely on
 filesystem timestamps.  I doubt 1 sec resolution is good enough, and

We'd need half a second resolution just to keep up with the level we
have *now*, don't we?

 I'd also be worried about issues like clock skew between the
 postmaster's time and the filesystem's time.

Can that even happen on a local filesystem? I guess you could put the
file on NFS though, but that seems to be.. eh. sub-optimal.. in more
than one way..

//Magnus



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'd also be worried about issues like clock skew between the
 postmaster's time and the filesystem's time.

 Can that even happen on a local filesystem? I guess you could put the
 file on NFS though, but that seems to be.. eh. sub-optimal.. in more
 than one way..

Lots of people use NAS/SAN type setups.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'd also be worried about issues like clock skew between the
 postmaster's time and the filesystem's time.
 
 Can that even happen on a local filesystem? I guess you could put the
 file on NFS though, but that seems to be.. eh. sub-optimal.. in more
 than one way..
 
 Lots of people use NAS/SAN type setups.

For NAS it could happen, but certainly not for SAN, no? SANs have all
the filesystem processing in the kernel of the server...

I still agree that it's not a good thing to rely on, though :-)

//Magnus




Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Martin Pihlak
Tom Lane wrote:
 Timestamp within the file is certainly better than trying to rely on
 filesystem timestamps.  I doubt 1 sec resolution is good enough, and
 I'd also be worried about issues like clock skew between the
 postmaster's time and the filesystem's time.
 

Attached is a patch which adds a timestamp to the pgstat.stat file header;
backend_read_statsfile uses this to determine whether the file is fresh.
During the wait loop, the stats request message is retransmitted to
compensate for possible loss of message(s).

The collector writes the file at most at PGSTAT_STAT_INTERVAL frequency;
currently no extra custom timeouts can be passed in the message. This can
of course be added if the need arises.

regards,
Martin


Index: src/backend/postmaster/pgstat.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.181
diff -c -r1.181 pgstat.c
*** src/backend/postmaster/pgstat.c	25 Aug 2008 18:55:43 -	1.181
--- src/backend/postmaster/pgstat.c	8 Sep 2008 12:17:21 -
***
*** 85,90 
--- 85,94 
  #define PGSTAT_SELECT_TIMEOUT	2		/* How often to check for postmaster
  		 * death; in seconds. */
  
+ #define PGSTAT_POLL_RETRY_DELAY	10		/* How long to pause between statistics
+ 		 * update requests; in milliseconds. */
+ 
+ #define PGSTAT_POLL_RETRIES		500		/* How many times to repeat stats inquiry */
  
  /* --
   * The initial size hints for the hash tables used in the collector.
***
*** 201,206 
--- 205,215 
   */
  static PgStat_GlobalStats globalStats;
  
+ /* Last time the collector wrote the stats file */
+ static TimestampTz last_statwrite;
+ /* Last time a backend requested a new file */
+ static TimestampTz last_statrequest;
+ 
  static volatile bool need_exit = false;
  static volatile bool need_statwrite = false;
  static volatile bool got_SIGHUP = false;
***
*** 223,229 
  
  NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]);
  static void pgstat_exit(SIGNAL_ARGS);
- static void force_statwrite(SIGNAL_ARGS);
  static void pgstat_beshutdown_hook(int code, Datum arg);
  static void pgstat_sighup_handler(SIGNAL_ARGS);
  
--- 232,237 
***
*** 254,259 
--- 262,268 
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
+ static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len);
  
  
  /* 
***
*** 2463,2468 
--- 2472,2491 
  	hdr-m_type = mtype;
  }
  
+ /* --
+  * pgstat_send_inquiry() -
+  *
+  *		Notify collector that we need fresh data.
+  * --
+  */
+ static void
+ pgstat_send_inquiry(void)
+ {
+ 	PgStat_MsgInquiry msg;
+ 
+ 	pgstat_setheader(msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
+ 	pgstat_send(msg, sizeof(msg));
+ }
  
  /* --
   * pgstat_send() -
***
*** 2538,2545 
  NON_EXEC_STATIC void
  PgstatCollectorMain(int argc, char *argv[])
  {
- 	struct itimerval write_timeout;
- 	bool		need_timer = false;
  	int			len;
  	PgStat_Msg	msg;
  
--- 2561,2566 
***
*** 2571,2583 
  
  	/*
  	 * Ignore all signals usually bound to some action in the postmaster,
! 	 * except SIGQUIT and SIGALRM.
  	 */
  	pqsignal(SIGHUP, pgstat_sighup_handler);
  	pqsignal(SIGINT, SIG_IGN);
  	pqsignal(SIGTERM, SIG_IGN);
  	pqsignal(SIGQUIT, pgstat_exit);
- 	pqsignal(SIGALRM, force_statwrite);
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, SIG_IGN);
  	pqsignal(SIGUSR2, SIG_IGN);
--- 2592,2603 
  
  	/*
  	 * Ignore all signals usually bound to some action in the postmaster,
! 	 * except SIGQUIT.
  	 */
  	pqsignal(SIGHUP, pgstat_sighup_handler);
  	pqsignal(SIGINT, SIG_IGN);
  	pqsignal(SIGTERM, SIG_IGN);
  	pqsignal(SIGQUIT, pgstat_exit);
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, SIG_IGN);
  	pqsignal(SIGUSR2, SIG_IGN);
***
*** 2598,2608 
  	 */
  	need_statwrite = true;
  
- 	/* Preset the delay between status file writes */
- 	MemSet(&write_timeout, 0, sizeof(struct itimerval));
- 	write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
- 	write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
- 
  	/*
  	 * Read in an existing statistics stats file or initialize the stats to
  	 * zero.
--- 2618,2623 
***
*** 2649,2668 
  		}
  
  		/*
! 		 * If time to write the stats file, do so.	Note that the alarm
! 		 * interrupt isn't re-enabled immediately, but only after we next
! 		 * receive a stats message; so no cycles are wasted when there is
! 		 * nothing going on.
  		 */
  		if (need_statwrite)
  		{
  			/* Check for postmaster death; if so we'll write file below */
  			if (!PostmasterIsAlive(true))
  break;
  
! 			

Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Tom Lane
Martin Pihlak [EMAIL PROTECTED] writes:
 Attached is a patch which adds a timestamp to pgstat.stat file header,
 backend_read_statsfile uses this to determine if the file is fresh.
 During the wait loop, the stats request message is retransmitted to
 compensate for possible loss of message(s).

 The collector now writes the file at most once per PGSTAT_STAT_INTERVAL;
 currently no extra custom timeouts can be passed in the message. This can
 of course be added if the need arises.

Hmm.  With the timestamp in the file, ISTM that we could put all the
intelligence on the reader side.  Reader checks file, sends message if
it's too stale.  The collector just writes when it's told to, no
filtering.  In this design, rate limiting relies on the readers to not
be unreasonable in how old a file they'll accept; and there's no problem
with different readers having different requirements.

A possible problem with this idea is that two readers might send request
messages at about the same time, resulting in two writes where there
need have been only one.  However I think that could be fixed if we add
a little more information to the request messages and have the collector
remember the file timestamp it last wrote out.  There are various ways
you could design it but what comes to mind here is for readers to send
a timestamp defined as minimum acceptable file timestamp (ie, their
current clock time less whatever slop they want to allow).  Then,
when the collector gets a request with that timestamp <= its last
write timestamp, it knows it need not do anything.

With signaling like that, there's no excess writes *and* no delay in
responding to a new live write request.  It's sort of annoying that
the readers have to sleep for an arbitrary interval though.  If we could
get rid of that part...

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Martin Pihlak
Tom Lane wrote:
 Hmm.  With the timestamp in the file, ISTM that we could put all the
 intelligence on the reader side.  Reader checks file, sends message if
... snip ...
 remember the file timestamp it last wrote out.  There are various ways
 you could design it but what comes to mind here is for readers to send
 a timestamp defined as minimum acceptable file timestamp (ie, their
 current clock time less whatever slop they want to allow).  Then,
 when the collector gets a request with that timestamp <= its last
 write timestamp, it knows it need not do anything.

Attached is a patch that implements the described signalling. Additionally
following non-related changes have been made:
1. fopen/fclose replaced with AllocateFile/FreeFile
2. pgstat_report_stat() now also checks if there are functions to report
before returning. Previous behavior was to return immediately if no table
stats were available for reporting.

 responding to a new live write request.  It's sort of annoying that
 the readers have to sleep for an arbitrary interval though.  If we could
 get rid of that part...
 
It is, but I guess it's unavoidable if we have to wait for the file to be
written. I'll try to do some load testing tomorrow, to see if something
nasty comes out.

regards,
Martin

Index: src/backend/postmaster/pgstat.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.181
diff -c -r1.181 pgstat.c
*** src/backend/postmaster/pgstat.c	25 Aug 2008 18:55:43 -	1.181
--- src/backend/postmaster/pgstat.c	8 Sep 2008 21:00:17 -
***
*** 85,90 
--- 85,97 
  #define PGSTAT_SELECT_TIMEOUT	2		/* How often to check for postmaster
  		 * death; in seconds. */
  
+ #define PGSTAT_POLL_RETRY_DELAY	5		/* How long to pause between statistics
+ 		 * update requests; in milliseconds. */
+ 
+ #define PGSTAT_POLL_TIME		5000	/* How long are we allowed to poll for 
+ 		 * the stats file; in milliseconds. */
+ 
+ #define PGSTAT_POLL_LOOP_COUNT	(PGSTAT_POLL_TIME / PGSTAT_POLL_RETRY_DELAY)
  
  /* --
   * The initial size hints for the hash tables used in the collector.
***
*** 159,164 
--- 166,177 
  static HTAB *pgStatFunctions = NULL;
  
  /*
+  * Indicates if backend has some function stats which it hasn't yet
+  * sent to the collector.
+  */
+ static bool have_function_stats = false;
+ 
+ /*
   * Tuple insertion/deletion counts for an open transaction can't be propagated
   * into PgStat_TableStatus counters until we know if it is going to commit
   * or abort.  Hence, we keep these counts in per-subxact structs that live
***
*** 201,208 
   */
  static PgStat_GlobalStats globalStats;
  
  static volatile bool need_exit = false;
- static volatile bool need_statwrite = false;
  static volatile bool got_SIGHUP = false;
  
  /*
--- 214,225 
   */
  static PgStat_GlobalStats globalStats;
  
+ /* Last time the collector wrote the stats file */
+ static TimestampTz last_statwrite;
+ /* Latest statistics request from backend */
+ static TimestampTz last_statrequest;
+ 
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
  
  /*
***
*** 223,229 
  
  NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]);
  static void pgstat_exit(SIGNAL_ARGS);
- static void force_statwrite(SIGNAL_ARGS);
  static void pgstat_beshutdown_hook(int code, Datum arg);
  static void pgstat_sighup_handler(SIGNAL_ARGS);
  
--- 240,245 
***
*** 254,259 
--- 270,276 
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
+ static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len);
  
  
  /* 
***
*** 655,663 
  	int			i;
  
  	/* Don't expend a clock check if nothing to do */
! 	/* Note: we ignore pending function stats in this test ... OK? */
! 	if (pgStatTabList == NULL ||
! 		pgStatTabList->tsa_used == 0)
  		return;
  
  	/*
--- 672,679 
  	int			i;
  
  	/* Don't expend a clock check if nothing to do */
! 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0)
! 		&& !have_function_stats)
  		return;
  
  	/*
***
*** 823,828 
--- 839,846 
  	if (msg.m_nentries > 0)
  		pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
  	msg.m_nentries * sizeof(PgStat_FunctionEntry));
+ 
+ 	have_function_stats = false;
  }
  
  
***
*** 1341,1346 
--- 1359,1367 
  		fs->f_numcalls++;
  	fs->f_time = f_total;
  	INSTR_TIME_ADD(fs->f_time_self, f_self);
+ 
+ 	/* indicate that we have something to upload */
+ 	have_function_stats = true;
  }
  
  
***
*** 2463,2468 
--- 2484,2505 
  	hdr->m_type = mtype;
  }
  

Re: [HACKERS] reducing statistics write overhead

2008-09-08 Thread Alvaro Herrera
Martin Pihlak wrote:
 Tom Lane wrote:
  Hmm.  With the timestamp in the file, ISTM that we could put all the
  intelligence on the reader side.  Reader checks file, sends message if

 Attached is a patch that implements the described signalling.

Are we keeping the idea that the reader sends a stat message when it
needs to read stats?  What about the lossiness of the transport?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] reducing statistics write overhead

2008-09-07 Thread Magnus Hagander
Martin Pihlak wrote:
 Attached is a WIP patch, which basically implements this:
 This patch breaks deadlock checking and statement_timeout, because
 backends already use SIGALRM.  You can't just take over that signal.
 It's possible that you could get things to work by treating this as an
 additional reason for SIGALRM, but that code is unreasonably complex
 already.  I'd suggest finding some other way.

 
 I suspected that, but somehow managed to overlook it :( I guess it was
 too tempting to use it. I'll start looking for alternatives.

I wrote a patch for this some time back, that was actually applied.
Turns out it didn't work, and I ran out of time to fix it, so it was
backed out again. And then I forgot about it :-) If you look through the
cvs history of pgstat you should be able to find it - maybe it can give
you some further ideas.

//Magnus



Re: [HACKERS] reducing statistics write overhead

2008-09-07 Thread Magnus Hagander
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 As for signalling, maybe we could implement something like we do for the
 postmaster signal stuff: the requestor stores a dbid in shared memory
 and sends a SIGUSR2 to pgstat or some such.
 
 No, no, no.  Martin already had a perfectly sane design for that
 direction of signalling: send a special stats message to the collector.
 That can carry whatever baggage it needs to.  It's the reverse direction
of "the data you requested is available now, sir" that is tricky.

IIRC, my previous patch looked at the inode of the stats file, then sent
off the "gimme a new file" signal, and then read the file once the inode
changed.

But also IIRC, that's the area where there was a problem - sometimes it
didn't properly pick up changes...

//Magnus



Re: [HACKERS] reducing statistics write overhead

2008-09-07 Thread Dimitri Fontaine


Hi,

On 7 Sep 2008 at 00:45, Tom Lane wrote:

I dislike the alternative of communicating through shared memory,
though.  Right now the stats collector isn't even connected to shared
memory.


Maybe Markus Wanner's work on Postgres-R internal messaging, now that it has
been reworked to follow your advice, could be of some use here?

  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01114.php
  http://archives.postgresql.org/pgsql-hackers/2008-07/msg01420.php

Regards,
--
dim



--
Dimitri Fontaine
PostgreSQL DBA, Architecte






Re: [HACKERS] reducing statistics write overhead

2008-09-07 Thread Martin Pihlak
Magnus Hagander wrote:
 I wrote a patch for this some time back, that was actually applied.
 Turns out it didn't work, and I ran out of time to fix it, so it was
 backed out again. And then I forgot about it :-) If you look through the
 cvs history of pgstat you should be able to find it - maybe it can give
 you some further ideas.

Got it - this was 1.126. Looks very familiar indeed :)

I had also previously experimented with stat() based polling but ran into
the same issue - no portable high-resolution timestamp on files. I guess
stat() is unusable unless we can live with a 1 second update interval for the
stats (e.g. the backend reads the file if it is within 1 second of the request).

One alternative is to include a timestamp in the stats file header - the
backend can then wait on that -- check the timestamp, sleep, resend the
request, loop. Not particularly elegant, but easy to implement. Would this
be acceptable?

regards,
Martin






Re: [HACKERS] reducing statistics write overhead

2008-09-07 Thread Tom Lane
Martin Pihlak [EMAIL PROTECTED] writes:
 I had also previously experimented with stat() based polling but ran into
 the same issue - no portable high-resolution timestamp on files. I guess
 stat() is unusable unless we can live with a 1 second update interval for the
 stats (e.g. the backend reads the file if it is within 1 second of the request).

 One alternative is to include a timestamp in the stats file header - the
 backend can then wait on that -- check the timestamp, sleep, resend the
 request, loop. Not particularly elegant, but easy to implement. Would this
 be acceptable?

Timestamp within the file is certainly better than trying to rely on
filesystem timestamps.  I doubt 1 sec resolution is good enough, and
I'd also be worried about issues like clock skew between the
postmaster's time and the filesystem's time.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Asko Oja
On Sat, Sep 6, 2008 at 2:29 AM, Euler Taveira de Oliveira [EMAIL PROTECTED] wrote:

 Martin Pihlak wrote:
  I suspected that, but somehow managed to overlook it :( I guess it was
  too tempting to use it. I'll start looking for alternatives.
 
 If you can't afford a 500 msec pgstat time, then you need to make it
 tunable.

Additional parameter in config file. Not good.


 Another ideas are (i) turn on/off pgstat per table or database
 and (ii) make the pgstat time tunable per table or database. You can use
 the reloptions column to store these info. These workarounds are much
 simpler than that you proposed and they're almost for free.

Does not seem simple to me. Why would DBAs want extra management work? We
want all the stats to be there, as we don't know when we will need to look at them.



 --
  Euler Taveira de Oliveira
  http://www.timbira.com/




Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Simon Riggs

On Fri, 2008-09-05 at 15:23 -0400, Tom Lane wrote:

 How necessary is this given the recent fixes to allow the stats file to
 be kept on a ramdisk?

I would prefer this approach and back-out the other change.

On-demand is cheaper and easier to use.

  Attached is a WIP patch, which basically implements this:
 
 This patch breaks deadlock checking and statement_timeout, because
 backends already use SIGALRM.  You can't just take over that signal.
 It's possible that you could get things to work by treating this as an
 additional reason for SIGALRM, but that code is unreasonably complex
 already.  I'd suggest finding some other way.

There are other ways already in use in backend, so just use those.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Fri, 2008-09-05 at 15:23 -0400, Tom Lane wrote:
 How necessary is this given the recent fixes to allow the stats file to
 be kept on a ramdisk?

 I would prefer this approach and back-out the other change.

Even if we get on-demand done, I wouldn't see it as a reason to back out
the statfile relocation work.  In an environment where the stats are
demanded frequently, you could still need that for performance.

(In fact, maybe this patch ought to include some sort of maximum update
rate tunable?  The worst case behavior could actually be WORSE than now.)

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Alvaro Herrera
Tom Lane wrote:

 (In fact, maybe this patch ought to include some sort of maximum update
 rate tunable?  The worst case behavior could actually be WORSE than now.)

Some sort of "if stats were requested in the last 500 ms, just tell the
requester to read the existing file".

Things that come to mind:

- autovacuum could use a more frequent stats update in certain cases

- Maybe we oughta have separate files, one for each database?  That way
we'd reduce unnecessary I/O traffic for both the reader and the writer.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Some sort of "if stats were requested in the last 500 ms, just tell the
 requester to read the existing file".

Hmm, I was thinking of delaying both the write and the reply signal
until 500ms had elapsed.  But the above behavior would certainly be
easier to implement, and would probably be good enough (TM).

 - Maybe we oughta have separate files, one for each database?  That way
 we'd reduce unnecessary I/O traffic for both the reader and the writer.

The signaling would become way too complex, I think.  Also what do you
do about shared tables?

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Asko Oja
Protection against too-frequent reads is already handled in the patch, but these
comments might lead it in new directions. The current implementation had this
same limit: the file was written no more than once per 500 ms.

On Sat, Sep 6, 2008 at 9:12 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Alvaro Herrera [EMAIL PROTECTED] writes:
  Some sort of "if stats were requested in the last 500 ms, just tell the
  requester to read the existing file".

  Things that come to mind:

  - autovacuum could use a more frequent stats update in certain cases

 BTW, we could implement that by, instead of having a global tunable,
 including a field in the request message saying how stale an existing
 file is acceptable for this requestor.  500ms might be the standard
 value but autovac could use a smaller number.

regards, tom lane




Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Some sort of "if stats were requested in the last 500 ms, just tell the
 requester to read the existing file".

 Things that come to mind:

 - autovacuum could use a more frequent stats update in certain cases

BTW, we could implement that by, instead of having a global tunable,
including a field in the request message saying how stale an existing
file is acceptable for this requestor.  500ms might be the standard
value but autovac could use a smaller number.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:

  - Maybe we oughta have separate files, one for each database?  That way
  we'd reduce unnecessary I/O traffic for both the reader and the writer.
 
 The signaling would become way too complex, I think.  Also what do you
 do about shared tables?

They are already stored in a separate database (denoted with
InvalidOid dbid), and autovacuum grabs it separately.  I admit I don't
know what regular backends do about it.

As for signalling, maybe we could implement something like we do for the
postmaster signal stuff: the requestor stores a dbid in shared memory
and sends a SIGUSR2 to pgstat or some such.  We'd have enough shmem
space for a reasonable number of requests, and pgstat consumes them from
there into local memory (similar to what Andrew proposes for
LISTEN/NOTIFY); it stores the dbid and PID of the requestor.  As soon as
the request has been fulfilled, pgstat responds by [insert magical
mechanism that Martin is about to propose] to that particular backend.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 As for signalling, maybe we could implement something like we do for the
 postmaster signal stuff: the requestor stores a dbid in shared memory
 and sends a SIGUSR2 to pgstat or some such.

No, no, no.  Martin already had a perfectly sane design for that
direction of signalling: send a special stats message to the collector.
That can carry whatever baggage it needs to.  It's the reverse direction
of "the data you requested is available now, sir" that is tricky.
And I fear that having to keep track of multiple stats-collector output
files would make it very significantly trickier --- both for the stats
collector's own bookkeeping and for the signaling mechanism itself.
I don't believe it's gonna be worth that.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-06 Thread Tom Lane
I wrote:
 No, no, no.  Martin already had a perfectly sane design for that
 direction of signalling: send a special stats message to the collector.

Actually ... given that the stats message mechanism is designed to be
lossy under high load, maybe that isn't so sane.  At the very least
there would have to be timeout-and-resend logic on the backend side.

I dislike the alternative of communicating through shared memory,
though.  Right now the stats collector isn't even connected to shared
memory.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-05 Thread Tom Lane
Martin Pihlak [EMAIL PROTECTED] writes:
 So, as a simple optimization I am proposing that the file should be
 only written when some backend requests statistics. This would
 significantly reduce the undesired write traffic at the cost of
 slightly slower stats access.

How necessary is this given the recent fixes to allow the stats file to
be kept on a ramdisk?

 Attached is a WIP patch, which basically implements this:

This patch breaks deadlock checking and statement_timeout, because
backends already use SIGALRM.  You can't just take over that signal.
It's possible that you could get things to work by treating this as an
additional reason for SIGALRM, but that code is unreasonably complex
already.  I'd suggest finding some other way.

regards, tom lane



Re: [HACKERS] reducing statistics write overhead

2008-09-05 Thread Joshua Drake
On Fri, 05 Sep 2008 15:23:18 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Martin Pihlak [EMAIL PROTECTED] writes:
  So, as a simple optimization I am proposing that the file should be
  only written when some backend requests statistics. This would
  significantly reduce the undesired write traffic at the cost of
  slightly slower stats access.
 
 How necessary is this given the recent fixes to allow the stats file
 to be kept on a ramdisk?

From an usability and integration perspective this patch is a nice
touch. On demand is a nice feature when used correctly.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate





Re: [HACKERS] reducing statistics write overhead

2008-09-05 Thread Martin Pihlak
Tom Lane wrote:
 Martin Pihlak [EMAIL PROTECTED] writes:
 So, as a simple optimization I am proposing that the file should be
 only written when some backend requests statistics. This would
 significantly reduce the undesired write traffic at the cost of
 slightly slower stats access.
 
 How necessary is this given the recent fixes to allow the stats file to
 be kept on a ramdisk?
 

Ramdisk helps, but requires additional effort to set up. Also the stats
file has a tendency to creep up on you -- as the database evolves the file
size gradually increases and suddenly the DBA is left wondering what
happened to performance.

 Attached is a WIP patch, which basically implements this:
 
 This patch breaks deadlock checking and statement_timeout, because
 backends already use SIGALRM.  You can't just take over that signal.
 It's possible that you could get things to work by treating this as an
 additional reason for SIGALRM, but that code is unreasonably complex
 already.  I'd suggest finding some other way.
 

I suspected that, but somehow managed to overlook it :( I guess it was
too tempting to use it. I'll start looking for alternatives.

regards,
Martin




Re: [HACKERS] reducing statistics write overhead

2008-09-05 Thread Euler Taveira de Oliveira
Martin Pihlak wrote:
 I suspected that, but somehow managed to overlook it :( I guess it was
 too tempting to use it. I'll start looking for alternatives.
 
If you can't afford a 500 msec pgstat time, then you need to make it
tunable. Another ideas are (i) turn on/off pgstat per table or database
and (ii) make the pgstat time tunable per table or database. You can use
the reloptions column to store these info. These workarounds are much
simpler than that you proposed and they're almost for free.


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/



Re: [HACKERS] reducing statistics write overhead

2008-09-05 Thread Tom Lane
Euler Taveira de Oliveira [EMAIL PROTECTED] writes:
 If you can't afford a 500 msec pgstat time, then you need to make it
 tunable. Another ideas are (i) turn on/off pgstat per table or database
 and (ii) make the pgstat time tunable per table or database. You can use
 the reloptions column to store these info. These workarounds are much
 simpler than that you proposed and they're almost for free.

For normal usage on-demand dumping would be a really good thing; it'd
cut the overhead of having stats on tremendously, especially for people
who don't really use 'em.  The particular signaling proposed here is
bogus, but if Martin can make it work in a cleaner fashion I think it's
likely a good idea.

regards, tom lane
