Re: [HACKERS] Reducing stats collection overhead

2007-07-31 Thread Arjen van der Meijden

On 31-7-2007 5:07 Alvaro Herrera wrote:

Arjen van der Meijden wrote:
Afaik Tom hadn't finished his patch when I was testing things, so I don't 
know. But we're in the process of benchmarking a new system (dual quad-core 
Xeon) and we'll have a look at how it performs in the postgres 8.2dev we 
used before, the stable 8.2.4 and a fresh HEAD-checkout (which we'll call 
8.3dev). I'll let you guys (or at least Tom) know how they compare in our 
benchmark.


So, ahem, did it work? :-)


The machine turned out to have a faulty mainboard, so we had to 
concentrate first on figuring out why it was unstable and then on whether 
the replacement mainboard made it stable in a long durability 
test. Of course, that behaviour only appeared with mysql and not with 
postgresql, so we had to run our mysql version of the benchmark a few 
hundred times, rather than testing various versions, until the machine 
had to go into production.


So we haven't tested postgresql 8.3dev on that machine, sorry.

Best regards,

Arjen





On 18-5-2007 15:12 Alvaro Herrera wrote:

Tom Lane wrote:

Arjen van der Meijden told me that according to the tweakers.net
benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
here that for small SELECT queries issued as separate transactions,
there's a significant difference.  I think much of the difference stems
from the fact that we now have stats_row_level ON by default, and so
every transaction sends a stats message that wasn't there by default
in 8.2.  When you're doing a few thousand transactions per second
(not hard for small read-only queries) that adds up.

So, did this patch make the performance problem go away?





---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Reducing stats collection overhead

2007-07-30 Thread Alvaro Herrera
Arjen van der Meijden wrote:
 Afaik Tom hadn't finished his patch when I was testing things, so I don't 
 know. But we're in the process of benchmarking a new system (dual quad-core 
 Xeon) and we'll have a look at how it performs in the postgres 8.2dev we 
 used before, the stable 8.2.4 and a fresh HEAD-checkout (which we'll call 
 8.3dev). I'll let you guys (or at least Tom) know how they compare in our 
 benchmark.

So, ahem, did it work? :-)


 On 18-5-2007 15:12 Alvaro Herrera wrote:
 Tom Lane wrote:
 Arjen van der Meijden told me that according to the tweakers.net
 benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
 here that for small SELECT queries issued as separate transactions,
 there's a significant difference.  I think much of the difference stems
 from the fact that we now have stats_row_level ON by default, and so
 every transaction sends a stats message that wasn't there by default
 in 8.2.  When you're doing a few thousand transactions per second
 (not hard for small read-only queries) that adds up.
 So, did this patch make the performance problem go away?


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Reducing stats collection overhead

2007-05-19 Thread Arjen van der Meijden
Afaik Tom hadn't finished his patch when I was testing things, so I 
don't know. But we're in the process of benchmarking a new system (dual 
quad-core Xeon) and we'll have a look at how it performs in the postgres 
8.2dev we used before, the stable 8.2.4 and a fresh HEAD-checkout (which 
we'll call 8.3dev). I'll let you guys (or at least Tom) know how they 
compare in our benchmark.


Best regards,

Arjen

On 18-5-2007 15:12 Alvaro Herrera wrote:

Tom Lane wrote:

Arjen van der Meijden told me that according to the tweakers.net
benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
here that for small SELECT queries issued as separate transactions,
there's a significant difference.  I think much of the difference stems
from the fact that we now have stats_row_level ON by default, and so
every transaction sends a stats message that wasn't there by default
in 8.2.  When you're doing a few thousand transactions per second
(not hard for small read-only queries) that adds up.


So, did this patch make the performance problem go away?





Re: [HACKERS] Reducing stats collection overhead

2007-05-18 Thread Alvaro Herrera
Tom Lane wrote:
 Arjen van der Meijden told me that according to the tweakers.net
 benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
 here that for small SELECT queries issued as separate transactions,
 there's a significant difference.  I think much of the difference stems
 from the fact that we now have stats_row_level ON by default, and so
 every transaction sends a stats message that wasn't there by default
 in 8.2.  When you're doing a few thousand transactions per second
 (not hard for small read-only queries) that adds up.

So, did this patch make the performance problem go away?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Bruce Momjian

Yes, it seems we will have to do something for 8.3.  I assume the method
below would reduce frequent updates of the stats_command_string too.

---

Tom Lane wrote:
 Arjen van der Meijden told me that according to the tweakers.net
 benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
 here that for small SELECT queries issued as separate transactions,
 there's a significant difference.  I think much of the difference stems
 from the fact that we now have stats_row_level ON by default, and so
 every transaction sends a stats message that wasn't there by default
 in 8.2.  When you're doing a few thousand transactions per second
 (not hard for small read-only queries) that adds up.
 
 It seems to me that this could be fixed fairly easily by allowing the
 stats to accumulate across multiple small transactions before sending
 a message.  There's surely not much point in kicking stats out quickly
 when the stats collector only reports them to the world every half
 second anyway.
 
 The first design that comes to mind is that at transaction end
 (pgstat_report_tabstat() time) we send a stats message only if at least
 X milliseconds have elapsed since we last sent one, where X is
 PGSTAT_STAT_INTERVAL or closely related to it.  We also make sure to
 flush stats out before process exit.  This approach ensures that in a
 lots-of-short-transactions scenario, we only need to send one stats
 message every X msec, not one per query.  The cost is possible delay of
 stats reports.  I claim that any transaction that makes a really sizable
 change in the stats will run longer than X msec and therefore will send
 its stats immediately.  Cases where a client does a small transaction
 after sleeping for awhile (more than X msec) will also send immediately.
 You might get a delay in reporting the last few transactions of a burst
 of short transactions, but how much does it matter?  So I think that
 complicating the design with, say, a timeout counter to force out the
 stats after a sleep interval is not necessary.  Doing so would add a
 couple of kernel calls to every client interaction so I'd really rather
 avoid that.
 
 Any thoughts, better ideas?
 
   regards, tom lane
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Yes, it seems we will have to do something for 8.3.

Yeah, we've kind of ignored any possible overhead of the stats mechanism
for awhile, but I think we've got to face up to this if we're gonna have
it on by default.

 I assume the method
 below would reduce frequent updates of the stats_command_string too.

No, stats_command_string is entirely independent now.

regards, tom lane



Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Yes, it seems we will have to do something for 8.3.
 
 Yeah, we've kind of ignored any possible overhead of the stats mechanism
 for awhile, but I think we've got to face up to this if we're gonna have
 it on by default.
 
  I assume the method
  below would reduce frequent updates of the stats_command_string too.
 
 No, stats_command_string is entirely independent now.

Oh, right, we used shared memory.  No wonder it isn't on the TODO list
anymore.  ;-)

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Simon Riggs
On Sun, 2007-04-29 at 00:44 -0400, Tom Lane wrote:

 The first design that comes to mind is that at transaction end
 (pgstat_report_tabstat() time) we send a stats message only if at least
 X milliseconds have elapsed since we last sent one, where X is
 PGSTAT_STAT_INTERVAL or closely related to it.  We also make sure to
 flush stats out before process exit.

Sounds like a good general, long term solution.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 So I think that complicating the design with, say, a timeout counter to
 force out the stats after a sleep interval is not necessary. Doing so would
 add a couple of kernel calls to every client interaction so I'd really
 rather avoid that.

 Any thoughts, better ideas?

If we want to have an idle_in_statement_timeout then we'll need to introduce a
select loop instead of just directly blocking on recv anyway. Does that mean
we may as well bite the bullet now?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Lukas Kahwe Smith

Tom Lane wrote:


The first design that comes to mind is that at transaction end
(pgstat_report_tabstat() time) we send a stats message only if at least
X milliseconds have elapsed since we last sent one, where X is
PGSTAT_STAT_INTERVAL or closely related to it.  We also make sure to
flush stats out before process exit.  This approach ensures that in a
lots-of-short-transactions scenario, we only need to send one stats
message every X msec, not one per query.  The cost is possible delay of
stats reports.  I claim that any transaction that makes a really sizable
change in the stats will run longer than X msec and therefore will send
its stats immediately.  Cases where a client does a small transaction
after sleeping for awhile (more than X msec) will also send immediately.
You might get a delay in reporting the last few transactions of a burst
of short transactions, but how much does it matter?  So I think that
complicating the design with, say, a timeout counter to force out the
stats after a sleep interval is not necessary.  Doing so would add a
couple of kernel calls to every client interaction so I'd really rather
avoid that.


Well, if this delaying of the stats updates has an effect on query 
time, then it also increases the likelihood of going past the X msec 
limit of that previously small query. So it's sort of self-fixing, 
with the only risk being that one query gets overly long due to the lack 
of stats updates.


regards,
Lukas



Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Alvaro Herrera
Tom Lane wrote:

 The first design that comes to mind is that at transaction end
 (pgstat_report_tabstat() time) we send a stats message only if at least
 X milliseconds have elapsed since we last sent one, where X is
 PGSTAT_STAT_INTERVAL or closely related to it.  We also make sure to
 flush stats out before process exit.  This approach ensures that in a
 lots-of-short-transactions scenario, we only need to send one stats
 message every X msec, not one per query.

If you're going to make it depend on the timestamp set by transaction
start, I'm all for it.

 The cost is possible delay of stats reports.  I claim that any
 transaction that makes a really sizable change in the stats will run
 longer than X msec and therefore will send its stats immediately.

I agree with this, particularly if it means we don't get to add another
gettimeofday().


FWIW, am I reading the code wrong or do we send the number of xact
commit and rollback multiple times in pgstat_report_one_tabstat, with
only the first one having non-zero counts?  Maybe we could put these
counters in a separate message to reduce the size of the tabstat
messages themselves.  (It may be that the total impact in bytes is
minimal, and the added overhead of an additional message is greater?)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The first design that comes to mind is that at transaction end
 (pgstat_report_tabstat() time) we send a stats message only if at least
 X milliseconds have elapsed since we last sent one, where X is
 PGSTAT_STAT_INTERVAL or closely related to it.  We also make sure to
 flush stats out before process exit.  This approach ensures that in a
 lots-of-short-transactions scenario, we only need to send one stats
 message every X msec, not one per query.

 If you're going to make it depend on the timestamp set by transaction
 start, I'm all for it.

Ah, you're worried about not adding an extra gettimeofday() call.
Actually I was going to make it use the transaction-commit timestamp,
which xact.c already does a kernel call for so it can put a timestamp in
the xlog commit or abort record.  We don't save that aside at the moment
but easily could.

Doing this would probably mean wanting to convert the timestamps
stored in xlog commit/abort records from time_t to timestamptz;
anyone have a problem with that?

 FWIW, am I reading the code wrong or do we send the number of xact
 commit and rollback multiple times in pgstat_report_one_tabstat, with
 only the first one having non-zero counts?  Maybe we could put these
 counters in a separate message to reduce the size of the tabstat
 messages themselves.  (It may be that the total impact in bytes is
 minimal, and the added overhead of an additional message is greater?)

Yeah, that design seems fine to me as-is.  We'd only be sending multiple
messages if we have more than about 1K of tabstat records, so the
overhead is only 16 bytes out of each additional 1K ... not a lot.

regards, tom lane



Re: [HACKERS] Reducing stats collection overhead

2007-04-29 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 If we want to have an idle_in_statement_timeout then we'll need to introduce a
 select loop instead of just directly blocking on recv anyway. Does that mean
 we may as well bite the bullet now?

If we wanted such a timeout (which I personally don't) we wouldn't
implement it with select because OpenSSL wouldn't cooperate.  AFAICS
this'd require setting a timer interrupt ... and then unsetting it when
the client response comes back.

regards, tom lane



[HACKERS] Reducing stats collection overhead

2007-04-28 Thread Tom Lane
Arjen van der Meijden told me that according to the tweakers.net
benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
here that for small SELECT queries issued as separate transactions,
there's a significant difference.  I think much of the difference stems
from the fact that we now have stats_row_level ON by default, and so
every transaction sends a stats message that wasn't there by default
in 8.2.  When you're doing a few thousand transactions per second
(not hard for small read-only queries) that adds up.

It seems to me that this could be fixed fairly easily by allowing the
stats to accumulate across multiple small transactions before sending
a message.  There's surely not much point in kicking stats out quickly
when the stats collector only reports them to the world every half
second anyway.

The first design that comes to mind is that at transaction end
(pgstat_report_tabstat() time) we send a stats message only if at least
X milliseconds have elapsed since we last sent one, where X is
PGSTAT_STAT_INTERVAL or closely related to it.  We also make sure to
flush stats out before process exit.  This approach ensures that in a
lots-of-short-transactions scenario, we only need to send one stats
message every X msec, not one per query.  The cost is possible delay of
stats reports.  I claim that any transaction that makes a really sizable
change in the stats will run longer than X msec and therefore will send
its stats immediately.  Cases where a client does a small transaction
after sleeping for awhile (more than X msec) will also send immediately.
You might get a delay in reporting the last few transactions of a burst
of short transactions, but how much does it matter?  So I think that
complicating the design with, say, a timeout counter to force out the
stats after a sleep interval is not necessary.  Doing so would add a
couple of kernel calls to every client interaction so I'd really rather
avoid that.
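
[Editor's note: in outline, the throttling described above could be sketched
as follows. PGSTAT_STAT_INTERVAL is the collector's real write interval; the
function name and surrounding structure are made up for illustration and do
not reflect the actual patch.]

```c
#include <stdbool.h>
#include <sys/time.h>

#define PGSTAT_STAT_INTERVAL 500        /* msec; the collector's own write interval */

static struct timeval last_report_time; /* zeroed at backend start */

/*
 * Called at transaction end.  Returns true -- and remembers the send time --
 * only if at least PGSTAT_STAT_INTERVAL msec have elapsed since the last
 * send; otherwise the counters simply keep accumulating and ride along with
 * a later transaction (or are flushed at process exit).
 */
static bool
pgstat_should_send(const struct timeval *now)
{
    long elapsed_ms =
        (now->tv_sec - last_report_time.tv_sec) * 1000 +
        (now->tv_usec - last_report_time.tv_usec) / 1000;

    if (elapsed_ms >= PGSTAT_STAT_INTERVAL)
    {
        last_report_time = *now;
        return true;
    }
    return false;
}
```

A burst of short transactions then produces at most one message per interval,
while any transaction longer than the interval, or one arriving after an idle
period, still reports immediately -- matching the behaviour claimed above.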

Any thoughts, better ideas?

regards, tom lane
