Re: [HACKERS] Reducing stats collection overhead
On 31-7-2007 5:07 Alvaro Herrera wrote:
> Arjen van der Meijden wrote:
>> Afaik Tom hadn't finished his patch when I was testing things, so I
>> don't know. But we're in the process of benchmarking a new system (dual
>> quad-core Xeon) and we'll have a look at how it performs in the postgres
>> 8.2dev we used before, the stable 8.2.4 and a fresh HEAD checkout (which
>> we'll call 8.3dev). I'll let you guys (or at least Tom) know how they
>> compare in our benchmark.
>
> So, ahem, did it work? :-)

The machine turned out to have a faulty mainboard, so we had to concentrate first on figuring out why it was unstable, and then on whether the replacement mainboard made it stable in a long durability test. Of course, that behaviour only appeared with MySQL and not with PostgreSQL, so we had to run our MySQL version of the benchmark a few hundred times, rather than testing various PostgreSQL versions, until the machine had to go into production.

So we haven't tested PostgreSQL 8.3dev on that machine, sorry.

Best regards,

Arjen

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Re: [HACKERS] Reducing stats collection overhead
Arjen van der Meijden wrote:
> Afaik Tom hadn't finished his patch when I was testing things, so I don't
> know. But we're in the process of benchmarking a new system (dual quad-core
> Xeon) and we'll have a look at how it performs in the postgres 8.2dev we
> used before, the stable 8.2.4 and a fresh HEAD checkout (which we'll call
> 8.3dev). I'll let you guys (or at least Tom) know how they compare in our
> benchmark.
>
> On 18-5-2007 15:12 Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> Arjen van der Meijden told me that according to the tweakers.net
>>> benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
>>> here that for small SELECT queries issued as separate transactions,
>>> there's a significant difference. I think much of the difference stems
>>> from the fact that we now have stats_row_level ON by default, and so
>>> every transaction sends a stats message that wasn't there by default
>>> in 8.2. When you're doing a few thousand transactions per second
>>> (not hard for small read-only queries) that adds up.
>> So, did this patch make the performance problem go away?

So, ahem, did it work? :-)

-- 
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Reducing stats collection overhead
Afaik Tom hadn't finished his patch when I was testing things, so I don't know. But we're in the process of benchmarking a new system (dual quad-core Xeon) and we'll have a look at how it performs in the postgres 8.2dev we used before, the stable 8.2.4 and a fresh HEAD checkout (which we'll call 8.3dev). I'll let you guys (or at least Tom) know how they compare in our benchmark.

Best regards,

Arjen

On 18-5-2007 15:12 Alvaro Herrera wrote:
> Tom Lane wrote:
>> Arjen van der Meijden told me that according to the tweakers.net
>> benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
>> here that for small SELECT queries issued as separate transactions,
>> there's a significant difference. I think much of the difference stems
>> from the fact that we now have stats_row_level ON by default, and so
>> every transaction sends a stats message that wasn't there by default
>> in 8.2. When you're doing a few thousand transactions per second
>> (not hard for small read-only queries) that adds up.
>
> So, did this patch make the performance problem go away?
Re: [HACKERS] Reducing stats collection overhead
Tom Lane wrote:
> Arjen van der Meijden told me that according to the tweakers.net
> benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
> here that for small SELECT queries issued as separate transactions,
> there's a significant difference. I think much of the difference stems
> from the fact that we now have stats_row_level ON by default, and so
> every transaction sends a stats message that wasn't there by default
> in 8.2. When you're doing a few thousand transactions per second
> (not hard for small read-only queries) that adds up.

So, did this patch make the performance problem go away?

-- 
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Reducing stats collection overhead
Gregory Stark <[EMAIL PROTECTED]> writes:
> If we want to have an idle_in_statement_timeout then we'll need to
> introduce a select loop instead of just directly blocking on recv anyways.
> Does that mean we may as well bite the bullet now?

If we wanted such a timeout (which I personally don't) we wouldn't implement it with select because OpenSSL wouldn't cooperate. AFAICS this'd require setting a timer interrupt ... and then unsetting it when the client response comes back.

regards, tom lane
Re: [HACKERS] Reducing stats collection overhead
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> The first design that comes to mind is that at transaction end
>> (pgstat_report_tabstat() time) we send a stats message only if at least
>> X milliseconds have elapsed since we last sent one, where X is
>> PGSTAT_STAT_INTERVAL or closely related to it. We also make sure to
>> flush stats out before process exit. This approach ensures that in a
>> lots-of-short-transactions scenario, we only need to send one stats
>> message every X msec, not one per query.
>
> If you're going to make it depend on the timestamp set by transaction
> start, I'm all for it.

Ah, you're worried about not adding an extra gettimeofday() call. Actually I was going to make it use the transaction-commit timestamp, which xact.c already does a kernel call for so it can put a timestamp in the xlog commit or abort record. We don't save that aside at the moment, but easily could. Doing this would probably mean wanting to convert the timestamps stored in xlog commit/abort records from time_t to timestamptz; anyone have a problem with that?

> FWIW, am I reading the code wrong or do we send the number of xact
> commit and rollback multiple times in pgstat_report_one_tabstat, with
> only the first one having non-zero counts? Maybe we could put these
> counters in a separate message to reduce the size of the tabstat
> messages themselves. (It may be that the total impact in bytes is
> minimal, and the added overhead of an additional message is greater?)

Yeah, that design seems fine to me as-is. We'd only be sending multiple messages if we have more than about 1K of tabstat records, so the overhead is only 16 bytes out of each additional 1K ... not a lot.

regards, tom lane
Re: [HACKERS] Reducing stats collection overhead
Tom Lane wrote:
> The first design that comes to mind is that at transaction end
> (pgstat_report_tabstat() time) we send a stats message only if at least
> X milliseconds have elapsed since we last sent one, where X is
> PGSTAT_STAT_INTERVAL or closely related to it. We also make sure to
> flush stats out before process exit. This approach ensures that in a
> lots-of-short-transactions scenario, we only need to send one stats
> message every X msec, not one per query.

If you're going to make it depend on the timestamp set by transaction start, I'm all for it.

> The cost is possible delay of stats reports. I claim that any
> transaction that makes a really sizable change in the stats will run
> longer than X msec and therefore will send its stats immediately.

I agree with this, particularly if it means we don't get to add another gettimeofday().

FWIW, am I reading the code wrong or do we send the number of xact commit and rollback multiple times in pgstat_report_one_tabstat, with only the first one having non-zero counts? Maybe we could put these counters in a separate message to reduce the size of the tabstat messages themselves. (It may be that the total impact in bytes is minimal, and the added overhead of an additional message is greater?)

-- 
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Reducing stats collection overhead
Tom Lane wrote:
> The first design that comes to mind is that at transaction end
> (pgstat_report_tabstat() time) we send a stats message only if at least
> X milliseconds have elapsed since we last sent one, where X is
> PGSTAT_STAT_INTERVAL or closely related to it. We also make sure to
> flush stats out before process exit. This approach ensures that in a
> lots-of-short-transactions scenario, we only need to send one stats
> message every X msec, not one per query. The cost is possible delay of
> stats reports. I claim that any transaction that makes a really sizable
> change in the stats will run longer than X msec and therefore will send
> its stats immediately. Cases where a client does a small transaction
> after sleeping for awhile (more than X msec) will also send immediately.
> You might get a delay in reporting the last few transactions of a burst
> of short transactions, but how much does it matter? So I think that
> complicating the design with, say, a timeout counter to force out the
> stats after a sleep interval is not necessary. Doing so would add a
> couple of kernel calls to every client interaction so I'd really rather
> avoid that.

Well, if delaying the stats updates has an effect on query time, then it also increases the likelihood of a previously "small" query going past the X msec limit. So it's sort of self-correcting, with the only risk being that one query gets overly long due to the lack of stats updates.

regards, Lukas
Re: [HACKERS] Reducing stats collection overhead
"Tom Lane" <[EMAIL PROTECTED]> writes:
> So I think that complicating the design with, say, a timeout counter to
> force out the stats after a sleep interval is not necessary. Doing so
> would add a couple of kernel calls to every client interaction so I'd
> really rather avoid that.
>
> Any thoughts, better ideas?

If we want to have an idle_in_statement_timeout then we'll need to introduce a select loop instead of just directly blocking on recv anyways. Does that mean we may as well bite the bullet now?

-- 
Gregory Stark
EnterpriseDB                           http://www.enterprisedb.com
Re: [HACKERS] Reducing stats collection overhead
On Sun, 2007-04-29 at 00:44 -0400, Tom Lane wrote:
> The first design that comes to mind is that at transaction end
> (pgstat_report_tabstat() time) we send a stats message only if at least
> X milliseconds have elapsed since we last sent one, where X is
> PGSTAT_STAT_INTERVAL or closely related to it. We also make sure to
> flush stats out before process exit.

Sounds like a good general, long-term solution.

-- 
Simon Riggs
EnterpriseDB                           http://www.enterprisedb.com
Re: [HACKERS] Reducing stats collection overhead
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Yes, it seems we will have to do something for 8.3.
>
> Yeah, we've kind of ignored any possible overhead of the stats mechanism
> for awhile, but I think we've got to face up to this if we're gonna have
> it on by default.
>
>> I assume the method below would reduce frequent updates of the
>> stats_command_string too.
>
> No, stats_command_string is entirely independent now.

Oh, right, we used shared memory. No wonder it isn't on the TODO list anymore. ;-)

-- 
Bruce Momjian <[EMAIL PROTECTED]>          http://momjian.us
EnterpriseDB                               http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Reducing stats collection overhead
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Yes, it seems we will have to do something for 8.3.

Yeah, we've kind of ignored any possible overhead of the stats mechanism for awhile, but I think we've got to face up to this if we're gonna have it on by default.

> I assume the method below would reduce frequent updates of the
> stats_command_string too.

No, stats_command_string is entirely independent now.

regards, tom lane
Re: [HACKERS] Reducing stats collection overhead
Yes, it seems we will have to do something for 8.3. I assume the method below would reduce frequent updates of the stats_command_string too.

---

Tom Lane wrote:
> Arjen van der Meijden told me that according to the tweakers.net
> benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
> here that for small SELECT queries issued as separate transactions,
> there's a significant difference. I think much of the difference stems
> from the fact that we now have stats_row_level ON by default, and so
> every transaction sends a stats message that wasn't there by default
> in 8.2. When you're doing a few thousand transactions per second
> (not hard for small read-only queries) that adds up.
>
> It seems to me that this could be fixed fairly easily by allowing the
> stats to accumulate across multiple small transactions before sending
> a message. There's surely not much point in kicking stats out quickly
> when the stats collector only reports them to the world every half
> second anyway.
>
> The first design that comes to mind is that at transaction end
> (pgstat_report_tabstat() time) we send a stats message only if at least
> X milliseconds have elapsed since we last sent one, where X is
> PGSTAT_STAT_INTERVAL or closely related to it. We also make sure to
> flush stats out before process exit. This approach ensures that in a
> lots-of-short-transactions scenario, we only need to send one stats
> message every X msec, not one per query. The cost is possible delay of
> stats reports. I claim that any transaction that makes a really sizable
> change in the stats will run longer than X msec and therefore will send
> its stats immediately. Cases where a client does a small transaction
> after sleeping for awhile (more than X msec) will also send immediately.
> You might get a delay in reporting the last few transactions of a burst
> of short transactions, but how much does it matter? So I think that
> complicating the design with, say, a timeout counter to force out the
> stats after a sleep interval is not necessary. Doing so would add a
> couple of kernel calls to every client interaction so I'd really rather
> avoid that.
>
> Any thoughts, better ideas?
>
> regards, tom lane

-- 
Bruce Momjian <[EMAIL PROTECTED]>          http://momjian.us
EnterpriseDB                               http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +