Re: [HACKERS] Wait events monitoring future development
Hi,

On 2016-08-07 14:03:17 +0200, Ilya Kosmodemiansky wrote:
> Wait event monitoring looks once again stuck on the way through community
> approval in spite of huge progress done last year in that direction.

I see little evidence of that. If you consider "please do some reasonable benchmarks" as being stuck...

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 12:07 AM, Tsunakawa, Takayuki wrote:
> As another idea, we can stand on the middle ground. Interestingly, MySQL
> also enables their event monitoring (Performance Schema) by default, but not
> all events are collected. I guess highly encountered events are not
> collected by default to minimize the overhead.

Yes, I think that's a sensible approach. I can't see enabling by default a feature that significantly regresses performance. We work too hard to improve performance to throw very much of it away for any one feature, even a feature that a lot of people like.

What I really like about what got committed to 9.6 is that it's so cheap we should be able to use it for lots of other things - latch events, network I/O, disk I/O, etc. - without hurting performance at all. But if we start timing those events, it's going to be really expensive. Even just counting them or keeping a history will cost a lot more than just publishing them while they're active, which is what we're doing now.

> BTW, I remember EnterpriseDB has a wait event monitoring feature. Is it
> disabled by default? What was the overhead?

Timed events in Advanced Server are disabled by default. I haven't actually tested the overhead myself and I don't remember exactly what the numbers were the last time someone else did, but I think if you turn edb_timed_statistics on, it's pretty expensive. If we can agree on something sensible here, I imagine we'll get rid of that feature in Advanced Server in favor of whatever the community settles on. But if the community agrees to turn on something by default that costs a measurable percentage in performance, I predict that Advanced Server 10 will ship with a different default for that feature than PostgreSQL 10.

Personally, I think too much of this thread (and previous threads) is devoted to arguing about whether it's OK to make performance worse, and by how much we'd be willing to make it worse.
What I think we ought to be talking about is how to design a feature that produces the most useful data for the least performance cost possible, like by avoiding measuring wait times for events that are very frequent or waits that are very short. Or, maybe we could have a background process that updates a timestamp in shared memory every millisecond, and other processes can read that value instead of making a system call. I think on Linux systems with fast clocks the operating system basically does something like that for you, but there might be other systems where it helps. Of course, it could also skew the results if the system is so overloaded that the clock-updater process gets descheduled for a lengthy period of time.

Anyway, I disagree with the idea that this feature is stalled or blocked in some way. I (and quite a few other people, though not everyone) oppose making performance significantly worse in the default configuration. I oppose that regardless of whether it is a hypothetical patch for this feature that causes the problem or whether it is a hypothetical patch for some other feature that causes the problem. I am not otherwise opposed to more work in this area; in fact, I'm rather in favor of it. But you can count on me to argue against pretty much everything that causes a performance regression, whatever the reason.

Virtually every release, at least one developer proposes some patch that slows the server down by "only" 1-2%. If we'd accepted all of the patches that were shot down because of such impacts, we'd have lost a very big chunk of performance between the time I started working on PostgreSQL and now. As it is, our single-threaded performance seems to have regressed noticeably since 9.1: http://bonesmoses.org/2016/01/08/pg-phriday-how-far-weve-come/ I think that's awful. But if we'd accepted all of those patches that cost "only" one or two percentage points, it would probably be -15% or -25% rather than -4.4%.
I think that if we want to really be successful as a project, we need to make that number go UP, not down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
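What got committed to 9.6 is visible today in pg_stat_activity, which gained wait_event_type and wait_event columns showing what each backend is waiting on at this instant - no timing, no counting, no history. For illustration (standard 9.6 catalog view; nothing here is new API):

```sql
-- Publishing waits "while they're active": the view reflects only the live
-- state of each backend; no durations or histories are accumulated.
SELECT pid, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;
```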
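The clock-updater idea sketches easily outside the server. The toy below is purely illustrative - it is not PostgreSQL code, and every name in it is made up - using a background thread as a stand-in for the updater process and an ordinary attribute as a stand-in for the shared-memory slot; readers consume the cached timestamp instead of paying for a timer call per event:

```python
import threading
import time

class CoarseClock:
    """Ticker that refreshes a cached timestamp, mimicking the proposed
    clock-updater process writing into shared memory every millisecond."""

    def __init__(self, tick_seconds=0.001):
        self._tick = tick_seconds
        self._now = time.time()            # stand-in for the shared-memory slot
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self._now = time.time()        # the only place a real clock is read
            self._stop.wait(self._tick)

    def start(self):
        self._thread.start()
        return self

    def read(self):
        return self._now                   # cheap read: no system call here

    def stop(self):
        self._stop.set()
        self._thread.join()

clock = CoarseClock().start()
time.sleep(0.05)
drift = abs(clock.read() - time.time())    # bounded by the tick while the ticker runs
clock.stop()
```

As Robert notes, the weak point is scheduling: if the updater is starved, every reader silently gets a stale value, so the error is bounded only while the updater actually runs on time.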
Re: [HACKERS] Wait events monitoring future development
On Wed, Aug 10, 2016 at 11:37:36PM +0900, Satoshi Nagayasu wrote:
> Agreed.
>
> If people are facing a difficult situation in terms of performance,
> they may accept some (one-time) overhead to resolve the issue.
> But if they don't have (recognize) any issue, they may not.
>
> That's one of the realities according to my experience.

Yes. Many people are arguing for specific defaults based on what _they_ would want, not what the average user would want. Sophisticated users will know about this and turn it on when desired.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Wait events monitoring future development
2016/08/10 23:22 "Bruce Momjian": > > On Wed, Aug 10, 2016 at 05:14:52PM +0300, Alexander Korotkov wrote: > > On Tue, Aug 9, 2016 at 5:37 AM, Bruce Momjian wrote: > > > > On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote: > > > I hope wait event monitoring will be on by default even if the overhead > > is not > > > almost zero, because the data needs to be readily available for faster > > > troubleshooting. IMO, the benefit would be worth even 10% overhead. If > > you > > > disable it by default because of overhead, how can we convince users to > > enable > > > it in production systems to solve some performance problem? I’m afraid > > severe > > > users would say “we can’t change any setting that might cause more > > trouble, so > > > investigate the cause with existing information.” > > > > If you want to know why people are against enabling this monitoring by > > default, above is the reason. What percentage of people do you think > > would be willing to take a 10% performance penalty for monitoring like > > this? I would bet very few, but the argument above doesn't seem to > > address the fact it is a small percentage. > > > > > > Just two notes from me: > > > > 1) 10% overhead from monitoring wait events is just an idea without any proof > > so soon. > > 2) We already have functionality which trades insight into database with way > > more huge overhead. auto_explain.log_analyze = true can slowdown queries *in > > times*. Do you think we should remove it? > > The point is not removing it, the point is whether > auto_explain.log_analyze = true should be enabled by default, and I > think no one wants to do that. Agreed. If people are facing with some difficult situation in terms of performance, they may accept some (one-time) overhead to resolve the issue. But if they don't have (recognize) any issue, they may not. That's one of the realities according to my experiences. Regards,
Re: [HACKERS] Wait events monitoring future development
On Wed, Aug 10, 2016 at 05:14:52PM +0300, Alexander Korotkov wrote:
> On Tue, Aug 9, 2016 at 5:37 AM, Bruce Momjian wrote:
> > On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
> > > I hope wait event monitoring will be on by default even if the overhead is not
> > > almost zero, because the data needs to be readily available for faster
> > > troubleshooting. IMO, the benefit would be worth even 10% overhead. If you
> > > disable it by default because of overhead, how can we convince users to enable
> > > it in production systems to solve some performance problem? I’m afraid severe
> > > users would say “we can’t change any setting that might cause more trouble, so
> > > investigate the cause with existing information.”
> >
> > If you want to know why people are against enabling this monitoring by
> > default, above is the reason. What percentage of people do you think
> > would be willing to take a 10% performance penalty for monitoring like
> > this? I would bet very few, but the argument above doesn't seem to
> > address the fact it is a small percentage.
>
> Just two notes from me:
>
> 1) 10% overhead from monitoring wait events is just an idea without any proof
> so soon.
>
> 2) We already have functionality which trades insight into database with way
> more huge overhead. auto_explain.log_analyze = true can slowdown queries *in
> times*. Do you think we should remove it?

The point is not removing it, the point is whether auto_explain.log_analyze = true should be enabled by default, and I think no one wants to do that.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 5:37 AM, Bruce Momjian wrote:
> On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
> > I hope wait event monitoring will be on by default even if the overhead is not
> > almost zero, because the data needs to be readily available for faster
> > troubleshooting. IMO, the benefit would be worth even 10% overhead. If you
> > disable it by default because of overhead, how can we convince users to enable
> > it in production systems to solve some performance problem? I’m afraid severe
> > users would say “we can’t change any setting that might cause more trouble, so
> > investigate the cause with existing information.”
>
> If you want to know why people are against enabling this monitoring by
> default, above is the reason. What percentage of people do you think
> would be willing to take a 10% performance penalty for monitoring like
> this? I would bet very few, but the argument above doesn't seem to
> address the fact it is a small percentage.

Just two notes from me:

1) 10% overhead from monitoring wait events is just an idea without any proof so far.

2) We already have functionality that trades insight into the database for far higher overhead: auto_explain.log_analyze = true can slow down queries *several times over*. Do you think we should remove it?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
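For reference on the auto_explain comparison: the expensive part is opt-in, not on by default. A minimal postgresql.conf fragment (these are the module's standard parameters; the 250ms threshold is only an example value):

```ini
# Load the module and log plans only for statements slower than the threshold.
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '250ms'
# Per-node timing is the part that can slow queries down badly; off by default.
auto_explain.log_analyze = off
```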
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 12:47 AM, Ilya Kosmodemiansky <ilya.kosmodemian...@postgresql-consulting.com> wrote:
> On Mon, Aug 8, 2016 at 7:03 PM, Bruce Momjian wrote:
> > It seems asking users to run pg_test_timing before deploying to check
> > the overhead would be sufficient.
>
> I'm not sure. Time measurement for waits is slightly more complicated
> than time measurement for EXPLAIN ANALYZE: a heavy workload plus
> using gettimeofday in a straightforward manner can cause huge
> overhead.

What makes you think so? Both my reasoning and my observations point the opposite way: it's far easier to get huge overhead from EXPLAIN ANALYZE than from measuring wait events. The current wait events are themselves heavyweight, involving syscalls, context switches and so on. In contrast, EXPLAIN ANALYZE calls gettimeofday for very cheap operations, like transferring a tuple from one executor node to another.

> That's why proper testing is important - if we see a
> significant performance drop with, for example, large
> shared_buffers at the same concurrency, that shows gettimeofday is
> too expensive to use. Am I correct that we do not have such accurate
> tests now?

Do you think that large shared_buffers is a kind of stress test for wait event monitoring? If so, why?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
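The asymmetry Alexander describes comes down to reads per second times cost per read: EXPLAIN ANALYZE takes a timestamp per tuple passed between executor nodes, while wait-event timing would take one per wait, and waits are orders of magnitude rarer. A rough stand-alone estimate of the per-read cost, in the spirit of pg_test_timing (Python purely for illustration; the server itself calls gettimeofday()/clock_gettime()):

```python
import time

def per_call_overhead(n=200_000):
    """Estimate the average cost, in seconds, of a single clock read."""
    start = time.perf_counter()
    for _ in range(n):
        time.perf_counter()    # stand-in for gettimeofday()/clock_gettime()
    return (time.perf_counter() - start) / n

overhead = per_call_overhead()
# Multiply by reads/second for a workload: millions of tuple transfers
# dwarf thousands of waits, even at identical per-read cost.
```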
Re: [HACKERS] Wait events monitoring future development
On 10 August 2016 at 07:09, Jim Nasby wrote:
> The downside to leaving stuff like this off by default is users won't
> remember it's there when they need it. At best, that means they spend more
> time debugging something than they need to. At worst, it means they suffer
> a production outage for longer than they need to, and that can easily
> exceed many months/years worth of the extra cost from the monitoring
> overhead.

Yeah.. and I've got to say, the whole "it'll hurt benchmarks if it's on by default" argument falls flat on its face when you look at our defaults for shared_buffers, etc. If you don't tune Pg, it runs reliably, but slowly.

If this proves to have "reasonable" overhead, I'd be inclined to say it should just be on. I frequently wish auto_explain and pg_stat_statements were in-core and on by default, so when someone calls saying things got slow the historical data is already there. I'm sure this'll be the same.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Wait events monitoring future development
From: pgsql-hackers-ow...@postgresql.org > Lets put this in perspective: there's tons of companies that spend thousands > of dollars per month extra by running un-tuned systems in cloud environments. > I almost called that "waste" but in reality it should be a simple business > question: is it worth more to the company to spend resources on reducing > the AWS bill or rolling out new features? > It's something that can be estimated and a rational business decision made. > > Where things become completely *irrational* is when a developer reads > something like "plpgsql blocks with an EXCEPTION handler are more expensive" > and they freak out and spend a bunch of time trying to avoid them, without > even the faintest idea of what that overhead actually is. > More important, they haven't the faintest idea of what that overhead costs > the company, vs what it costs the company for them to spend an extra hour > trying to avoid the EXCEPTION (and probably introducing code that's far > more bug-prone in the process). > > So in reality, the only people likely to notice even something as large > as a 10% hit are those that were already close to maxing out their hardware > anyway. > > The downside to leaving stuff like this off by default is users won't > remember it's there when they need it. At best, that means they spend more > time debugging something than they need to. At worse, it means they suffer > a production outage for longer than they need to, and that can easily exceed > many months/years worth of the extra cost from the monitoring overhead. I'd rather like this way of positive thinking. It will be better to think of the event monitoring as a positive feature for (daily) proactive improvement, not only as a debugging feature which gives negative image. For example, pgAdmin4 can display 10 most time-consuming events and their solutions. The DBA initially places the database and WAL on the same volume. 
As the system grows and the write workload increases, the DBA can get a suggestion from pgAdmin4 that he can prepare for the system growth by placing WAL on another volume to reduce WALWriteLock wait events. This is not debugging, but proactive monitoring.

> > As another idea, we can stand on the middle ground. Interestingly, MySQL
> > also enables their event monitoring (Performance Schema) by default, but
> > not all events are collected. I guess highly encountered events are not
> > collected by default to minimize the overhead.
>
> That's what we currently do with several track_* and log_*_stats GUCs,
> several of which I forgot even existed until just now. Since there's question
> over the actual overhead maybe that's a prudent approach for now, but I
> think we should be striving to enable these things ASAP.

Agreed. And as Bruce said, it may be better to be able to disable collection of some events that have a visible impact on performance.

Regards
Takayuki Tsunakawa
Re: [HACKERS] Wait events monitoring future development
On 8/8/16 11:07 PM, Tsunakawa, Takayuki wrote:
From: pgsql-hackers-ow...@postgresql.org
> If you want to know why people are against enabling this monitoring by
> default, above is the reason. What percentage of people do you think would
> be willing to take a 10% performance penalty for monitoring like this? I
> would bet very few, but the argument above doesn't seem to address the fact
> it is a small percentage.
>
> In fact, the argument above goes even farther, saying that we should enable
> it all the time because people will be unwilling to enable it on their own.
> I have to question the value of the information if users are not willing
> to enable it. And the solution proposed is to force the 10% default overhead
> on everyone, whether they are currently doing debugging, whether they will
> ever do this level of debugging, because people will be too scared to enable
> it. (Yes, I think Oracle took this approach.)

Let's put this in perspective: there's tons of companies that spend thousands of dollars per month extra by running un-tuned systems in cloud environments. I almost called that "waste" but in reality it should be a simple business question: is it worth more to the company to spend resources on reducing the AWS bill or rolling out new features? It's something that can be estimated and a rational business decision made.

Where things become completely *irrational* is when a developer reads something like "plpgsql blocks with an EXCEPTION handler are more expensive" and they freak out and spend a bunch of time trying to avoid them, without even the faintest idea of what that overhead actually is. More important, they haven't the faintest idea of what that overhead costs the company, vs what it costs the company for them to spend an extra hour trying to avoid the EXCEPTION (and probably introducing code that's far more bug-prone in the process).
So in reality, the only people likely to notice even something as large as a 10% hit are those that were already close to maxing out their hardware anyway.

The downside to leaving stuff like this off by default is users won't remember it's there when they need it. At best, that means they spend more time debugging something than they need to. At worst, it means they suffer a production outage for longer than they need to, and that can easily exceed many months/years worth of the extra cost from the monitoring overhead.

> We can talk about this feature all we want, but if we are not willing to
> be realistic in how much performance penalty the _average_ user is willing
> to lose to have this monitoring, I fear we will make little progress on
> this feature.
>
> OK, 10% was an overstatement. Anyway, as Amit said, we can discuss the
> default value based on the performance evaluation before release.
>
> As another idea, we can stand on the middle ground. Interestingly, MySQL
> also enables their event monitoring (Performance Schema) by default, but
> not all events are collected. I guess highly encountered events are not
> collected by default to minimize the overhead.

That's what we currently do with several track_* and log_*_stats GUCs, several of which I forgot even existed until just now. Since there's question over the actual overhead maybe that's a prudent approach for now, but I think we should be striving to enable these things ASAP.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 04:17:28AM +, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> > I used to think of that this kind of features should be enabled by default,
> > because when I was working at the previous company, I had only few features
> > to understand what is happening inside PostgreSQL by observing production
> > databases. I needed those features enabled in the production databases when
> > I was called.
> >
> > However, now I have another opinion. When we release the next major release
> > saying 10.0 with the wait monitoring, many people will start their benchmark
> > test with a configuration with *the default values*, and if they see some
> > performance decrease, for example around 10%, they will be talking about
> > it as the performance decrease in PostgreSQL 10.0. It means PostgreSQL will
> > be facing difficult reputation.
> >
> > So, I agree with the features should be disabled by default for a while.
>
> I understand your feeling well. This is a difficult decision. Let's hope
> for trivial overhead.

I think the goal is that some internal tracking can be enabled by default and some internal or external tool can be turned on and off to get more fine-grained statistics about the event durations.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Wait events monitoring future development
From: pgsql-hackers-ow...@postgresql.org
> I used to think of that this kind of features should be enabled by default,
> because when I was working at the previous company, I had only few features
> to understand what is happening inside PostgreSQL by observing production
> databases. I needed those features enabled in the production databases when
> I was called.
>
> However, now I have another opinion. When we release the next major release
> saying 10.0 with the wait monitoring, many people will start their benchmark
> test with a configuration with *the default values*, and if they see some
> performance decrease, for example around 10%, they will be talking about
> it as the performance decrease in PostgreSQL 10.0. It means PostgreSQL will
> be facing difficult reputation.
>
> So, I agree with the features should be disabled by default for a while.

I understand your feeling well. This is a difficult decision. Let's hope for trivial overhead.

Regards
Takayuki Tsunakawa
Re: [HACKERS] Wait events monitoring future development
From: pgsql-hackers-ow...@postgresql.org
> If you want to know why people are against enabling this monitoring by
> default, above is the reason. What percentage of people do you think would
> be willing to take a 10% performance penalty for monitoring like this? I
> would bet very few, but the argument above doesn't seem to address the fact
> it is a small percentage.
>
> In fact, the argument above goes even farther, saying that we should enable
> it all the time because people will be unwilling to enable it on their own.
> I have to question the value of the information if users are not willing
> to enable it. And the solution proposed is to force the 10% default overhead
> on everyone, whether they are currently doing debugging, whether they will
> ever do this level of debugging, because people will be too scared to enable
> it. (Yes, I think Oracle took this approach.)
>
> We can talk about this feature all we want, but if we are not willing to
> be realistic in how much performance penalty the _average_ user is willing
> to lose to have this monitoring, I fear we will make little progress on
> this feature.

OK, 10% was an overstatement. Anyway, as Amit said, we can discuss the default value based on the performance evaluation before release.

As another idea, we can stand on the middle ground. Interestingly, MySQL also enables their event monitoring (Performance Schema) by default, but not all events are collected. I guess highly encountered events are not collected by default to minimize the overhead.

http://dev.mysql.com/doc/refman/5.7/en/performance-schema-quick-start.html
--
Assuming that the Performance Schema is available, it is enabled by default. ...

[mysqld]
performance_schema=ON

... Initially, not all instruments and consumers are enabled, so the performance schema does not collect all events.
To turn all of these on and enable event timing, execute two statements (the row counts may differ depending on MySQL version):

mysql> UPDATE setup_instruments SET ENABLED = 'YES', TIMED = 'YES';
Query OK, 560 rows affected (0.04 sec)

mysql> UPDATE setup_consumers SET ENABLED = 'YES';
Query OK, 10 rows affected (0.00 sec)
--

BTW, I remember EnterpriseDB has a wait event monitoring feature. Is it disabled by default? What was the overhead?

Regards
Takayuki Tsunakawa
Re: [HACKERS] Wait events monitoring future development
2016-08-07 21:03 GMT+09:00 Ilya Kosmodemiansky:
> I've summarized Wait events monitoring discussion at Developer unconference
> in Ottawa this year on wiki:
>
> https://wiki.postgresql.org/wiki/PgCon_2016_Developer_Unconference/Wait_events_monitoring
>
> (Thanks to Alexander Korotkov for patiently pushing me to make this thing
> finally done)

Thanks for your effort to make us move forward.

> If you attended, feel free to point out if I missed something, I will put
> it on the wiki too.
>
> Wait event monitoring looks once again stuck on the way through community
> approval in spite of huge progress done last year in that direction. The
> importance of the topic is beyond discussion now; if you talk to any
> PostgreSQL person about implementing such a tool in Postgres and the
> person does not get excited, probably you are talking to a full-time
> PostgreSQL developer ;-) Obviously it needs a better design, both the user
> interface and implementation, and perhaps this is why full-time developers
> are still sceptical.
>
> In order to move forward, imho we need at least the following steps, which
> can be done in parallel:
>
> 1. Further requirements need to be collected from DBAs.
>
>    If you are a PostgreSQL DBA with Oracle experience and use perf for
> troubleshooting Postgres - you are an ideal person to share your experience,
> but everyone is welcome.
>
> 2. Further pg_wait_sampling performance testing is needed, in different
> environments.
>
>    According to its developers, the overhead is small, but many people have
> doubts that it can be much more significant for intensive workloads.
> Obviously, it is not an easy task to test, because you need to put doubtfully
> non-production-ready code into mission-critical production for such tests.
>    As a result it will become clear whether this design should be abandoned
> and we need to think about less-invasive solutions, or this design is
> acceptable.
>
> Any thoughts?

Seems a good starting point.
I'm interested in both, and I would like to contribute by running (or writing) several tests.

Regards,
--
Satoshi Nagayasu
Re: [HACKERS] Wait events monitoring future development
2016-08-09 11:49 GMT+09:00 Joshua D. Drake:
> On 08/08/2016 07:37 PM, Bruce Momjian wrote:
>> On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
>>> I hope wait event monitoring will be on by default even if the overhead
>>> is not almost zero, because the data needs to be readily available for
>>> faster troubleshooting. IMO, the benefit would be worth even 10% overhead.
>>> If you disable it by default because of overhead, how can we convince
>>> users to enable it in production systems to solve some performance
>>> problem? I’m afraid severe users would say “we can’t change any setting
>>> that might cause more trouble, so investigate the cause with existing
>>> information.”
>>
>> If you want to know why people are against enabling this monitoring by
>> default, above is the reason. What percentage of people do you think
>> would be willing to take a 10% performance penalty for monitoring like
>> this? I would bet very few, but the argument above doesn't seem to
>> address the fact it is a small percentage.
>
> I would argue it is zero. There are definitely users for this feature but to
> enable it by default is looking for trouble. *MOST* users do not need this.

I used to think that this kind of feature should be enabled by default, because when I was working at my previous company, I had only a few features to understand what was happening inside PostgreSQL by observing production databases. I needed those features enabled in the production databases when I was called.

However, now I have another opinion. When we release the next major release, say 10.0, with the wait monitoring, many people will start their benchmark tests with a configuration with *the default values*, and if they see some performance decrease, for example around 10%, they will be talking about it as the performance decrease in PostgreSQL 10.0. It means PostgreSQL will be facing a difficult reputation.
So, I agree the features should be disabled by default for a while.

Regards,
--
Satoshi Nagayasu
Re: [HACKERS] Wait events monitoring future development
On 08/08/2016 07:37 PM, Bruce Momjian wrote:
> On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
> > I hope wait event monitoring will be on by default even if the overhead
> > is not almost zero, because the data needs to be readily available for
> > faster troubleshooting. IMO, the benefit would be worth even 10% overhead.
> > If you disable it by default because of overhead, how can we convince
> > users to enable it in production systems to solve some performance
> > problem? I’m afraid severe users would say “we can’t change any setting
> > that might cause more trouble, so investigate the cause with existing
> > information.”
>
> If you want to know why people are against enabling this monitoring by
> default, above is the reason. What percentage of people do you think
> would be willing to take a 10% performance penalty for monitoring like
> this? I would bet very few, but the argument above doesn't seem to
> address the fact it is a small percentage.

I would argue it is zero. There are definitely users for this feature but to enable it by default is looking for trouble. *MOST* users do not need this.

Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/  +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
> I hope wait event monitoring will be on by default even if the overhead
> is not negligible, because the data needs to be readily available for
> faster troubleshooting. IMO, the benefit would be worth even a 10%
> overhead. If you disable it by default because of overhead, how can we
> convince users to enable it in production systems to solve some
> performance problem? I’m afraid cautious users would say “we can’t change
> any setting that might cause more trouble, so investigate the cause with
> existing information.”

If you want to know why people are against enabling this monitoring by
default, the above is the reason. What percentage of people do you think
would be willing to take a 10% performance penalty for monitoring like
this? I would bet very few, but the argument above doesn't seem to address
the fact that it is a small percentage.

In fact, the argument above goes even farther, saying that we should
enable it all the time because people will be unwilling to enable it on
their own. I have to question the value of the information if users are
not willing to enable it. And the solution proposed is to force the 10%
default overhead on everyone, whether or not they are currently doing
debugging, and whether or not they will ever do this level of debugging,
because people will be too scared to enable it. (Yes, I think Oracle took
this approach.)

We can talk about this feature all we want, but if we are not willing to
be realistic about how much performance penalty the _average_ user is
willing to lose to have this monitoring, I fear we will make little
progress on this feature.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                 Ancient Roman grave inscription     +
Re: [HACKERS] Wait events monitoring future development
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Ilya Kosmodemiansky
> I've summarized Wait events monitoring discussion at Developer
> unconference in Ottawa this year on wiki:
>
> https://wiki.postgresql.org/wiki/PgCon_2016_Developer_Unconference/Wait_events_monitoring

I hope wait event monitoring will be on by default even if the overhead is
not negligible, because the data needs to be readily available for faster
troubleshooting. IMO, the benefit would be worth even a 10% overhead. If
you disable it by default because of overhead, how can we convince users
to enable it in production systems to solve some performance problem? I’m
afraid cautious users would say “we can’t change any setting that might
cause more trouble, so investigate the cause with existing information.”

We should positively consider the performance with wait event monitoring
on as the new normal. Then, we should develop more features that leverage
the wait event data, so that the wait event data becomes essential. The
manual would explain to users that wait event monitoring can be turned off
for maximal performance, but that this is not recommended.

BTW, taking advantage of this chance, why don’t we enrich the performance
tuning content in the manual? At the least, it needs to explain how to
analyze the wait event data and tune the system.

Performance Tips
https://www.postgresql.org/docs/devel/static/performance-tips.html

Regards
Takayuki Tsunakawa
Re: [HACKERS] Wait events monitoring future development
On Mon, Aug 8, 2016 at 11:47:11PM +0200, Ilya Kosmodemiansky wrote:
> On Mon, Aug 8, 2016 at 7:03 PM, Bruce Momjian wrote:
> > It seems asking users to run pg_test_timing before deploying to check
> > the overhead would be sufficient.
>
> I'm not sure. Time measurement for waits is slightly more complicated
> than time measurement for EXPLAIN ANALYZE: a good workload plus using
> gettimeofday in a straightforward manner can cause huge overhead. That's
> why proper testing is important - if we see a significant performance
> drop with, for example, large shared_buffers at the same concurrency,
> that shows gettimeofday is too expensive to use. Am I correct that we do
> not have such accurate tests now?

Well, if we find that pg_test_timing is insufficient, we can perhaps add a
parallel test option to that utility.

> Another concern of mine is that it is a bad idea to release a feature
> which allegedly has a huge performance impact, even if it is not turned
> on by default. I often meet people who do not use exceptions in plpgsql
> because of the tip "A block containing an EXCEPTION clause is
> significantly more expensive to enter ..." in the PostgreSQL
> documentation.

Well, if we document that it can be slow, it is up to the user to decide
if they want to use it.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                 Ancient Roman grave inscription     +
Re: [HACKERS] Wait events monitoring future development
On Mon, Aug 8, 2016 at 7:03 PM, Bruce Momjian wrote:
> It seems asking users to run pg_test_timing before deploying to check
> the overhead would be sufficient.

I'm not sure. Time measurement for waits is slightly more complicated than
time measurement for EXPLAIN ANALYZE: a good workload plus using
gettimeofday in a straightforward manner can cause huge overhead. That's
why proper testing is important - if we see a significant performance drop
with, for example, large shared_buffers at the same concurrency, that
shows gettimeofday is too expensive to use. Am I correct that we do not
have such accurate tests now?

Another concern of mine is that it is a bad idea to release a feature
which allegedly has a huge performance impact, even if it is not turned on
by default. I often meet people who do not use exceptions in plpgsql
because of the tip "A block containing an EXCEPTION clause is
significantly more expensive to enter ..." in the PostgreSQL
documentation.

--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com
Re: [HACKERS] Wait events monitoring future development
On Mon, Aug 8, 2016 at 10:03 AM, Bruce Momjian wrote:
> On Mon, Aug 8, 2016 at 04:43:40PM +0530, Amit Kapila wrote:
>> > According to developers, overhead is small, but many people have
>> > doubts that it can be much more significant for intensive workloads.
>> > Obviously, it is not an easy task to test, because you need to put
>> > doubtfully non-production-ready code into mission-critical production
>> > for such tests. As a result it will be clear if this design should be
>> > abandoned and we need to think about less-invasive solutions or this
>> > design is acceptable.
>>
>> I think the main objection here was that gettimeofday can cause a
>> performance regression, which can be taken care of by using a
>> configurable knob. I am not aware if any other part of the design has
>> been discussed in detail to conclude whether it has any obvious problem.
>
> It seems asking users to run pg_test_timing before deploying to check
> the overhead would be sufficient.

They should also run it in parallel, as sometimes the real overhead is in
synchronization between multiple CPUs and doesn't show up when only a
single CPU is involved.

Cheers,

Jeff
Re: [HACKERS] Wait events monitoring future development
On Mon, Aug 8, 2016 at 04:43:40PM +0530, Amit Kapila wrote:
> > According to developers, overhead is small, but many people have doubts
> > that it can be much more significant for intensive workloads.
> > Obviously, it is not an easy task to test, because you need to put
> > doubtfully non-production-ready code into mission-critical production
> > for such tests. As a result it will be clear if this design should be
> > abandoned and we need to think about less-invasive solutions or this
> > design is acceptable.
>
> I think the main objection here was that gettimeofday can cause a
> performance regression, which can be taken care of by using a
> configurable knob. I am not aware if any other part of the design has
> been discussed in detail to conclude whether it has any obvious problem.

It seems asking users to run pg_test_timing before deploying to check the
overhead would be sufficient.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                 Ancient Roman grave inscription     +
Re: [HACKERS] Wait events monitoring future development
On Sun, Aug 7, 2016 at 5:33 PM, Ilya Kosmodemiansky wrote:
> Hi,
>
> I've summarized the Wait events monitoring discussion at the Developer
> unconference in Ottawa this year on the wiki:
>
> https://wiki.postgresql.org/wiki/PgCon_2016_Developer_Unconference/Wait_events_monitoring
>
> (Thanks to Alexander Korotkov for patiently pushing me to make this thing
> finally done)
>
> If you attended, feel free to point out if I missed something, I will put
> it on the wiki too.

Thanks for the summary.

> Wait event monitoring looks once again stuck on the way through community
> approval in spite of the huge progress made last year in that direction.
> The importance of the topic is beyond discussion now; if you talk to any
> PostgreSQL person about implementing such a tool in Postgres and the
> person does not get excited, you are probably talking to a full-time
> PostgreSQL developer ;-) Obviously it needs a better design, both the
> user interface and the implementation, and perhaps this is why full-time
> developers are still sceptical.
>
> In order to move forward, IMHO we need at least the following steps,
> which can be done in parallel:
>
> 1. Further requirements need to be collected from DBAs.
>
>    If you are a PostgreSQL DBA with Oracle experience and use perf for
> troubleshooting Postgres - you are an ideal person to share your
> experience, but everyone is welcome.
>
> 2. Further pg_wait_sampling performance testing is needed, in different
> environments.

I think it is better to first go with a knob whose default value is off.
We can do the performance testing as well, and if by the end of the
release cycle nobody has reported any visible regression, then we can
discuss changing the default to on.

>    According to developers, overhead is small, but many people have
> doubts that it can be much more significant for intensive workloads.
> Obviously, it is not an easy task to test, because you need to put
> doubtfully non-production-ready code into mission-critical production
> for such tests.
> As a result it will be clear if this design should be abandoned and we
> need to think about less-invasive solutions or this design is acceptable.

I think the main objection here was that gettimeofday can cause a
performance regression, which can be taken care of by using a configurable
knob. I am not aware if any other part of the design has been discussed in
detail to conclude whether it has any obvious problem.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Wait events monitoring future development
Hi,

I've summarized the Wait events monitoring discussion at the Developer
unconference in Ottawa this year on the wiki:

https://wiki.postgresql.org/wiki/PgCon_2016_Developer_Unconference/Wait_events_monitoring

(Thanks to Alexander Korotkov for patiently pushing me to make this thing
finally done)

If you attended, feel free to point out if I missed something, I will put
it on the wiki too.

Wait event monitoring looks once again stuck on the way through community
approval in spite of the huge progress made last year in that direction.
The importance of the topic is beyond discussion now; if you talk to any
PostgreSQL person about implementing such a tool in Postgres and the
person does not get excited, you are probably talking to a full-time
PostgreSQL developer ;-) Obviously it needs a better design, both the user
interface and the implementation, and perhaps this is why full-time
developers are still sceptical.

In order to move forward, IMHO we need at least the following steps, which
can be done in parallel:

1. Further requirements need to be collected from DBAs.

   If you are a PostgreSQL DBA with Oracle experience and use perf for
troubleshooting Postgres - you are an ideal person to share your
experience, but everyone is welcome.

2. Further pg_wait_sampling performance testing is needed, in different
environments.

   According to developers, overhead is small, but many people have doubts
that it can be much more significant for intensive workloads. Obviously,
it is not an easy task to test, because you need to put doubtfully
non-production-ready code into mission-critical production for such tests.
As a result it will be clear whether this design should be abandoned and
we need to think about less-invasive solutions, or this design is
acceptable.

Any thoughts?

Best regards,
Ilya

--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com