Re: [HACKERS] User-perspective knowledge about wait events

2017-09-29 Thread Michael Paquier
On Sat, Sep 30, 2017 at 4:20 AM, Schneider  wrote:
> On Tue, Sep 26, 2017 at 4:28 PM, Michael Paquier
>  wrote:
>> Gathering a set of examples on wiki page with some rough
>> analysis I think would be a good start.
>
> I don't seem to have privs to create wiki pages; can someone else make
> a page where we can begin to gather things like this?
>
> Does https://wiki.postgresql.org/wiki/Wait_Events make sense?

Yes, that looks fine. Don't yo uhave a community account? I think that
it is necessary to log in to create a new page.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] User-perspective knowledge about wait events

2017-09-29 Thread Schneider
On Tue, Sep 26, 2017 at 4:28 PM, Michael Paquier
 wrote:
> Gathering a set of examples on wiki page with some rough
> analysis I think would be a good start.

I don't seem to have privs to create wiki pages; can someone else make
a page where we can begin to gather things like this?

Does https://wiki.postgresql.org/wiki/Wait_Events make sense?

-Jeremy

-- 
http://about.me/jeremy_schneider


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] User-perspective knowledge about wait events

2017-09-26 Thread Michael Paquier
On Wed, Sep 27, 2017 at 3:26 AM, Schneider  wrote:
> However I think that it would be immensely helpful to start gathering
> knowledge somewhere on wait events from the user perspective.  Listing
> a few of the wait events which users will probably see most often
> along with practical suggestions of what users could further examine
> or even what they could change to make improvements on their database
> systems.

I am not sure that listing wait events individually based on the
frequency they could be seen is that helpful, but profiles for
different workloads matter. For example, Postgres has pgbench in core,
so we could look at profiles generated with its generic tests, and
look at the profiles generated every second and then append them in a
custom table. By looking at how many of them are present you can guess
by what roughly Postgres is bottlecked by, and you can conclude this
and that.

For example, seeing a lot of SyncRep events in sessions would mean
that the synchronous standby may not be following its primary very
smoothly. Gathering a set of examples on wiki page with some rough
analysis I think would be a good start.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] User-perspective knowledge about wait events

2017-09-26 Thread Schneider
On Mon, Aug 14, 2017 at 11:58 PM, Michael Paquier
 wrote:
> As $subject has been touched on two threads recently
> (https://www.postgresql.org/message-id/CAB7nPqTbHLcHFn6m11tfpwAdgz8BmnBza2jjN9AK=sdx_kb...@mail.gmail.com
> and 
> https://www.postgresql.org/message-id/20170808213537.wkmmagf2a6i3hjyi@alvherre.pgsql),
> the list of wait event and their category are becoming harder to
> maintain because of their number and the fact that:
> 1) It is easy to add a wait event and...
> 2) It is easy to not update its documentation.
> 3) the documentation tables get easily broken.

The documentation provides nice "developer-perspective" descriptions
of the wait events. It's a great start. If you have some familiarity
with the code, or if you're willing to dig into it, then the
descriptions get you going.

However I think that it would be immensely helpful to start gathering
knowledge somewhere on wait events from the user perspective.  Listing
a few of the wait events which users will probably see most often
along with practical suggestions of what users could further examine
or even what they could change to make improvements on their database
systems.

When Postgres 10 goes GA and users start adopting it, we will start
getting a real-world sense of which wait events are most commonly
observed.  But I don't think it's too early to stub out a wiki page
for gathering user-level community knowledge on this.  As collective
knowledge about wait event meaning and action matures, the best advice
might even be considered for eventual incorporation back into the
official documentation.

I do realize there are some big pitfalls in the area of documenting
performance optimization. Workloads vary significantly and it's all to
easy for sound bites to devolve into "golden rules" or "best
practices" and then become mantras and silver bullets.

Nonetheless I believe it is possible to make some basic, scientific
and well-qualified observations. Not only is this immensely useful but
in fact it's critical to users' ability to have a good experience when
they dramatically ramp up their usage of PostgreSQL.

One particular section of Oracle's perf tuning doc actually isn't a
bad example in giving a few qualified and actionable user-perspective
wait event descriptions:
http://docs.oracle.com/database/122/TGDBA/instance-tuning-using-performance-views.htm#TGDBA94472

I'm just making thus up off the top of my head, so I'm sure it will be
bad... but in the interest of starting a conversation here's a example
"stub/starter":

^-^-^-^-^-^-^-^-^-^-^-^
https://wiki.postgresql.org/wiki/Wait_Events

Lock:tuple - this means that the SQL is waiting because some
application has previously locked the tuple (row) with a different
database connection and has not yet committed or rolled back that
transaction.  if you frequently see this wait event then you probably
need to look for application-level optimizations to decrease
insert/update/delete/lock contention on the same rows.

IO:DataFileRead - this means that the SQL is waiting for data to be
physically read into PostgreSQL's internal memory cache.  if you
frequently see this wait event then you might want to examine the SQL
statements which cause the most I/O and look for optimizations.  Can
you accomplish the same result while examining less data?  If not, is
there an index that might make the processing more efficient?  Can the
SQL be restructured to generate a more optimal plan?  If no
application-level optimizations are possible, then can the I/O
subsystem be optimized somehow?

IO:XactSync and IO:WALWrite - [what's the exact difference between
these?]  this probably means that the SQL is waiting for WAL data to
either be asynchronously written to disk, or for the final fsync to
complete before returning from a transactions's COMMIT call.  if you
frequently see this wait event then you might want to review the
content and frequency of updates and inserts generated by your
application.  If your application cannot be refactored to remove
unneeded DML then you might example the WAL I/O path for opportunities
to optimize or increase capacity.


Notes about scaling out/up hardware:

IO-related waits can often be somewhat remediated by scaling out or
scaling up your hardware. However it's generally worthwhile to first
review your application for optimization opportunities.  A small
application-level change can make orders-of-magnitude greater
improvements over hardware scaling!

Note that locking and concurrency related waits can often become even
worse with scaling out/up.  (Waiting for something on another machine
can be slower than waiting for something on another processor.)  Also,
scaling the wrong hardware component can exacerbate IO-related waits
if you don't scale in a way that directly addresses the actual
bottleneck.

^-^-^-^-^-^-^-^-^-^-^-^

What do others think about this?  Is the wiki.postgresql.org the right
place for something like this, or should it just