Re: [HACKERS] Index overhead cost reporting

2013-12-07 Thread Tom Lane
Thom Brown t...@linux.com writes:
> I was wondering whether anyone has any insight with regards to
> measuring and reporting the overhead of maintaining indexes on
> relations.  If an UPDATE is issued to a table with, say, 6 indexes, it
> would be useful to determine how much time is spent updating each of
> those indexes.  And perhaps such timings would not be confined to
> indexes, but also other dependants that add overhead, such as
> triggers, rules, and in future, eager incremental materialised view
> updates.

We already do track the time spent in triggers.  Although Kevin might
have a different idea, I'd think that matview updates should also be
driven by triggers, so that the last item there would be covered.
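(For context, EXPLAIN ANALYZE already breaks trigger time out per trigger in its output; a minimal sketch, with the table, trigger, and all numbers purely illustrative:)

```sql
EXPLAIN ANALYZE UPDATE accounts SET balance = balance + 1 WHERE id = 1;
--                          QUERY PLAN
-- ------------------------------------------------------
--  Update on accounts  (cost=... rows=... width=...) (actual time=...)
--    ...
--  Trigger accounts_audit: time=... calls=...
--  Total runtime: ... ms
```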

> Is that something that could be provided in an EXPLAIN ANALYSE node?

Well, it'd not be particularly difficult to add measurements of the time
spent in ExecInsertIndexTuples, but I'd have some concerns about that:

* Instrumentation overhead.  That's already painful on machines with
slow gettimeofday, and this has the potential to add a lot more,
especially with the more expansive readings of your proposal.
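(To make that concrete, here is a minimal sketch, in Python rather than the backend's C, of estimating the per-read cost of the clock; the function name is mine, not anything in the tree:)

```python
import time

def clock_read_overhead(n=100_000):
    """Estimate the average wall-clock cost of a single clock read."""
    start = time.perf_counter()
    for _ in range(n):
        time.perf_counter()  # stands in for gettimeofday()
    elapsed = time.perf_counter() - start
    return elapsed / n  # seconds per clock read

print(f"~{clock_read_overhead() * 1e9:.0f} ns per clock read")
```

Multiply that per-read cost by two reads per index per modified row and the instrumentation tax on a bulk UPDATE adds up quickly.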

* Is it really measuring the right thing?  To a much greater degree
than for some other things you might try to measure, just counting
time spent in ExecInsertIndexTuples is going to understate the true
cost of updating an index, because so much of the true cost is paid
asynchronously; viz, writing WAL as well as the actual index pages.
We already have that issue with measuring the runtime of a
ModifyTable node as a whole, but slicing and dicing that time
ten ways would make the results even more untrustworthy, IMO.
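(A toy illustration of that understatement, not PostgreSQL code: timing only the synchronous part of an operation whose real cost is deferred:)

```python
import time

deferred = []  # stands in for WAL records / dirty index pages written later

def insert_index_tuple(item):
    # Synchronous part: cheap in-memory bookkeeping only.
    deferred.append(item)

def background_flush():
    # Deferred part: the expensive I/O happens here, outside any
    # timer wrapped around insert_index_tuple().
    time.sleep(0.001 * len(deferred))
    deferred.clear()

start = time.perf_counter()
for i in range(100):
    insert_index_tuple(i)
sync_seconds = time.perf_counter() - start

start = time.perf_counter()
background_flush()
flush_seconds = time.perf_counter() - start

# The timed "insert" cost badly understates the total cost.
print(sync_seconds < flush_seconds)
```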

* There are also other costs to think of, such as the time spent by
VACUUM on maintaining the index, and the time spent by the planner
considering (perhaps vainly) whether it can use the index for each
query that reads the table.  In principle you could instrument
VACUUM to track the time it spends updating each index, and log
that in the pgstats infrastructure.  (I'd even think that might be
a good idea, except for the bloat effect on the pgstats files.)
I'm not at all sure there's any practical way to measure the distributed
planner overhead; it's not paid in discrete chunks large enough to be
timed easily.  Perhaps it's small enough to ignore, but I'm not sure.

Bottom line, I think it's a harder problem than it might seem at
first glance.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index overhead cost reporting

2013-12-07 Thread Thom Brown
On 7 December 2013 19:41, Tom Lane t...@sss.pgh.pa.us wrote:
> Thom Brown t...@linux.com writes:
>> I was wondering whether anyone has any insight with regards to
>> measuring and reporting the overhead of maintaining indexes on
>> relations.  If an UPDATE is issued to a table with, say, 6 indexes, it
>> would be useful to determine how much time is spent updating each of
>> those indexes.  And perhaps such timings would not be confined to
>> indexes, but also other dependants that add overhead, such as
>> triggers, rules, and in future, eager incremental materialised view
>> updates.
>
> We already do track the time spent in triggers.  Although Kevin might
> have a different idea, I'd think that matview updates should also be
> driven by triggers, so that the last item there would be covered.

Oh, of course. :)

>> Is that something that could be provided in an EXPLAIN ANALYSE node?
>
> Well, it'd not be particularly difficult to add measurements of the time
> spent in ExecInsertIndexTuples, but I'd have some concerns about that:
>
> * Instrumentation overhead.  That's already painful on machines with
> slow gettimeofday, and this has the potential to add a lot more,
> especially with the more expansive readings of your proposal.
>
> * Is it really measuring the right thing?  To a much greater degree
> than for some other things you might try to measure, just counting
> time spent in ExecInsertIndexTuples is going to understate the true
> cost of updating an index, because so much of the true cost is paid
> asynchronously; viz, writing WAL as well as the actual index pages.
> We already have that issue with measuring the runtime of a
> ModifyTable node as a whole, but slicing and dicing that time
> ten ways would make the results even more untrustworthy, IMO.
>
> * There are also other costs to think of, such as the time spent by
> VACUUM on maintaining the index, and the time spent by the planner
> considering (perhaps vainly) whether it can use the index for each
> query that reads the table.  In principle you could instrument
> VACUUM to track the time it spends updating each index, and log
> that in the pgstats infrastructure.  (I'd even think that might be
> a good idea, except for the bloat effect on the pgstats files.)
> I'm not at all sure there's any practical way to measure the distributed
> planner overhead; it's not paid in discrete chunks large enough to be
> timed easily.  Perhaps it's small enough to ignore, but I'm not sure.
>
> Bottom line, I think it's a harder problem than it might seem at
> first glance.

Thanks for taking the time to explain the above.

Perhaps I may have misunderstood, or not explained my question with
enough detail, but you appear to be including activity that would, in
all likelihood, occur after the DML has returned confirmation to the
user that it has completed; in particular, VACUUM.  What I was
thinking of was an execution plan node to communicate the index
modifications that are carried out prior to confirmation of the query
completing.  The bgwriter, WAL writer et al. that spring into action
as a result of the index being updated wouldn't, as I see it, be
included.

So in essence, I'd only be looking for a breakdown of anything that
adds to the duration of the DML statement.  However, it sounds like
even that isn't straightforward from what you've written.

Thom




Re: [HACKERS] Index overhead cost reporting

2013-12-07 Thread Tom Lane
Thom Brown t...@linux.com writes:
> Perhaps I may have misunderstood, or not explained my question with
> enough detail, but you appear to be including activity that would, in
> all likelihood, occur after the DML has returned confirmation to the
> user that it has completed; in particular, VACUUM.  What I was
> thinking of was an execution plan node to communicate the index
> modifications that are carried out prior to confirmation of the query
> completing.  The bgwriter, WAL writer et al. that spring into action
> as a result of the index being updated wouldn't, as I see it, be
> included.
>
> So in essence, I'd only be looking for a breakdown of anything that
> adds to the duration of the DML statement.  However, it sounds like
> even that isn't straightforward from what you've written.

I think that would be reasonably straightforward, though perhaps too
expensive depending on the speed of clock reading.  My larger point was
that I don't think that that alone is a fair measure of the cost of
maintaining an index, which is what you had claimed to be interested in.

regards, tom lane




Re: [HACKERS] Index overhead cost reporting

2013-12-07 Thread Thom Brown
On 7 December 2013 20:44, Tom Lane t...@sss.pgh.pa.us wrote:
> Thom Brown t...@linux.com writes:
>> So in essence, I'd only be looking for a breakdown of anything that
>> adds to the duration of the DML statement.  However, it sounds like
>> even that isn't straightforward from what you've written.
>
> I think that would be reasonably straightforward, though perhaps too
> expensive depending on the speed of clock reading.  My larger point was
> that I don't think that that alone is a fair measure of the cost of
> maintaining an index, which is what you had claimed to be interested in.

Point taken.  Yes, I don't believe I was that clear.  This was a "how
much of my DML statement time was spent updating indexes" rather than
a "give me stats on index maintenance times for these indexes".

It's a shame that it comes at a non-trivial price.

Thom

