On Tue, Jan 24, 2012 at 11:24:08AM -0500, Jaime Casanova wrote:
> On Mon, Jan 23, 2012 at 7:18 PM, Noah Misch <n...@leadboat.com> wrote:
> > If someone feels like
> > doing it, +1 for making pgstattuple() count non-leaf free space.
> 
> actually i agreed that non-leaf pages are irrelevant... i just
> confirmed that in a production system with 300GB none of the indexes
> in an 84M rows table nor in a heavily updated one has more than 1 root
> page, all the rest are deleted, half_dead or leaf. so the posibility
> of bloat coming from non-leaf pages seems very odd

FWIW, the number to look at is internal_pages from pgstatindex():

[local] test=# create table t4 (c) as select * from generate_series(1,1000000);
SELECT 1000000
[local] test=# alter table t4 add primary key(c);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "t4_pkey" for 
table "t4"
ALTER TABLE
[local] test=# select * from pgstatindex('t4_pkey');
-[ RECORD 1 ]------+---------
version            | 2
tree_level         | 2
index_size         | 22478848
root_block_no      | 290
internal_pages     | 10
leaf_pages         | 2733
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 90.06
leaf_fragmentation | 0

So, 0.4% of this index.  They appear in proportion to the logarithm of the
total index size.  I agree that bloat centered on them is unlikely.  Counting
them would be justified, but that is a question of formal accuracy rather than
practical importance.

> but the possibility of bloat coming from the meta page doesn't exist,
> AFAIUI at least
> 
> we need the most accurate value about usable free space, because the
> idea is to add a sampler mode to the function so we don't scan the
> whole relation. that's why we still need the function.

I doubt we'd add this function solely on the basis that a future improvement
will make it useful.  For the patch to go in now, it needs to be useful now.
(This is not a universal principle, but it mostly holds for low-complexity
patches like this one.)

All my comments below would also apply to such a broader patch.

> btw... pgstattuple also has the problem that it's not using a ring buffer
> 
> 
> attached are two patches:
> - v5: is the same original patch but only track space in leaf, deleted
> and half_dead pages
> - v5.1: adds the same for all kind of indexes (problem is that this is
> inconsistent with the fact that pageinspect only manages btree indexes
> for everything else)

Let's take a step back.  Again, what you're proposing is essentially a faster
implementation of "SELECT free_percent FROM pgstattuple(rel)".  If this code
belongs in core at all, it belongs in the pgstattuple module.  Share as much
infrastructure as is reasonable between the user-visible functions of that
module.  For example, I'm suspecting that the pgstat_index() call tree should
be shared, with pgstat_index_page() checking a flag to decide whether to
gather per-tuple stats.

Next, compare the bits of code that differ between pgstattuple() and
relation_free_space(), convincing yourself that the differences are justified.
Each difference will yield one of the following conclusions:

1. Your code contains an innovation that would apply to both functions.  Where
not too difficult, merge these improvements into pgstattuple().  In order for
a demonstration of your new code's better performance to be interesting, we
must fix the same low-hanging fruit in its competitor.  One example is the use
of the bulk read strategy.  Another is the support for SP-GiST.

2. Your code is missing an essential behavior of pgstattuple().  Add it to
your code.  One example is the presence of CHECK_FOR_INTERRUPTS() calls.

3. Your code behaves differently from pgstattuple() due to a fundamental
difference in their tasks.  These are the only functional differences that
ought to remain in your finished patch; please point them out in comments.
For example, pgstat_heap() visits every tuple in the heap.  You'll have no
reason to do that; pgstattuple() only needs it to calculate statistics other
than free_percent.

In particular, I call your attention to the fact that pgstattuple() takes
shared buffer content locks before examining pages.  Your proposed patch does
not do so.  I do not know with certainty whether that falls under #1 or #2.
The broad convention is to take such locks, because we elsewhere want an exact
answer.  These functions are already inexact; they make no effort to observe a
consistent snapshot of the table.  If you convince yourself that the error
arising from not locking buffers is reasonably bounded, we can lose the locks
(in both functions -- conclusion #1).  Comments would then be in order.

With all that done, run some quick benchmarks: see how "SELECT free_percent
FROM pgstattuple(rel)" fares compared to "SELECT relation_free_space(rel)" for
a large heap and for a large B-tree index.  If the timing difference is too
small to be interesting to you, remove relation_free_space() and submit your
pgstattuple() improvements alone.  Otherwise, submit as written.

I suppose I didn't make all of this clear enough earlier; sorry for that.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to