On 31/01/15 14:27, Amit Kapila wrote:
On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <p...@2ndquadrant.com
<mailto:p...@2ndquadrant.com>> wrote:

    On 19/01/15 07:08, Amit Kapila wrote:

        On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek
        <p...@2ndquadrant.com <mailto:p...@2ndquadrant.com>
        <mailto:p...@2ndquadrant.com <mailto:p...@2ndquadrant.com>>> wrote:


    I think that's actually good to have, because we still do costing
    and the partial index might help produce better estimate of number
    of returned rows for tablesample as well.


I don't understand how will it help, because for tablesample scan
it doesn't consider partial index at all as per patch.


Oh, I think we were talking about different things then. I thought you were talking about the index checks the planner/optimizer sometimes does to get more accurate statistics. I'll take another look then.


    Well, similar, but not the same, as we are not always fetching the
    whole page or doing visibility checks on all tuples in the page,
    etc. But I don't see why it can't be inside nodeSamplescan. If you
    look at bitmap heap scan, that one is also essentially a somewhat
    modified sequential scan and does everything within the node
    nodeBitmapHeapscan, because the algorithm is not used anywhere
    else, just like sample scan.


I don't mind doing everything in nodeSamplescan; however, if
you can split the function, it will be easier to read and understand.
If you look at nodeBitmapHeapscan, that also has a function like
bitgetpage().
This is just a suggestion: if you think it can be split, then that's
okay, otherwise leave it as it is.

Yeah, I can split it into a separate function within nodeSamplescan, that sounds reasonable.



        Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
        Basically, during a sequential scan on the same table by different
        backends, we attempt to keep them synchronized to reduce I/O.


    Ah this, yes, it makes sense for bernoulli, not for system though. I
    guess it should be used for sampling methods that use SAS_SEQUENTIAL
    strategy.


Have you taken care of this in your latest patch?

Not yet. I think I will need to make the strategy a property of the sampling method, instead of returning it from the costing function, so that the info can be used by the scan.


    Oh and BTW when I delete 50k of tuples (make them invisible) the
    results of 20 runs are between 30880 and 40063 rows.


That is between 60% and 80%, lower than expected, but I guess
we can't do much about it, except for trying the reverse order
of the visibility test and the sample tuple call;
you can decide if you want to try that once, just to see if that
is better.


No, that's because I can't write properly; the lower number was supposed to be 39880, which is well within the tolerance. Sorry for the confusion (9 and 0 are just too close).

    Anyway, attached is a new version with some of the updates you
    mentioned (all_visible, etc).
    I also added an optional interface for the sampling method to
    access the tuple contents, as I can imagine sampling methods that
    will want to do that.

+       /*
+        * Let the sampling method examine the actual tuple and decide if we
+        * should return it.
+        *
+        * Note that we let it examine even invisible tuples.
+        */
+       if (OidIsValid(node->tsmreturntuple.fn_oid))
+       {
+               found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+                                                  PointerGetDatum(node),
+                                                  UInt32GetDatum(blockno),
+                                                  PointerGetDatum(tuple),
+                                                  BoolGetDatum(visible)));
+
+               /* XXX: better error */
+               if (found && !visible)
+                       elog(ERROR, "Sampling method wanted to return invisible tuple");
+       }

You have mentioned in the comment that you let it examine invisible
tuples, but it is not clear why you want to allow examining an invisible
tuple and then later raise an error; why can't it just skip invisible tuples?


Well, I think returning invisible tuples to the user is a bad idea, hence the check. But I also think it might make sense for the sampling function to examine the invisible tuple, in case it wants to keep some kind of statistics about the scan and use those when deciding whether to return other tuples. The interface should probably be called tsmexaminetuple instead, to make the intention clearer.

1.
How about updating statistics (pgstat_count_heap_getnext()) during
SampleNext, as we do in heap_getnext?

Right, will add.


2.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
..
+       /*
+        * Lock the buffer so we can safely assess tuple
+        * visibility later.
+        */
+       LockBuffer(buffer, BUFFER_LOCK_SHARE);
..
+}

When is this content lock released? Shouldn't we release it after
checking the visibility of the tuple?

Here,
+               if (!OffsetNumberIsValid(tupoffset))
+               {
+                       UnlockReleaseBuffer(buffer);

but yes, you are correct: it should just be released there, and we can unlock earlier, right after the visibility check.


3.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
{
..
}

Currently, as soon as this function sees one valid tuple,
it returns it; however, isn't it better to do some caching
of tuples on the same page, as we do in heapgetpage()
(scan->rs_vistuples[ntup++] = lineoff;)?  Basically, that can avoid
taking the content lock and some other overhead of operating on a
page.


That's a somewhat hard question. It would make sense in cases where we read most of the page (which is true for system sampling, for example), but it would probably slow things down in cases where we select a small number of tuples (say 1), which is true for bernoulli with a small percentage parameter. It's actually quite easy to imagine that on really big tables (which is where TABLESAMPLE makes sense) we might get blocks where we don't actually read any tuples. This is where optimizing for one sampling method will hurt another, so I don't know what's better here. Perhaps the sampling method should have an option that says whether it prefers page-mode reading, because only the author knows this.

Anyway, I am thinking of making heapgetpage() public and using it directly. It will mean that we have to initialize a HeapScanDesc, which might add a couple of lines, but we already have to keep the last buffer, last tuple, and position info in the scan state anyway, so we can use the HeapScanDesc for that instead. There will be a couple of properties of HeapScanDesc we don't use, but I don't think we care.

BTW, I don't expect to have time to work on this patch in the next ~10 days, so I will move it to the February commitfest.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)