Re: [HACKERS] TABLESAMPLE patch

Petr Jelinek Sat, 17 Jan 2015 11:18:02 -0800

On 17/01/15 13:46, Amit Kapila wrote:

On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek <p...@2ndquadrant.com
<mailto:p...@2ndquadrant.com>> wrote:
 >
 >
 > In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.
 >


I have looked into this patch and would like to share my
findings with you.


That's a lot for this.


1.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+/*
+* There is only one plan to consider but we still need to set
+* parameters for RelOptInfo.
+*/
+set_cheapest(rel);
}

It seems we already call set_cheapest(rel) in set_rel_pathlist()
which is a caller of set_tablesample_rel_pathlist(), so why do
we need it inside set_tablesample_rel_pathlist()?

Ah, this changed after I started working on this patch and I didn'tnotice - previously all the set_<something>_rel_pathlist calledset_cheapest() individually. I will change the code.


2.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+/* We only do sample scan if it was requested */
+add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
}

Do we need to add_path, if there is only one path?


Good point, we can probably just set the pathlist directly in this case.


3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+else if (rte->tablesample != NULL)
+{
+/* Build sample scan on relation */
+set_tablesample_rel_pathlist(root, rel, rte);
+}

Have you consider to have a different RTE for this?

I assume you mean different RTEKind, yes I did, but the fact is thatit's still a relation, just with different scan type and I didn't wantto modify every piece of code which deals with RTE_RELATION to also dealwith the new RTE for sample scan as it seems unnecessary. I am notstrongly opinionated about this one though.

4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?

Possibly, my thinking was that unlike the index_getnext() andheap_getnext(), this function would not be called from any other placeso adding one more layer of abstraction didn't seem useful. And it wouldmean creating new ScanDesc struct, etc.


b.
Also don't we want to handle pruning of page while
scanning (heap_page_prune_opt()) and I observed
in heap scan API's after visibility check we do check
for serializable conflict (CheckForSerializableConflictOut()).


Both good points, will add.

Another thing is don't we want to do anything for sync scans
for these method's as they are doing heap scan?


I don't follow this one tbh.


c.
for bernoulli method, it will first get the tupoffset with
the help of function and then apply visibility check, it seems
that could reduce the sample set way lower than expected
in case lot of tuples are not visible, shouldn't it be done in reverse
way(first visibility check, then call function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows which
seems right to me.

I don't think so, the way bernoulli works is that it returns every tuplewith equal probability, so the visible tuples have same chance of beingreturned as the invisible ones so the issue should be smoothed awayautomatically (IMHO).

The acquire_sample_rows has limit on number of rows it returns so that'swhy it has to do the visibility check before as the problem youdescribed applies there.

The reason for using the probability instead of tuple limit is the factthat there is no way to accurately guess number of tuples in the tableat the beginning of scan. This method should actually be better atreturning the correct number of tuples without dependence on how many ofthem are visible or not and how much space in blocks is used.


5.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;

postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
  id
----
   1
   3
   4
   7
   8
   9
(6 rows)


postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
  id
----
   0
   1
   2
   3
   4
   5
   6
   7
   8
   9
(10 rows)

So above test yield 60% rows first time and 100% rows next time,
when the test has requested 80%.
I understand that sample percentage will result an approximate
number of rows, however I just wanted that we should check if the
implementation has any problem or not?

I think this is caused by random generator not producing smooth enoughrandom distribution on such a small sample (or possibly in general?).


In this regard, I have one question related to below code:

+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
..
+/* Every tuple has percent chance of being returned */
+while (sampler_random_fract(sampler->randstate) > samplesize)
+{
+tupoffset++;
+
+if (tupoffset > maxoffset)
+break;
+}

Why are we not considering tuple in above code
if tupoffset is less than maxoffset?

We consider it, I will rename the samplesize to probability or somethingto make it more clear what it actually is and maybe expand the commentabove the loop.

What the loop does is that it basically considers each offset on a pageand does "coin flip" - generates random number usingsampler_random_fract and if the random number falls within theprobability (= is smaller or equal to samplesize) it will be returned, the

if (tupoffset > maxoffset)
    break;

is there just because we need to give control back to scan so it canmove to another block.

6.
One larger question about the approach used in patch, why do you
think it is better to have a new node (SampleScan/SamplePath) like
you have used in patch instead of doing it as part of Sequence Scan.
I agree that we need some changes in different parts of Sequence Scan
execution, but still sounds like a viable way.  One simple thing that
occurs to me is that in ExecSeqScan(), we can use something like
SampleSeqNext instead of SeqNext to scan heap in a slightly different
way, probably doing it as part of sequence scan will have advantage that
we can use most of the existing infrastructure (sequence node path)
and have less discrepancies as mentioned in point-4.

I originally started from SeqScan but well, it requires completelydifferent State structure, it needs to call sampling methods in variousplaces (not just for next tuple), it needs different handling in explainand in optimizer (costing). If we add all this to sequential scan itwill not be that much different from new scan node (we'd save the coupleof new copy functions and one struct, but I'd rather have clearlydistinct scan).

It would also not help with the discrepancies you mentioned becausethose are in heapam and SampleScan would not use that even if it wasmerged with SeqScan - I don't think we want to implement the sampling onheapam level, it's too much of a hotspot to be good idea to addadditional cycles there.


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESAMPLE patch

Reply via email to