On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:

As an issue related to size estimation, I got an explain result like the
following:

=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 50000;
                                QUERY PLAN
--------------------------------------------------------------------------------
  Sample Scan on t1  (cost=0.00..301.00 rows=10000 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
    Filter: (a < 50000)
    Rows Removed by Filter: 5876
    Buffers: shared hit=45

The estimated row count is about twice the actual one. tsm_system_cost
estimates the number of result rows using baserel->tuples rather than
baserel->rows, so it doesn't reflect the scan quals. Is there some other
intention behind the code?


No, that's a bug.
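To illustrate with rough numbers (the ~100000-tuple table size and the ~0.5
selectivity of "a < 50000" are my assumptions from the EXPLAIN output above,
not anything taken from the patch), here is a standalone sketch of the
arithmetic behind the mismatch:

/*
 * Standalone illustration (not code from the patch) of the estimate above.
 * An estimate based on baserel->tuples ignores the qual, while one based on
 * baserel->rows (the tuple count already reduced by the quals) does not.
 */
#include <stdio.h>

int
main(void)
{
    double rel_tuples = 100000.0;    /* baserel->tuples for t1 (assumed) */
    double qual_selectivity = 0.5;   /* selectivity of "a < 50000" (assumed) */
    double rel_rows = rel_tuples * qual_selectivity;   /* baserel->rows */
    double sample_fraction = 0.10;   /* TABLESAMPLE SYSTEM(10) */

    printf("estimate from baserel->tuples: %.0f\n", rel_tuples * sample_fraction);
    printf("estimate from baserel->rows:   %.0f\n", rel_rows * sample_fraction);

    return 0;
}

The tuples-based estimate gives 10000, matching the plan above, while the
rows-based one gives about 5000, much closer to the 4294 rows actually
returned.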

         4.
         SampleNext()
         a.
         Isn't it better to code SampleNext() similar to SeqNext() and
         IndexNext(), which encapsulate the logic to get the tuple in
         another function (tbs_next() or something like that)?


     Possibly, my thinking was that unlike index_getnext() and
     heap_getnext(), this function would not be called from any other
     place, so adding one more layer of abstraction didn't seem useful.
     And it would mean creating a new ScanDesc struct, etc.


I think adding the additional abstraction would simplify the function
and reduce the chance of discrepancy; otherwise we almost need to
create a duplicate version of the heapgetpage() code.
Yeah, I agree that we would need to build a structure like ScanDesc, but
it would still be worth it to keep the code consistent.

Well, similar, but not the same, as we are not always fetching the whole page
or doing visibility checks on all tuples in the page, etc. But I don't see why
it can't live inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially a somewhat modified sequential scan, and it does
everything within nodeBitmapHeapscan because the algorithm is not used
anywhere else, just like sample scan.

As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since scan processing is dispatched in
ExecProcNode by the nodeTag of the scan nodes, reuniting control
in an abstraction layer after that point could be said to
introduce useless annoyance. In short, abstraction at the level
of *Next() would require additional changes around the way
execution nodes are dispatched.


Yes, that's my view too. I would generally be for that change as well, and it would be worth it if the code were used in more than one place, but as it is, it seems like it would just add code/complexity for no real benefit. It would make sense if we used the sequential scan node instead of the new node, as Amit also suggested, but I remain unconvinced that mixing sampling and sequential scan into a single scan node would be a good idea.


How will it get smoothed out for cases when, let us say,
more than 50% of the tuples are not visible?  I think it is
due to Postgres' non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
for a different kind of architecture (like MySQL/Oracle)
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) could return
an entirely different number of rows.


I don't know how else to explain it: if we loop over every tuple in the
relation and return each one with equal probability, then visibility checks
don't matter, as the percentage of visible tuples will be the same in the
result as in the relation.

Surely it might yield effectively the same result. Even so, this code
needs an explanatory comment about what is going on around it, or should
be written so that MVCC-minded people won't get confused, I suppose.


Yes it does, but as I am failing to explain it even here, it's not clear to me what to write there. From my point of view it's just an effect of the essential property of Bernoulli sampling, which is that every element has an equal probability of being included in the sample. It comes from the fact that we do a Bernoulli trial (in the code it's the while (sampler_random_fract(sampler->randstate) > probability) loop) on every individual tuple.

This means that the ratio of visible to invisible tuples should be the same in the sample as it is in the relation. We then just skip the invisible tuples and get the correct percentage of the visible ones (this has the performance benefit of not having to do a visibility check on all tuples).
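Put in terms of expectations (my notation, not anything from the patch): if the relation holds N tuples, V of which are visible, and each tuple is included independently with probability p, then

    E[visible tuples selected] = p * V
    E[all tuples selected]     = p * N

so the expected visible fraction in the sample is (p*V)/(p*N) = V/N, the same as in the relation, and after skipping the invisible tuples we are left with about p*V rows, i.e. the requested fraction of the visible ones.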

If that weren't true then Bernoulli sampling would simply not work as intended, as that same property is the reason why it's used in statistics: the trends should look the same in the sample as they do in the source population.

Obviously there is some variation in practice, as we don't have a perfect random generator, but that's independent of the algorithm.
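If it helps, here is a small standalone simulation (my own illustration, not the patch's SampleNext() or its sampling API) that does exactly what is described above: an independent trial on every tuple, followed by dropping the tuples that would fail the visibility check. The fraction of visible tuples it returns stays very close to the requested 10% even when most of the heap is invisible:

/*
 * Standalone simulation of per-tuple Bernoulli sampling with invisible
 * tuples skipped after the trial.  Not PostgreSQL code; just a sketch.
 */
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    const int    ntuples = 1000000;     /* tuples in the relation */
    const double visible_ratio = 0.4;   /* say 60% of tuples are not visible */
    const double probability = 0.10;    /* BERNOULLI(10) */
    int          visible = 0;
    int          sampled_visible = 0;

    srand(12345);
    for (int i = 0; i < ntuples; i++)
    {
        int is_visible = ((double) rand() / RAND_MAX) < visible_ratio;

        visible += is_visible;

        /* Bernoulli trial on every tuple, visible or not */
        if (((double) rand() / RAND_MAX) > probability)
            continue;               /* not picked for the sample */

        if (!is_visible)
            continue;               /* picked, but fails the visibility check */

        sampled_visible++;
    }

    printf("visible tuples in relation: %d\n", visible);
    printf("visible tuples returned:    %d (%.2f%% of the visible ones)\n",
           sampled_visible, 100.0 * sampled_visible / visible);

    return 0;
}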


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

