Re: [HACKERS] TABLESAMPLE patch

Petr Jelinek Sat, 07 Mar 2015 09:08:48 -0800

On 05/03/15 09:21, Amit Kapila wrote:

On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <p...@2ndquadrant.com
<mailto:p...@2ndquadrant.com>> wrote:
 >
 >
 > I didn't add the whole page visibility caching as the tuple ids we
get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array not to
to its indexes). Commented about it in code also.
 >


I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?

Double checking for tuple visibility is the only downside I can thinkof. Plus some added code complexity of course. I guess we could usebinary search on rs_vistuples (it's already sorted) so that info won'tbe completely useless. Not sure if worth it compared to normalvisibility check, but not hard to do.

I personally don't see the page pruning in TABLESAMPLE as all thatimportant though, considering that in practice it will only scan tinyportion of a relation and usually one that does not get many updates (inpractice the main use-case is BI).


Few other comments:

1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej


Ah bitrot over time.


2.
Few warnings in code (compiled on windows(msvc))
1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'

I think this is just compiler stupidity but hopefully fixed (I don'thave msvc to check for it and other compilers I tried don't complain).

3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+TableSampleClause *tablesample)
{
..
+/*
+* Page at a time mode is useless for us as we need to check visibility
+* of tuples individually because tuple offsets returned by sampling
+* methods map to rs_vistuples values and not to its indexes.
+*/
+node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
Do we modify this way at anyother place?

I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c

Yeah I agree, I taught the heap_beginscan_strat about it as that one isthe advanced API.


4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+*tuples = path->rows * samplesize;
}

It seems above calculation considers all rows in table are of
equal size and hence multiplying directly with samplesize will
give estimated number of rows for sample method, however if
row size varies then this calculation might not give right
results.  I think if possible we should consider the possibility
of rows with varying sizes else we can at least write a comment
to indicate the possibility of improvement for future.

I am not sure how we would know what size would the tuples have in therandom blocks that we are going to read later. That said, I am sure thatcosting can be improved even if I don't know how myself.

6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+| REPEATABLE

Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.

To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.

Yes it can be moved to type_func_name_keyword which is not all that muchbetter but at least something. I did try to play with this alreadyduring first submission but could not find a way to move it to somethingless restrictive.

I could not even pinpoint what exactly is the shift/reduce issue exceptthat it has something to do with the fact that the REPEATABLE clause isoptional (at least I didn't have the problem when it wasn't optional).


On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
                 ;

  opt_repeatable_clause:
-                       REPEATABLE '(' AexprConst ')'   { $$ = (Node *)
$3; }
+                       REPEATABLE '(' a_expr ')'       { $$ = (Node *)
$3; }
                         | /*EMPTY*/
         { $$ = NULL; }


Bah, lost this change during rebase.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESAMPLE patch

Reply via email to