Re: [HACKERS] TABLESAMPLE patch

Petr Jelinek Sat, 04 Apr 2015 07:55:45 -0700

On 04/04/15 14:57, Amit Kapila wrote:


1.
+tablesample_clause:
+TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically if user tries to pass multiple arguments in
TABLESAMPLE method's clause like (10,20), then I think
that should be caught in grammer level as an error.

It will be reported as error during parse analysis if the TABLESAMPLEmethod does not accept two parameters, same as when the expression usedwrong type for example.


SQL - 2003 specs says that argument to REPAEATABLE and TABLESAMPLE
method is same <numeric value expr>

It seems to me that you want to allow it to make it extendable
to user defined Tablesample methods, but not sure if it is
right to use func_arg_list for the same because sample method
arguments can have different definition than function arguments.
Now even if we want to keep sample method arguments same as
function arguments that sounds like a separate discussion.

Yes I want extensibility here. And I think the tablesample methodarguments are same thing as function arguments given that in the endthey are arguments for the init function of tablesampling method.

I would be ok with just expr_list, naming parameters here isn't overlyimportant, but ability to have different types and numbers of parametersfor custom TABLESAMPLE methods *is* important.

In general, I feel you already have good basic infrastructure for
supportting other sample methods, but it is better to keep the new
DDL's for doing the same as a separate patch than this patch, as that
way we can reduce the scope of this patch, OTOH if you or others
feel that it is mandatory to have new DDL's support for other
tablesample methods then we have to deal with this now itself.

Well I did attach it as separate diff mainly for that reason. I agreethat DDL does not have to be committed immediately with the rest of thepatch (although it's the simplest part of the patch IMHO).


2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
= 2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't TABLESAMPLE clause suppose to work with Update/Delete
statements?

It's supported in the FROM part of UPDATE and USING part of DELETE. Ithink that that's sufficient.

Standard is somewhat useless for UPDATE and DELETE as it only definesquite limited syntax there. From what I've seen when doing researchMSSQL also only supports it in their equivalent of FROM/USING list,Oracle does not seem to support their SAMPLING clause outside of SELECTsat all and if I got the cryptic DB2 manual correctly I think they don'tsupport it outside of (sub)SELECTs either.

4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot
snapshot = scan->rs_snapshot;
Bufferbuffer = scan->rs_cbuf;

bool visible =
HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check if PageIsAllVisible() in above
code before visibility check as that can avoid visibility checks.

It's probably even better to do that one level up in the samplenexttup()and only call the SampleTupleVisible if page is not allvisible(PageIsAllVisible() is cheap).


6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)

It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?

Eh, stupid copy-paste error when copying function name from header toactual source file.

7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing value of REPEATABLE clause to INT4OID
when numeric value is expected for the clause.  If user gives the
input value as -2.3, it will generate a seed which doesn't seem to
be right.

Because the REPEATABLE is numeric expression so it can produce whatevernumber but we need int4 internally (well float4 would also work just thecode would be slightly uglier). And we do this type of coercion even fortable data (you can insert -2.3 into integer column and it will work) soI don't see what's wrong with it here.


8.
+DATA(insert OID = 3295 (  tsm_system_initPGNSP PGUID 12 1 0 0 0 f f f f
t f v 3 0 2278 "2281
23 700" _null_ _null_ _null_ _null_tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_initPGNSP PGUID 12 1 0 0 0 f f
f f t f v 3 0 2278 "2281
23 700" _null_ _null_ _null_ _null_tsm_bernoulli_init _null_ _null_
_null_ ));

Datatype for second argument is kept as  700 (Float4), shouldn't
it be 1700 (Numeric)?

Why is that? Given the sampling error I think the float4 is enough forspecifying the percentage and it makes the calculations much easier andfaster than dealing with Numeric would.

9.
postgres=# explain SELECT t1.id <http://t1.id> FROM test_tablesample as
t1 TABLESAMPLE SYSTEM (
10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
                                  QUERY PLAN
----------------------------------------------------------------------------
  Nested Loop  (cost=0.00..4.05 rows=100 width=4)
    ->  Sample Scan on test_tablesample t1  (cost=0.00..0.01 rows=1 width=4)
    ->  Sample Scan on test_tablesample t2  (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display sample method name in explain.
I think it can help in case of join queries.
We can use below format to display:
Sample Scan (System) on test_tablesample ...


Good idea.


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TABLESAMPLE patch

Reply via email to