Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Hello, it seems to me to work as expected. At Thu, 08 Dec 2016 15:58:10 -0500, Tom Lane wrote in <7083.1481230...@sss.pgh.pa.us> > I've pushed the previous patch to HEAD. Attached is a proposed patch > (against 9.6) that we could use for the back branches; it takes the > brute force approach of just computing the correct value on-demand > in the two functions at issue. The key question of course is whether > this is acceptable from a performance standpoint. I did a simple test > > using a 1000-entry VALUES list: > > select count(a) from ( > values > ('0'::varchar(3), '0'::varchar(4)), > ('1'::varchar(3), '1'::varchar(4)), > ('2'::varchar(3), '2'::varchar(4)), > ('3'::varchar(3), '3'::varchar(4)), > ('4'::varchar(3), '4'::varchar(4)), > ... > ('996'::varchar(3), '996'::varchar(4)), > ('997'::varchar(3), '997'::varchar(4)), > ('998'::varchar(3), '998'::varchar(4)), > ('999'::varchar(3), '999'::varchar(4)) > ) v(a,b); > > Since all the rows do have the same typmod, this represents the worst > case where we have to scan all the way to the end to confirm the typmod, > and it has about as little overhead otherwise as I could think of doing. > I ran it like this: > > pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression > > and could not see any above-the-noise-level difference --- in fact, > it seemed like it was faster *with* the patch, which is obviously > impossible; I blame that on chance realignments of loops vs. cache > line boundaries. > > So I think this is an okay candidate for back-patching. If anyone > wants to do their own performance tests, please do. For all the additional computation, I had the same result on 9.6. The patch accelerates the processing around the noise rate.. 9.6 without the patch 103.2 tps 9.6 withthe patch 108.3 tps For the master branch, master102.9 tps 0b78106c^ 103.4 tps regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
I've pushed the previous patch to HEAD. Attached is a proposed patch (against 9.6) that we could use for the back branches; it takes the brute force approach of just computing the correct value on-demand in the two functions at issue. The key question of course is whether this is acceptable from a performance standpoint. I did a simple test using a 1000-entry VALUES list: select count(a) from ( values ('0'::varchar(3), '0'::varchar(4)), ('1'::varchar(3), '1'::varchar(4)), ('2'::varchar(3), '2'::varchar(4)), ('3'::varchar(3), '3'::varchar(4)), ('4'::varchar(3), '4'::varchar(4)), ... ('996'::varchar(3), '996'::varchar(4)), ('997'::varchar(3), '997'::varchar(4)), ('998'::varchar(3), '998'::varchar(4)), ('999'::varchar(3), '999'::varchar(4)) ) v(a,b); Since all the rows do have the same typmod, this represents the worst case where we have to scan all the way to the end to confirm the typmod, and it has about as little overhead otherwise as I could think of doing. I ran it like this: pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression and could not see any above-the-noise-level difference --- in fact, it seemed like it was faster *with* the patch, which is obviously impossible; I blame that on chance realignments of loops vs. cache line boundaries. So I think this is an okay candidate for back-patching. If anyone wants to do their own performance tests, please do. regards, tom lane diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c index 1e3ecbc..c51fd81 100644 *** a/src/backend/parser/parse_relation.c --- b/src/backend/parser/parse_relation.c *** static void expandTupleDesc(TupleDesc tu *** 52,57 --- 52,58 int rtindex, int sublevels_up, int location, bool include_dropped, List **colnames, List **colvars); + static int32 *getValuesTypmods(RangeTblEntry *rte); static int specialAttNum(const char *attname); static bool isQueryUsingTempRelation_walker(Node *node, void *context); *** expandRTE(RangeTblEntry *rte, int rtinde *** 2157,2165 --- 2158,2179 { /* Values RTE */ ListCell *aliasp_item = list_head(rte->eref->colnames); + int32 *coltypmods; ListCell *lcv; ListCell *lcc; + /* + * It's okay to extract column types from the expressions in + * the first row, since all rows will have been coerced to the + * same types. Their typmods might not be the same though, so + * we potentially need to examine all rows to compute those. + * Column collations are pre-computed in values_collations. + */ + if (colvars) + coltypmods = getValuesTypmods(rte); + else + coltypmods = NULL; + varattno = 0; forboth(lcv, (List *) linitial(rte->values_lists), lcc, rte->values_collations) *** expandRTE(RangeTblEntry *rte, int rtinde *** 2184,2196 varnode = makeVar(rtindex, varattno, exprType(col), ! exprTypmod(col), colcollation, sublevels_up); varnode->location = location; *colvars = lappend(*colvars, varnode); } } } break; case RTE_JOIN: --- 2198,2212 varnode = makeVar(rtindex, varattno, exprType(col), ! coltypmods[varattno - 1], colcollation, sublevels_up); varnode->location = location; *colvars = lappend(*colvars, varnode); } } + if (coltypmods) + pfree(coltypmods); } break; case RTE_JOIN: *** expandRTE(RangeTblEntry *rte, int rtinde *** 2296,2301 --- 2312,2319 varnode = makeVar(rtindex, varattno, coltype, coltypmod, colcoll, sublevels_up); + varnode->location = location; + *colvars = lappend(*colvars, varnode); } } *** expandTupleDesc(TupleDesc tupdesc, Alias *** 2413,2418 --- 2431,2504 } /* + * getValuesTypmods -- expandRTE subroutine + * + * Identify per-column typmods for the given VALUES RTE. Returns a + * palloc'd array. + */ + static int32 * + getValuesTypmods(RangeTblEntry *rte) + { + int32 *coltypmods; + List *firstrow; + int ncolumns, + nvalid, + i; + ListCell *lc; + + Assert(rte->values_lists != NIL); + firstrow = (List *) linitial(rte->values_lists); + ncolumns = list_length(firstrow); + coltypmods = (int32 *) palloc(ncolumns * sizeof(int32)); + nvalid = 0; + + /* Collect the typmods from the first VALUES row */ + i = 0; + foreach(lc, firstrow) + { + Node *col = (Node *) lfirst(lc); + + coltypmods[i] = exprTypmod(col); + if (coltypmods[i] >= 0) + nvalid++; + i++; + } + + /* + * Scan remaining rows; as soon as we have a non-matching typmod for a + * column, reset that typmod to -1. We can bail out early if all typmods + *
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
2016-12-08 14:03 GMT+01:00 Alvaro Herrera : > Tom Lane wrote: > > Alvaro Herrera writes: > > > Tom Lane wrote: > > >> In HEAD, we could change the RTE data structure so that > > >> transformValuesClause could save the typmod information in the RTE, > > >> keeping the lookups cheap. > > > > > Hmm, I think this would be useful for the XMLTABLE patch too. I talked > > > a bit about it at > > > https://www.postgresql.org/message-id/20161122204730. > dgipy6gxi25j4e6a@alvherre.pgsql > > > > I dunno. If your example there is correct that XMLTABLE can be called as > > a plain function in a SELECT list, then I doubt that we want to tie > > anything about it to the RTE data structure. If anything, the case where > > it appears in FROM seems to need to be treated as a generic RTE_FUNCTION > > case. > > Well, XMLTABLE is specified by the standard to be part of , > which it turn is part of . I can't immediately tell > whether it allows XMLTABLE to be called like a regular function. The > current patch allows it, but maybe that's not right, and it's probably > not that useful anyway. > It looks like function, and we support on both sides, so I implemented both. Probably, there is only 10 rows more related to this feature. Using this function in target list is not critical feature - now with LATERAL JOIN we can live without it. It is just some few steps forward to our user. Again - implementation of this feature is probably few lines only. > > > I've been trying to avoid getting involved in the XMLTABLE patch, mainly > > because I know zip about XML, but maybe I need to take a look. > > I think it'd be productive that you did so. The XML part of it is > reasonably well isolated, so you could give your opinion on the core > parser / executor parts without looking at the XML part. > The critical part has zero relation to XML. All is some game with tupledesc. Regards Pavel > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> In HEAD, we could change the RTE data structure so that > >> transformValuesClause could save the typmod information in the RTE, > >> keeping the lookups cheap. > > > Hmm, I think this would be useful for the XMLTABLE patch too. I talked > > a bit about it at > > https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql > > I dunno. If your example there is correct that XMLTABLE can be called as > a plain function in a SELECT list, then I doubt that we want to tie > anything about it to the RTE data structure. If anything, the case where > it appears in FROM seems to need to be treated as a generic RTE_FUNCTION > case. Well, XMLTABLE is specified by the standard to be part of , which it turn is part of . I can't immediately tell whether it allows XMLTABLE to be called like a regular function. The current patch allows it, but maybe that's not right, and it's probably not that useful anyway. > I've been trying to avoid getting involved in the XMLTABLE patch, mainly > because I know zip about XML, but maybe I need to take a look. I think it'd be productive that you did so. The XML part of it is reasonably well isolated, so you could give your opinion on the core parser / executor parts without looking at the XML part. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Mmm. I did the same thing in select_common_type and resulted in a messier (a bit). At Wed, 07 Dec 2016 23:44:19 -0500, Tom Lane wrote in <15128.1481172...@sss.pgh.pa.us> > Attached is a patch that fixes this by storing typmod info in the RTE. > This turned out to be straightforward, and I think it's definitely > what we should do in HEAD. I have mixed emotions about whether it's > worth doing anything about it in the back branches. With it, VALUES works as the same as UNION, as documentation says. https://www.postgresql.org/docs/8.2/static/queries-values.html Past versions have the same documentation so back-patching the *behavior* seems to me worth doing. Instead of modifying RTE, re-coercing the first row's value would enough (I'm not sure how to do that now) for back-patching. > I chose to redefine the existing coltypes/coltypmods/colcollations > lists for CTE RTEs as also applying to VALUES RTEs. That saves a > little space in the RangeTblEntry nodes and allows sharing code > in a couple of places. It's tempting to consider making that apply > to all RTE types, which would permit collapsing expandRTE() and > get_rte_attribute_type() into a single case. But AFAICS there would > be no benefit elsewhere, so I'm not sure the extra code churn is > justified. Agreed. > BTW, I noticed that the CTE case of expandRTE() fails to assign the > specified location to the generated Vars, which is clearly a bug > though a very minor one; it would result in failing to display a > parse error location in some cases where we would do so for Vars from > other RTE types. That part might be worth back-patching, not sure. If we do back-patching VALUES patch, involving it would worth, I think. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
2016-12-07 22:17 GMT+01:00 Alvaro Herrera : > Tom Lane wrote: > > > In HEAD, we could change the RTE data structure so that > > transformValuesClause could save the typmod information in the RTE, > > keeping the lookups cheap. > > Hmm, I think this would be useful for the XMLTABLE patch too. I talked > a bit about it at > https://www.postgresql.org/message-id/20161122204730. > dgipy6gxi25j4e6a@alvherre.pgsql > > The patch has evolved quite a bit since then, but the tupdesc continues > to be a problem. Latest patch is at > https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_ > 3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com What do you dislike on tupdesc usage there? regards Pavel > > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Attached is a patch that fixes this by storing typmod info in the RTE. This turned out to be straightforward, and I think it's definitely what we should do in HEAD. I have mixed emotions about whether it's worth doing anything about it in the back branches. I chose to redefine the existing coltypes/coltypmods/colcollations lists for CTE RTEs as also applying to VALUES RTEs. That saves a little space in the RangeTblEntry nodes and allows sharing code in a couple of places. It's tempting to consider making that apply to all RTE types, which would permit collapsing expandRTE() and get_rte_attribute_type() into a single case. But AFAICS there would be no benefit elsewhere, so I'm not sure the extra code churn is justified. BTW, I noticed that the CTE case of expandRTE() fails to assign the specified location to the generated Vars, which is clearly a bug though a very minor one; it would result in failing to display a parse error location in some cases where we would do so for Vars from other RTE types. That part might be worth back-patching, not sure. regards, tom lane diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index e30c57e..d973225 100644 *** a/src/backend/nodes/copyfuncs.c --- b/src/backend/nodes/copyfuncs.c *** _copyRangeTblEntry(const RangeTblEntry * *** 2149,2161 COPY_NODE_FIELD(functions); COPY_SCALAR_FIELD(funcordinality); COPY_NODE_FIELD(values_lists); - COPY_NODE_FIELD(values_collations); COPY_STRING_FIELD(ctename); COPY_SCALAR_FIELD(ctelevelsup); COPY_SCALAR_FIELD(self_reference); ! COPY_NODE_FIELD(ctecoltypes); ! COPY_NODE_FIELD(ctecoltypmods); ! COPY_NODE_FIELD(ctecolcollations); COPY_NODE_FIELD(alias); COPY_NODE_FIELD(eref); COPY_SCALAR_FIELD(lateral); --- 2149,2160 COPY_NODE_FIELD(functions); COPY_SCALAR_FIELD(funcordinality); COPY_NODE_FIELD(values_lists); COPY_STRING_FIELD(ctename); COPY_SCALAR_FIELD(ctelevelsup); COPY_SCALAR_FIELD(self_reference); ! COPY_NODE_FIELD(coltypes); ! COPY_NODE_FIELD(coltypmods); ! COPY_NODE_FIELD(colcollations); COPY_NODE_FIELD(alias); COPY_NODE_FIELD(eref); COPY_SCALAR_FIELD(lateral); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index b7a109c..edc1797 100644 *** a/src/backend/nodes/equalfuncs.c --- b/src/backend/nodes/equalfuncs.c *** _equalRangeTblEntry(const RangeTblEntry *** 2460,2472 COMPARE_NODE_FIELD(functions); COMPARE_SCALAR_FIELD(funcordinality); COMPARE_NODE_FIELD(values_lists); - COMPARE_NODE_FIELD(values_collations); COMPARE_STRING_FIELD(ctename); COMPARE_SCALAR_FIELD(ctelevelsup); COMPARE_SCALAR_FIELD(self_reference); ! COMPARE_NODE_FIELD(ctecoltypes); ! COMPARE_NODE_FIELD(ctecoltypmods); ! COMPARE_NODE_FIELD(ctecolcollations); COMPARE_NODE_FIELD(alias); COMPARE_NODE_FIELD(eref); COMPARE_SCALAR_FIELD(lateral); --- 2460,2471 COMPARE_NODE_FIELD(functions); COMPARE_SCALAR_FIELD(funcordinality); COMPARE_NODE_FIELD(values_lists); COMPARE_STRING_FIELD(ctename); COMPARE_SCALAR_FIELD(ctelevelsup); COMPARE_SCALAR_FIELD(self_reference); ! COMPARE_NODE_FIELD(coltypes); ! COMPARE_NODE_FIELD(coltypmods); ! COMPARE_NODE_FIELD(colcollations); COMPARE_NODE_FIELD(alias); COMPARE_NODE_FIELD(eref); COMPARE_SCALAR_FIELD(lateral); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 0d858f5..7258c03 100644 *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *** _outRangeTblEntry(StringInfo str, const *** 2841,2855 break; case RTE_VALUES: WRITE_NODE_FIELD(values_lists); ! WRITE_NODE_FIELD(values_collations); break; case RTE_CTE: WRITE_STRING_FIELD(ctename); WRITE_UINT_FIELD(ctelevelsup); WRITE_BOOL_FIELD(self_reference); ! WRITE_NODE_FIELD(ctecoltypes); ! WRITE_NODE_FIELD(ctecoltypmods); ! WRITE_NODE_FIELD(ctecolcollations); break; default: elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind); --- 2841,2857 break; case RTE_VALUES: WRITE_NODE_FIELD(values_lists); ! WRITE_NODE_FIELD(coltypes); ! WRITE_NODE_FIELD(coltypmods); ! WRITE_NODE_FIELD(colcollations); break; case RTE_CTE: WRITE_STRING_FIELD(ctename); WRITE_UINT_FIELD(ctelevelsup); WRITE_BOOL_FIELD(self_reference); ! WRITE_NODE_FIELD(coltypes); ! WRITE_NODE_FIELD(coltypmods); ! WRITE_NODE_FIELD(colcollations); break; default: elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind); diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index c587d4e..d608530 100644 *** a/src/backend/nodes/readfuncs.c --- b/src/backend/nodes/readfuncs.c *** _readRangeTblEntry(void) *** 1314,1328 break; case RTE_VALUES: READ_NODE_FIELD(values_lists); ! READ_NODE_FIELD(values_collations); break; case RTE_
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Alvaro Herrera writes: > Tom Lane wrote: >> In HEAD, we could change the RTE data structure so that >> transformValuesClause could save the typmod information in the RTE, >> keeping the lookups cheap. > Hmm, I think this would be useful for the XMLTABLE patch too. I talked > a bit about it at > https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql I dunno. If your example there is correct that XMLTABLE can be called as a plain function in a SELECT list, then I doubt that we want to tie anything about it to the RTE data structure. If anything, the case where it appears in FROM seems to need to be treated as a generic RTE_FUNCTION case. I've been trying to avoid getting involved in the XMLTABLE patch, mainly because I know zip about XML, but maybe I need to take a look. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Tom Lane wrote: > In HEAD, we could change the RTE data structure so that > transformValuesClause could save the typmod information in the RTE, > keeping the lookups cheap. Hmm, I think this would be useful for the XMLTABLE patch too. I talked a bit about it at https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql The patch has evolved quite a bit since then, but the tupdesc continues to be a problem. Latest patch is at https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
On Wed, Dec 7, 2016 at 1:03 PM, Tom Lane wrote: > Still, things have been like this since 8.2 when we implemented multi-row > VALUES, and nobody's noticed up to now. Maybe the right answer is to > change the data structure in HEAD and decree that we won't fix it in back > branches. I don't really like that answer though ... > The merit of "won't back-patch" is that at least you don't punish those who are being lazy (in a good sense) but generating values in subsequent lines that conform to the type specification of the first record. We already implicitly accept such behavior elsewhere - though probably with better validation - so keeping it here is defense-able. We dislike changing query plans in back branches and this really isn't that different. The concern, especially since this can propagate to a CREATE TABLE AS, is whether there is some kind of fundamental storage risk being introduced that we do not want to have happen no matter how rare. /me feels deja-vu... David J.
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Kyotaro HORIGUCHI writes: > At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" > wrote in > >> If every row passes you can retain the typemod. That is arguably the >> perfect solution. The concern is that "scan every row" could be very >> expensive - though in writing this I'm thinking that you'd quickly find a > I found that it already scans the whole list. select_common_type > does that. And it returns the type that is found to be applicable > on all values. Handling typmod there has a small > impact. (Though, I don't assert that we should do this.) The problem is that there is not where the impact is. It would be really easy for transformValuesClause to check for a common typmod while it's transforming the construct, but it has noplace to put the information. The place where the info is needed is in expandRTE and get_rte_attribute_type (which is called by make_var). So basically, parsing of each Var reference to a VALUES subquery would potentially have to scan all rows of the VALUES list to identify the right typmod to give to the Var. The repetitiveness of that is what's bothering me. In HEAD, we could change the RTE data structure so that transformValuesClause could save the typmod information in the RTE, keeping the lookups cheap. But that option is not available to us in released branches. On the other hand, we might be worrying over nothing. I think that in typical use, expandRTE() will be done just once per query, and it could amortize the cost by considering all the rows in one scan. get_rte_attribute_type would be a problem except that I believe it's rarely used for VALUES RTEs --- our code coverage report shows that that switch branch isn't reached at all in the standard regression tests. (The reason for the above statements is that we convert a bare VALUES to SELECT * FROM VALUES, and the * is expanded by expandRTE, not by retail applications of make_var.) > Ok, I think all of the #1 to #5 are the change of behavior in > this criteria. What we shold resolve here is the inconsistent > table generated by create table as select from (values... Well, that's one symptom, but not the only one; consider # select x::varchar(4) from (values ('z'::varchar(4)), ('0123456789')) v(x); x z 0123456789 (2 rows) The Var representing v.x is (mis)labeled as already having the right typmod for varchar(4), so transformTypeCast concludes that no run-time work is needed, and you get clearly-wrong answers out. Still, things have been like this since 8.2 when we implemented multi-row VALUES, and nobody's noticed up to now. Maybe the right answer is to change the data structure in HEAD and decree that we won't fix it in back branches. I don't really like that answer though ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Hello, At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" wrote in > feel free to s/typemod/typmod/ ... my fingers don't want to drop the "e" > > On Mon, Dec 5, 2016 at 9:17 PM, Kyotaro HORIGUCHI < > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > Hello, > > > > At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" < > > david.g.johns...@gmail.com> wrote in > zrvf9qia0wwt_qzmauy...@mail.gmail.com> > > > On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI < > > > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > > > > > (It's not typo but my poor wording... Sorry.) > > Mmm. I think the typemod of a row should not be applied onto > > other rows, and the same can be said for types. But type of the > > first row is applied to all of the rest rows, though... Does it > > make sense? > > > > Yes. All rows in a given relation must end up with the exact same type > and it isn't friendly to fail when a implicit conversion from unknown is > possible. I'm not sure how clearly users are conscious of the type unkown, my feeling is opposed to implicity applying of the type of the first value in VALUES to all other values of the type "unknown". On the other hand, such behavior doesn't harm anything when we don't explicitly type the values in VALUES. (But it would be a new feature as you wrote below) Anyway the real behavior of the current parser for VALUES is scanning whole the list and extract common-coerceable type, then applying the type on the whole list. (transformValuesClause) > > But what I wanted to say was not that but the something like the > > following. > > > > select pg_typeof('12345'::varchar(1)); > > pg_typeof > > --- > > character varying > > > > A value seemingly doesn't have typmod. So it seems reasonable > > that VALUES cannot handle typmod. It seems reasonable regardless > > of how it is acutually implemented. > > > > This is an artifact of functions - the typemod associated with the value > '12345' is lost when that value is passed into the function pg_typeof. It seems to me what is occuring in VALUES. > Thus it is impossible to write a SQL query the reports the typemod of a > literal or column reference. Nonetheless it is there in reality. Just see > the original CREATE TABLE AS example for proof. The created table's column > is shown (using direct catalog queries) to contain typemod value of 20 - > which it could only have gotten from the first values rows which contained > a casted literal. Ok, I found myself have read the original problem wrongly. =# create table t2 (a) as select * from (values ('abcdee'::varchar(3)), ('defghij'::varchar(5))) x; postgres=# \d t2; Table "public.t2" Column | Type | Modifiers +--+--- a | character varying(3) | postgres=# select * from t2; a --- abc defgh (2 rows) The problem to be resolved here seems to be that CREATE TABLE AS creates a broken in-a-sense table. Not a coercion of VALUES. #6 - raise an error if a subquery's result doesn't fit the newly created table. Or, create a new table so that all the value given from subqery fit to it. (I havent' understand how the source typmod affects the new table and I dont'see how to do that, though) > > But I undetstand what we want to solve here is how to change > > VALUES's behavior to perfectly coerce the row values to the types > > explicity applied in the first row. Right? > > > > Actually, no, since it is not possible to coerce "perfectly". Since any > attempt at doing so could fail it is only possible to scan every row and > compare its typemod to the first row's typemod. Ideally (but maybe not in > reality) as soon as a discrepancy is found stop and discard the typemod. > If every row passes you can retain the typemod. That is arguably the > perfect solution. The concern is that "scan every row" could be very > expensive - though in writing this I'm thinking that you'd quickly find a I found that it already scans the whole list. select_common_type does that. And it returns the type that is found to be applicable on all values. Handling typmod there has a small impact. (Though, I don't assert that we should do this.) Even the value itself doesn't convey typmod, VALUES can take expressions. (It is obvious because the first value was already a coercing expression). They are scanned already. > non-match even in a large dataset - and so a less perfect but still valid > solution is to simply discard the typemod if there is more than one row. > My thought was that if you are going to discard typemod in the n > 1 case > for consistency you should discard the typemod in the n = 1 case as well. > > That is, in a nutshell, options 1, 2, and 3 in order. > > The "fault" in #1 that #4 attempted to fix was that VALUES are often hand > entered and so the inexperienced would like to only type in a cast one the > first row and have it will apply to all subsequent rows
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
On Mon, Dec 5, 2016 at 9:54 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > The concern is that "scan every row" could be very expensive - though in > writing this I'm thinking that you'd quickly find a non-match even in a > large dataset - and so a less perfect but still valid solution is to simply > discard the typemod if there is more than one row. > My folly here - and the actual question to ask - is if you are faced with large dataset that does have consistent typmods - is the benefit of knowing what it is and carrying it to the next layer worth the cost of scanning every single row to prove it is consistent? My vote would be no - and the only question to ask is whether n = 1 and n > 1 behaviors should differ - to which I'd say no as well - at least in master. In the back branches the current behavior would be retained if the n = 1 behavior is kept different than the n > 1 behavior which is a worthy trade-off. David J.
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
feel free to s/typemod/typmod/ ... my fingers don't want to drop the "e" On Mon, Dec 5, 2016 at 9:17 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello, > > At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" < > david.g.johns...@gmail.com> wrote in zrvf9qia0wwt_qzmauy...@mail.gmail.com> > > On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI < > > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > > (It's not typo but my poor wording... Sorry.) > Mmm. I think the typemod of a row should not be applied onto > other rows, and the same can be said for types. But type of the > first row is applied to all of the rest rows, though... Does it > make sense? > Yes. All rows in a given relation must end up with the exact same type and it isn't friendly to fail when a implicit conversion from unknown is possible. > But what I wanted to say was not that but the something like the > following. > > select pg_typeof('12345'::varchar(1)); > pg_typeof > --- > character varying > > A value seemingly doesn't have typmod. So it seems reasonable > that VALUES cannot handle typmod. It seems reasonable regardless > of how it is acutually implemented. > This is an artifact of functions - the typemod associated with the value '12345' is lost when that value is passed into the function pg_typeof. Thus it is impossible to write a SQL query the reports the typemod of a literal or column reference. Nonetheless it is there in reality. Just see the original CREATE TABLE AS example for proof. The created table's column is shown (using direct catalog queries) to contain typemod value of 20 - which it could only have gotten from the first values rows which contained a casted literal. > > > Even though I'm not sure about SQL standard here but my > > > feeling is something like the following. > > > > > > | FROM ( > > > | VALUES (row 1), .. (row n)) > > > | AS foo (colname *type*, ..) > > > > > > for this case, > > > > > > | create temporary table product_codes as select * > > > | from ( > > > | values > > > | ('abcdefg'), > > > | ('012345678901234567ABCDEFGHIJKLMN') > > > | ) csv_data (product_code character varying(20)); > > > > > > Myself have gotten errors for this false syntax several times:( > > > > > > > Only the function in from form of this accepts a column definition in > lieu > > of a simple alias. Regardless of the merits of this idea it would not be > > backpatch-able and wouldn't resolve the current valid syntax problem. > > Yeah.. It wouldn't be back-patchable. I personally think that it > is not necessary to be "solve"d, since a value doesn't rigged > with typmod. > > But I undetstand what we want to solve here is how to change > VALUES's behavior to perfectly coerce the row values to the types > explicity applied in the first row. Right? > Actually, no, since it is not possible to coerce "perfectly". Since any attempt at doing so could fail it is only possible to scan every row and compare its typemod to the first row's typemod. Ideally (but maybe not in reality) as soon as a discrepancy is found stop and discard the typemod. If every row passes you can retain the typemod. That is arguably the perfect solution. The concern is that "scan every row" could be very expensive - though in writing this I'm thinking that you'd quickly find a non-match even in a large dataset - and so a less perfect but still valid solution is to simply discard the typemod if there is more than one row. My thought was that if you are going to discard typemod in the n > 1 case for consistency you should discard the typemod in the n = 1 case as well. That is, in a nutshell, options 1, 2, and 3 in order. The "fault" in #1 that #4 attempted to fix was that VALUES are often hand entered and so the inexperienced would like to only type in a cast one the first row and have it will apply to all subsequent rows. That would be a feature request, though. Your capability to add type information to any FROM alias is likewise a feature request - solving the overall problem by giving the author a place to specify the desired type explicitly without having to pollute the query with excessive casts or subqueries. David J.
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Hello, At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" wrote in > On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI < > horiguchi.kyot...@lab.ntt.co.jp> wrote: > > > Hello, > > > > At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" < > > david.g.johns...@gmail.com> wrote in > v8nr0FsCFrQ=oo1dkp...@mail.gmail.com> > > > On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane wrote: > > > > > > > "David G. Johnston" writes: > > > > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane wrote: > > > > >> In order to fix this, we first have to decide what the semantics > > ought > > > > >> to be. I think there are two plausible definitions: > > > > >> 1. If all the expressions in the VALUES column share the same > > typmod, > > > > >> use that typmod, else use -1. > > > > >> 2. Use -1 whenever there is more than one VALUES row. > > > > > > > > > Can we be precise enough to perform #2 if the top-level (or > > immediate > > > > > parent) command is an INSERT - the existing table is going to > > enforce its > > > > > own typemod anyway, otherwise go with #1? > > > > > > > > I dunno if that's "precise" or just "randomly inconsistent" ;-) > > > > > > > > > > :) > > > > > > How does "targeted optimization" sound? > > > > (sorry I don't understand what the "targetted optimization" is..) > > > > I guess its a bit misleading depending on the implementation chosen. The > general thought is that we simply ignore typemod information in VALUES if > we have been told otherwise what that typemod will be (in this case an > insert into column will use the typemod of the existing column regardless > of the input data's typemod). I think I understand that. Fitting the source in VALUES into the destination column types. That sounds reasonable. > > FWIW, different from the UNION case, I don't see a reason that > > every row in a VALUES clause shares anything common with any > > other rows. Of course typmod and even type are not to be > > shared. (Type is shared, though.) > > > > You have a typo here somewhere..."type not to be shared. (Type is shared, > though)" doesn't work. (It's not typo but my poor wording... Sorry.) Mmm. I think the typemod of a row should not be applied onto other rows, and the same can be said for types. But type of the first row is applied to all of the rest rows, though... Does it make sense? > > On the other hand, if we make all values to be strictly typed (I > > mean that every value brings its own type information along > > with), values also can consider strict type. But currently the > > following command is ignoring the type of the first value. > > > > =# select 'bar'::varchar(4) || ''; > > ?column? > > -- > > bar > > > > > Its not ignored - is discarded during coercion to make the || operator > work on the same type (text). > > SELECT '12345'::varchar(3) || '678' Hmm, sorry. Coercion seems happen in the direction to the more "free" (or robust?) side. The results of both int + float and float + int are double precision. But what I wanted to say was not that but the something like the following. select pg_typeof('12345'::varchar(1)); pg_typeof --- character varying or =# select pg_typeof('12345'::numeric(6, 1)); pg_typeof --- numeric A value seemingly doesn't have typmod. So it seems reasonable that VALUES cannot handle typmod. It seems reasonable regardless of how it is acutually implemented. > > Even though I'm not sure about SQL standard here but my > > feeling is something like the following. > > > > | FROM ( > > | VALUES (row 1), .. (row n)) > > | AS foo (colname *type*, ..) > > > > for this case, > > > > | create temporary table product_codes as select * > > | from ( > > | values > > | ('abcdefg'), > > | ('012345678901234567ABCDEFGHIJKLMN') > > | ) csv_data (product_code character varying(20)); > > > > Myself have gotten errors for this false syntax several times:( > > > > Only the function in from form of this accepts a column definition in lieu > of a simple alias. Regardless of the merits of this idea it would not be > backpatch-able and wouldn't resolve the current valid syntax problem. Yeah.. It wouldn't be back-patchable. I personally think that it is not necessary to be "solve"d, since a value doesn't rigged with typmod. But I undetstand what we want to solve here is how to change VALUES's behavior to perfectly coerce the row values to the types explicity applied in the first row. Right? > I suppose my option #4 has the same problem... > > > > > Unnecessary maybe, but wouldn't it be immaterial given we are only able > > to > > > be efficient when inserting exactly one row. > > > > > > There is also a #4 here to consider - if the first (or any) row is not > > type > > > unknown, and the remaining rows are all unknown, use the type and typemod > > > of the known row AND attempt coerce all of the unknowns to that same > > type. > > > I'd suggest this is probably the most user-friendly
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello, > > At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" < > david.g.johns...@gmail.com> wrote in v8nr0FsCFrQ=oo1dkp...@mail.gmail.com> > > On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane wrote: > > > > > "David G. Johnston" writes: > > > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane wrote: > > > >> In order to fix this, we first have to decide what the semantics > ought > > > >> to be. I think there are two plausible definitions: > > > >> 1. If all the expressions in the VALUES column share the same > typmod, > > > >> use that typmod, else use -1. > > > >> 2. Use -1 whenever there is more than one VALUES row. > > > > > > > Can we be precise enough to perform #2 if the top-level (or > immediate > > > > parent) command is an INSERT - the existing table is going to > enforce its > > > > own typemod anyway, otherwise go with #1? > > > > > > I dunno if that's "precise" or just "randomly inconsistent" ;-) > > > > > > > :) > > > > How does "targeted optimization" sound? > > (sorry I don't understand what the "targetted optimization" is..) > I guess its a bit misleading depending on the implementation chosen. The general thought is that we simply ignore typemod information in VALUES if we have been told otherwise what that typemod will be (in this case an insert into column will use the typemod of the existing column regardless of the input data's typemod). > FWIW, different from the UNION case, I don't see a reason that > every row in a VALUES clause shares anything common with any > other rows. Of course typmod and even type are not to be > shared. (Type is shared, though.) > You have a typo here somewhere..."type not to be shared. (Type is shared, though)" doesn't work. > > On the other hand, if we make all values to be strictly typed (I > mean that every value brings its own type information along > with), values also can consider strict type. But currently the > following command is ignoring the type of the first value. > > =# select 'bar'::varchar(4) || ''; > ?column? > -- > bar > > Its not ignored - is discarded during coercion to make the || operator work on the same type (text). SELECT '12345'::varchar(3) || '678' > Even though I'm not sure about SQL standard here but my > feeling is something like the following. > > | FROM ( > | VALUES (row 1), .. (row n)) > | AS foo (colname *type*, ..) > > for this case, > > | create temporary table product_codes as select * > | from ( > | values > | ('abcdefg'), > | ('012345678901234567ABCDEFGHIJKLMN') > | ) csv_data (product_code character varying(20)); > > Myself have gotten errors for this false syntax several times:( > Only the function in from form of this accepts a column definition in lieu of a simple alias. Regardless of the merits of this idea it would not be backpatch-able and wouldn't resolve the current valid syntax problem. I suppose my option #4 has the same problem... > > Unnecessary maybe, but wouldn't it be immaterial given we are only able > to > > be efficient when inserting exactly one row. > > > > There is also a #4 here to consider - if the first (or any) row is not > type > > unknown, and the remaining rows are all unknown, use the type and typemod > > of the known row AND attempt coerce all of the unknowns to that same > type. > > I'd suggest this is probably the most user-friendly option (do as I mean, > > not as I say). The OP query would then fail since the second literal is > > too long to fit in a varchar(20) - I would not want the value truncated > so > > an actual cast wouldn't work. > > 1 has a type of int, true 1.0 has a type of float false - its numeric select pg_typeof(1.0) -> numeric and '1' has a typ > e > of text. false - its unknown (but implicitly cast-able) select pgtype_of('1') -> error, pg_typeof(unknown) does not exist So I don't see a situation where only the first row is > detectably typed. Or is it means that only the first row is > explicitly typed? I do indeed mean explicitly typed here. Using 1 or 1.0 in a values would be a form of explicit typing in this sense. David J.
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
On 6 December 2016 at 04:52, David G. Johnston wrote: > Can we be precise enough to perform #2 if the top-level (or immediate > parent) command is an INSERT - the existing table is going to enforce its > own typemod anyway, otherwise go with #1? We already routinely throw away typmod information in other places, and can only truly rely on it when working directly with data in tables. So it sounds sensible to me. I see semi-regular complaints about this from JDBC users, who want to be able to know things like "number of digits in this numeric" and "can this column be null" even through various projections and transformations of results. But I don't think your suggested change will make the existing situation any worse. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
Hello, At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" wrote in > On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane wrote: > > > "David G. Johnston" writes: > > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane wrote: > > >> In order to fix this, we first have to decide what the semantics ought > > >> to be. I think there are two plausible definitions: > > >> 1. If all the expressions in the VALUES column share the same typmod, > > >> use that typmod, else use -1. > > >> 2. Use -1 whenever there is more than one VALUES row. > > > > > Can we be precise enough to perform #2 if the top-level (or immediate > > > parent) command is an INSERT - the existing table is going to enforce its > > > own typemod anyway, otherwise go with #1? > > > > I dunno if that's "precise" or just "randomly inconsistent" ;-) > > > > :) > > How does "targeted optimization" sound? (sorry I don't understand what the "targetted optimization" is..) FWIW, different from the UNION case, I don't see a reason that every row in a VALUES clause shares anything common with any other rows. Of course typmod and even type are not to be shared. (Type is shared, though.) On the other hand, if we make all values to be strictly typed (I mean that every value brings its own type information along with), values also can consider strict type. But currently the following command is ignoring the type of the first value. =# select 'bar'::varchar(4) || ''; ?column? -- bar > > > Lacking that possibility I'd say that documenting that our treatment of > > > typemod in VALUES is similar to our treatment of typemod in function > > > arguments would be acceptable. This suggests a #3 - simply use "-1" > > > regardless of the number of rows in the VALUES expression. > > > > I'm a bit concerned about whether that would introduce overhead that we > > avoid today, in particular for something like > > > > insert into foo (varchar20col) values ('bar'::varchar(20)); > > > > I think if we throw away the knowledge that the VALUES row produces the > > right typmod already, we'd end up adding an unnecessary runtime coercion > > step. Is it means that something like this? insert into foo (varchar20col) select a::varchar(20) from (values ('bar')) as a; Even though I'm not sure about SQL standard here but my feeling is something like the following. | FROM ( | VALUES (row 1), .. (row n)) | AS foo (colname *type*, ..) for this case, | create temporary table product_codes as select * | from ( | values | ('abcdefg'), | ('012345678901234567ABCDEFGHIJKLMN') | ) csv_data (product_code character varying(20)); Myself have gotten errors for this false syntax several times:( > Unnecessary maybe, but wouldn't it be immaterial given we are only able to > be efficient when inserting exactly one row. > > There is also a #4 here to consider - if the first (or any) row is not type > unknown, and the remaining rows are all unknown, use the type and typemod > of the known row AND attempt coerce all of the unknowns to that same type. > I'd suggest this is probably the most user-friendly option (do as I mean, > not as I say). The OP query would then fail since the second literal is > too long to fit in a varchar(20) - I would not want the value truncated so > an actual cast wouldn't work. 1 has a type of int, 1.0 has a type of float and '1' has a type of text. So I don't see a situation where only the first row is detectably typed. Or is it means that only the first row is explicitly typed? I agree that it would be an option but I prefer the above syntax (#5?) instead for the same purpose. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane wrote: > "David G. Johnston" writes: > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane wrote: > >> In order to fix this, we first have to decide what the semantics ought > >> to be. I think there are two plausible definitions: > >> 1. If all the expressions in the VALUES column share the same typmod, > >> use that typmod, else use -1. > >> 2. Use -1 whenever there is more than one VALUES row. > > > Can we be precise enough to perform #2 if the top-level (or immediate > > parent) command is an INSERT - the existing table is going to enforce its > > own typemod anyway, otherwise go with #1? > > I dunno if that's "precise" or just "randomly inconsistent" ;-) > :) How does "targeted optimization" sound? > > Lacking that possibility I'd say that documenting that our treatment of > > typemod in VALUES is similar to our treatment of typemod in function > > arguments would be acceptable. This suggests a #3 - simply use "-1" > > regardless of the number of rows in the VALUES expression. > > I'm a bit concerned about whether that would introduce overhead that we > avoid today, in particular for something like > > insert into foo (varchar20col) values ('bar'::varchar(20)); > > I think if we throw away the knowledge that the VALUES row produces the > right typmod already, we'd end up adding an unnecessary runtime coercion > step. > Unnecessary maybe, but wouldn't it be immaterial given we are only able to be efficient when inserting exactly one row. There is also a #4 here to consider - if the first (or any) row is not type unknown, and the remaining rows are all unknown, use the type and typemod of the known row AND attempt coerce all of the unknowns to that same type. I'd suggest this is probably the most user-friendly option (do as I mean, not as I say). The OP query would then fail since the second literal is too long to fit in a varchar(20) - I would not want the value truncated so an actual cast wouldn't work. David J. David J.
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
"David G. Johnston" writes: > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane wrote: >> In order to fix this, we first have to decide what the semantics ought >> to be. I think there are two plausible definitions: >> 1. If all the expressions in the VALUES column share the same typmod, >> use that typmod, else use -1. >> 2. Use -1 whenever there is more than one VALUES row. > Can we be precise enough to perform #2 if the top-level (or immediate > parent) command is an INSERT - the existing table is going to enforce its > own typemod anyway, otherwise go with #1? I dunno if that's "precise" or just "randomly inconsistent" ;-) > Lacking that possibility I'd say that documenting that our treatment of > typemod in VALUES is similar to our treatment of typemod in function > arguments would be acceptable. This suggests a #3 - simply use "-1" > regardless of the number of rows in the VALUES expression. I'm a bit concerned about whether that would introduce overhead that we avoid today, in particular for something like insert into foo (varchar20col) values ('bar'::varchar(20)); I think if we throw away the knowledge that the VALUES row produces the right typmod already, we'd end up adding an unnecessary runtime coercion step. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typmod associated with multi-row VALUES constructs
On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane wrote: > I looked into the issue reported in bug #14448, > https://www.postgresql.org/message-id/20161205143037. > 4377.60754%40wrigleys.postgresql.org > > The core of it seems to be that expandRTE() will report the type and > typmod of a column of a VALUES construct as being exprType() and > exprTypmod() of the corresponding expression in the first row of > the VALUES. It's okay to handle data type that way, because we've > coerced all the expressions for the column to the same type; but > we have *not* coerced them to the same typmod. So some of the values > from later rows may fail to meet the claimed typmod. This is not good. > > In order to fix this, we first have to decide what the semantics ought > to be. I think there are two plausible definitions: > > 1. If all the expressions in the VALUES column share the same typmod, > use that typmod, else use -1. > > 2. Use -1 whenever there is more than one VALUES row. > > #1 is what we do for some comparable cases such as UNION and CASE. > However, it's potentially quite expensive for large VALUES constructs. > #2 would be a lot cheaper, and given that this is the first complaint > we've gotten in all the years we've had multi-row-VALUES support, it's > not clear that deriving a precise typmod is really all that useful > for VALUES. > > I have no strong preference between these two, but I think whatever > we do needs to be back-patched. The behavior described in the bug > report is definitely broken. > > Thoughts? > Can we be precise enough to perform #2 if the top-level (or immediate parent) command is an INSERT - the existing table is going to enforce its own typemod anyway, otherwise go with #1? Lacking that possibility I'd say that documenting that our treatment of typemod in VALUES is similar to our treatment of typemod in function arguments would be acceptable. This suggests a #3 - simply use "-1" regardless of the number of rows in the VALUES expression. David J.
[HACKERS] Typmod associated with multi-row VALUES constructs
I looked into the issue reported in bug #14448, https://www.postgresql.org/message-id/20161205143037.4377.60754%40wrigleys.postgresql.org The core of it seems to be that expandRTE() will report the type and typmod of a column of a VALUES construct as being exprType() and exprTypmod() of the corresponding expression in the first row of the VALUES. It's okay to handle data type that way, because we've coerced all the expressions for the column to the same type; but we have *not* coerced them to the same typmod. So some of the values from later rows may fail to meet the claimed typmod. This is not good. In order to fix this, we first have to decide what the semantics ought to be. I think there are two plausible definitions: 1. If all the expressions in the VALUES column share the same typmod, use that typmod, else use -1. 2. Use -1 whenever there is more than one VALUES row. #1 is what we do for some comparable cases such as UNION and CASE. However, it's potentially quite expensive for large VALUES constructs. #2 would be a lot cheaper, and given that this is the first complaint we've gotten in all the years we've had multi-row-VALUES support, it's not clear that deriving a precise typmod is really all that useful for VALUES. I have no strong preference between these two, but I think whatever we do needs to be back-patched. The behavior described in the bug report is definitely broken. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers