subject:"\[HACKERS\] Typmod associated with multi\-row VALUES constructs"

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-13 Thread Kyotaro HORIGUCHI

Hello, it seems to me to work as expected.

At Thu, 08 Dec 2016 15:58:10 -0500, Tom Lane  wrote in 
<7083.1481230...@sss.pgh.pa.us>
> I've pushed the previous patch to HEAD.  Attached is a proposed patch
> (against 9.6) that we could use for the back branches; it takes the
> brute force approach of just computing the correct value on-demand
> in the two functions at issue.  The key question of course is whether
> this is acceptable from a performance standpoint.  I did a simple test
> 
> using a 1000-entry VALUES list:
> 
> select count(a) from (
> values
>   ('0'::varchar(3), '0'::varchar(4)),
>   ('1'::varchar(3), '1'::varchar(4)),
>   ('2'::varchar(3), '2'::varchar(4)),
>   ('3'::varchar(3), '3'::varchar(4)),
>   ('4'::varchar(3), '4'::varchar(4)),
>   ...
>   ('996'::varchar(3), '996'::varchar(4)),
>   ('997'::varchar(3), '997'::varchar(4)),
>   ('998'::varchar(3), '998'::varchar(4)),
>   ('999'::varchar(3), '999'::varchar(4))
> ) v(a,b);
> 
> Since all the rows do have the same typmod, this represents the worst
> case where we have to scan all the way to the end to confirm the typmod,
> and it has about as little overhead otherwise as I could think of doing.
> I ran it like this:
> 
> pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression
> 
> and could not see any above-the-noise-level difference --- in fact,
> it seemed like it was faster *with* the patch, which is obviously
> impossible; I blame that on chance realignments of loops vs. cache
> line boundaries.
> 
> So I think this is an okay candidate for back-patching.  If anyone
> wants to do their own performance tests, please do.

For all the additional computation, I had the same result on
9.6. The patch accelerates the processing around the noise rate..

9.6 without the patch 103.2 tps
9.6 withthe patch 108.3 tps

For the master branch,

master102.9 tps
0b78106c^ 103.4 tps

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Tom Lane

I've pushed the previous patch to HEAD.  Attached is a proposed patch
(against 9.6) that we could use for the back branches; it takes the
brute force approach of just computing the correct value on-demand
in the two functions at issue.  The key question of course is whether
this is acceptable from a performance standpoint.  I did a simple test
using a 1000-entry VALUES list:

select count(a) from (
values
  ('0'::varchar(3), '0'::varchar(4)),
  ('1'::varchar(3), '1'::varchar(4)),
  ('2'::varchar(3), '2'::varchar(4)),
  ('3'::varchar(3), '3'::varchar(4)),
  ('4'::varchar(3), '4'::varchar(4)),
  ...
  ('996'::varchar(3), '996'::varchar(4)),
  ('997'::varchar(3), '997'::varchar(4)),
  ('998'::varchar(3), '998'::varchar(4)),
  ('999'::varchar(3), '999'::varchar(4))
) v(a,b);

Since all the rows do have the same typmod, this represents the worst
case where we have to scan all the way to the end to confirm the typmod,
and it has about as little overhead otherwise as I could think of doing.
I ran it like this:

pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression

and could not see any above-the-noise-level difference --- in fact,
it seemed like it was faster *with* the patch, which is obviously
impossible; I blame that on chance realignments of loops vs. cache
line boundaries.

So I think this is an okay candidate for back-patching.  If anyone
wants to do their own performance tests, please do.

regards, tom lane

diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 1e3ecbc..c51fd81 100644
*** a/src/backend/parser/parse_relation.c
--- b/src/backend/parser/parse_relation.c
*** static void expandTupleDesc(TupleDesc tu
*** 52,57 
--- 52,58 
  int rtindex, int sublevels_up,
  int location, bool include_dropped,
  List **colnames, List **colvars);
+ static int32 *getValuesTypmods(RangeTblEntry *rte);
  static int	specialAttNum(const char *attname);
  static bool isQueryUsingTempRelation_walker(Node *node, void *context);
  
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2157,2165 
--- 2158,2179 
  			{
  /* Values RTE */
  ListCell   *aliasp_item = list_head(rte->eref->colnames);
+ int32	   *coltypmods;
  ListCell   *lcv;
  ListCell   *lcc;
  
+ /*
+  * It's okay to extract column types from the expressions in
+  * the first row, since all rows will have been coerced to the
+  * same types.  Their typmods might not be the same though, so
+  * we potentially need to examine all rows to compute those.
+  * Column collations are pre-computed in values_collations.
+  */
+ if (colvars)
+ 	coltypmods = getValuesTypmods(rte);
+ else
+ 	coltypmods = NULL;
+ 
  varattno = 0;
  forboth(lcv, (List *) linitial(rte->values_lists),
  		lcc, rte->values_collations)
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2184,2196 
  
  		varnode = makeVar(rtindex, varattno,
  		  exprType(col),
! 		  exprTypmod(col),
  		  colcollation,
  		  sublevels_up);
  		varnode->location = location;
  		*colvars = lappend(*colvars, varnode);
  	}
  }
  			}
  			break;
  		case RTE_JOIN:
--- 2198,2212 
  
  		varnode = makeVar(rtindex, varattno,
  		  exprType(col),
! 		  coltypmods[varattno - 1],
  		  colcollation,
  		  sublevels_up);
  		varnode->location = location;
  		*colvars = lappend(*colvars, varnode);
  	}
  }
+ if (coltypmods)
+ 	pfree(coltypmods);
  			}
  			break;
  		case RTE_JOIN:
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2296,2301 
--- 2312,2319 
  		varnode = makeVar(rtindex, varattno,
  		  coltype, coltypmod, colcoll,
  		  sublevels_up);
+ 		varnode->location = location;
+ 
  		*colvars = lappend(*colvars, varnode);
  	}
  }
*** expandTupleDesc(TupleDesc tupdesc, Alias
*** 2413,2418 
--- 2431,2504 
  }
  
  /*
+  * getValuesTypmods -- expandRTE subroutine
+  *
+  * Identify per-column typmods for the given VALUES RTE.  Returns a
+  * palloc'd array.
+  */
+ static int32 *
+ getValuesTypmods(RangeTblEntry *rte)
+ {
+ 	int32	   *coltypmods;
+ 	List	   *firstrow;
+ 	int			ncolumns,
+ nvalid,
+ i;
+ 	ListCell   *lc;
+ 
+ 	Assert(rte->values_lists != NIL);
+ 	firstrow = (List *) linitial(rte->values_lists);
+ 	ncolumns = list_length(firstrow);
+ 	coltypmods = (int32 *) palloc(ncolumns * sizeof(int32));
+ 	nvalid = 0;
+ 
+ 	/* Collect the typmods from the first VALUES row */
+ 	i = 0;
+ 	foreach(lc, firstrow)
+ 	{
+ 		Node	   *col = (Node *) lfirst(lc);
+ 
+ 		coltypmods[i] = exprTypmod(col);
+ 		if (coltypmods[i] >= 0)
+ 			nvalid++;
+ 		i++;
+ 	}
+ 
+ 	/*
+ 	 * Scan remaining rows; as soon as we have a non-matching typmod for a
+ 	 * column, reset that typmod to -1.  We can bail out early if all typmods
+ 	 *

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Pavel Stehule

2016-12-08 14:03 GMT+01:00 Alvaro Herrera :

> Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > Tom Lane wrote:
> > >> In HEAD, we could change the RTE data structure so that
> > >> transformValuesClause could save the typmod information in the RTE,
> > >> keeping the lookups cheap.
> >
> > > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > > a bit about it at
> > > https://www.postgresql.org/message-id/20161122204730.
> dgipy6gxi25j4e6a@alvherre.pgsql
> >
> > I dunno.  If your example there is correct that XMLTABLE can be called as
> > a plain function in a SELECT list, then I doubt that we want to tie
> > anything about it to the RTE data structure.  If anything, the case where
> > it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> > case.
>
> Well, XMLTABLE is specified by the standard to be part of ,
> which it turn is part of .  I can't immediately tell
> whether it allows XMLTABLE to be called like a regular function.  The
> current patch allows it, but maybe that's not right, and it's probably
> not that useful anyway.
>

It looks like function, and we support on both sides, so I implemented
both.

Probably, there is only 10 rows more related to this feature. Using this
function in target list is not critical feature - now with LATERAL JOIN we
can live without it. It is just some few steps forward to our user.

Again - implementation of this feature is probably few lines only.


>
> > I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> > because I know zip about XML, but maybe I need to take a look.
>
> I think it'd be productive that you did so.  The XML part of it is
> reasonably well isolated, so you could give your opinion on the core
> parser / executor parts without looking at the XML part.
>

The critical part has zero relation to XML. All is some game with tupledesc.

Regards

Pavel


>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Alvaro Herrera

Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> In HEAD, we could change the RTE data structure so that
> >> transformValuesClause could save the typmod information in the RTE,
> >> keeping the lookups cheap.
> 
> > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > a bit about it at
> > https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql
> 
> I dunno.  If your example there is correct that XMLTABLE can be called as
> a plain function in a SELECT list, then I doubt that we want to tie
> anything about it to the RTE data structure.  If anything, the case where
> it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> case.

Well, XMLTABLE is specified by the standard to be part of ,
which it turn is part of .  I can't immediately tell
whether it allows XMLTABLE to be called like a regular function.  The
current patch allows it, but maybe that's not right, and it's probably
not that useful anyway.

> I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> because I know zip about XML, but maybe I need to take a look.

I think it'd be productive that you did so.  The XML part of it is
reasonably well isolated, so you could give your opinion on the core
parser / executor parts without looking at the XML part.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Kyotaro HORIGUCHI

Mmm. I did the same thing in select_common_type and resulted in a
messier (a bit).

At Wed, 07 Dec 2016 23:44:19 -0500, Tom Lane  wrote in 
<15128.1481172...@sss.pgh.pa.us>
> Attached is a patch that fixes this by storing typmod info in the RTE.
> This turned out to be straightforward, and I think it's definitely
> what we should do in HEAD.  I have mixed emotions about whether it's
> worth doing anything about it in the back branches.

With it, VALUES works as the same as UNION, as documentation
says.

https://www.postgresql.org/docs/8.2/static/queries-values.html

Past versions have the same documentation so back-patching the
*behavior* seems to me worth doing. Instead of modifying RTE,
re-coercing the first row's value would enough (I'm not sure how
to do that now) for back-patching.

> I chose to redefine the existing coltypes/coltypmods/colcollations
> lists for CTE RTEs as also applying to VALUES RTEs.  That saves a
> little space in the RangeTblEntry nodes and allows sharing code
> in a couple of places.  It's tempting to consider making that apply
> to all RTE types, which would permit collapsing expandRTE() and
> get_rte_attribute_type() into a single case.  But AFAICS there would
> be no benefit elsewhere, so I'm not sure the extra code churn is
> justified.

Agreed.

> BTW, I noticed that the CTE case of expandRTE() fails to assign the
> specified location to the generated Vars, which is clearly a bug
> though a very minor one; it would result in failing to display a
> parse error location in some cases where we would do so for Vars from
> other RTE types.  That part might be worth back-patching, not sure.

If we do back-patching VALUES patch, involving it would
worth, I think.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Pavel Stehule

2016-12-07 22:17 GMT+01:00 Alvaro Herrera :

> Tom Lane wrote:
>
> > In HEAD, we could change the RTE data structure so that
> > transformValuesClause could save the typmod information in the RTE,
> > keeping the lookups cheap.
>
> Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> a bit about it at
> https://www.postgresql.org/message-id/20161122204730.
> dgipy6gxi25j4e6a@alvherre.pgsql
>
> The patch has evolved quite a bit since then, but the tupdesc continues
> to be a problem.  Latest patch is at
> https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_
> 3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com


What do you dislike on tupdesc usage there?

regards

Pavel

>
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Tom Lane

Attached is a patch that fixes this by storing typmod info in the RTE.
This turned out to be straightforward, and I think it's definitely
what we should do in HEAD.  I have mixed emotions about whether it's
worth doing anything about it in the back branches.

I chose to redefine the existing coltypes/coltypmods/colcollations
lists for CTE RTEs as also applying to VALUES RTEs.  That saves a
little space in the RangeTblEntry nodes and allows sharing code
in a couple of places.  It's tempting to consider making that apply
to all RTE types, which would permit collapsing expandRTE() and
get_rte_attribute_type() into a single case.  But AFAICS there would
be no benefit elsewhere, so I'm not sure the extra code churn is
justified.

BTW, I noticed that the CTE case of expandRTE() fails to assign the
specified location to the generated Vars, which is clearly a bug
though a very minor one; it would result in failing to display a
parse error location in some cases where we would do so for Vars from
other RTE types.  That part might be worth back-patching, not sure.

regards, tom lane

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e30c57e..d973225 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*** _copyRangeTblEntry(const RangeTblEntry *
*** 2149,2161 
  	COPY_NODE_FIELD(functions);
  	COPY_SCALAR_FIELD(funcordinality);
  	COPY_NODE_FIELD(values_lists);
- 	COPY_NODE_FIELD(values_collations);
  	COPY_STRING_FIELD(ctename);
  	COPY_SCALAR_FIELD(ctelevelsup);
  	COPY_SCALAR_FIELD(self_reference);
! 	COPY_NODE_FIELD(ctecoltypes);
! 	COPY_NODE_FIELD(ctecoltypmods);
! 	COPY_NODE_FIELD(ctecolcollations);
  	COPY_NODE_FIELD(alias);
  	COPY_NODE_FIELD(eref);
  	COPY_SCALAR_FIELD(lateral);
--- 2149,2160 
  	COPY_NODE_FIELD(functions);
  	COPY_SCALAR_FIELD(funcordinality);
  	COPY_NODE_FIELD(values_lists);
  	COPY_STRING_FIELD(ctename);
  	COPY_SCALAR_FIELD(ctelevelsup);
  	COPY_SCALAR_FIELD(self_reference);
! 	COPY_NODE_FIELD(coltypes);
! 	COPY_NODE_FIELD(coltypmods);
! 	COPY_NODE_FIELD(colcollations);
  	COPY_NODE_FIELD(alias);
  	COPY_NODE_FIELD(eref);
  	COPY_SCALAR_FIELD(lateral);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b7a109c..edc1797 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*** _equalRangeTblEntry(const RangeTblEntry 
*** 2460,2472 
  	COMPARE_NODE_FIELD(functions);
  	COMPARE_SCALAR_FIELD(funcordinality);
  	COMPARE_NODE_FIELD(values_lists);
- 	COMPARE_NODE_FIELD(values_collations);
  	COMPARE_STRING_FIELD(ctename);
  	COMPARE_SCALAR_FIELD(ctelevelsup);
  	COMPARE_SCALAR_FIELD(self_reference);
! 	COMPARE_NODE_FIELD(ctecoltypes);
! 	COMPARE_NODE_FIELD(ctecoltypmods);
! 	COMPARE_NODE_FIELD(ctecolcollations);
  	COMPARE_NODE_FIELD(alias);
  	COMPARE_NODE_FIELD(eref);
  	COMPARE_SCALAR_FIELD(lateral);
--- 2460,2471 
  	COMPARE_NODE_FIELD(functions);
  	COMPARE_SCALAR_FIELD(funcordinality);
  	COMPARE_NODE_FIELD(values_lists);
  	COMPARE_STRING_FIELD(ctename);
  	COMPARE_SCALAR_FIELD(ctelevelsup);
  	COMPARE_SCALAR_FIELD(self_reference);
! 	COMPARE_NODE_FIELD(coltypes);
! 	COMPARE_NODE_FIELD(coltypmods);
! 	COMPARE_NODE_FIELD(colcollations);
  	COMPARE_NODE_FIELD(alias);
  	COMPARE_NODE_FIELD(eref);
  	COMPARE_SCALAR_FIELD(lateral);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0d858f5..7258c03 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outRangeTblEntry(StringInfo str, const 
*** 2841,2855 
  			break;
  		case RTE_VALUES:
  			WRITE_NODE_FIELD(values_lists);
! 			WRITE_NODE_FIELD(values_collations);
  			break;
  		case RTE_CTE:
  			WRITE_STRING_FIELD(ctename);
  			WRITE_UINT_FIELD(ctelevelsup);
  			WRITE_BOOL_FIELD(self_reference);
! 			WRITE_NODE_FIELD(ctecoltypes);
! 			WRITE_NODE_FIELD(ctecoltypmods);
! 			WRITE_NODE_FIELD(ctecolcollations);
  			break;
  		default:
  			elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind);
--- 2841,2857 
  			break;
  		case RTE_VALUES:
  			WRITE_NODE_FIELD(values_lists);
! 			WRITE_NODE_FIELD(coltypes);
! 			WRITE_NODE_FIELD(coltypmods);
! 			WRITE_NODE_FIELD(colcollations);
  			break;
  		case RTE_CTE:
  			WRITE_STRING_FIELD(ctename);
  			WRITE_UINT_FIELD(ctelevelsup);
  			WRITE_BOOL_FIELD(self_reference);
! 			WRITE_NODE_FIELD(coltypes);
! 			WRITE_NODE_FIELD(coltypmods);
! 			WRITE_NODE_FIELD(colcollations);
  			break;
  		default:
  			elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c587d4e..d608530 100644
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
*** _readRangeTblEntry(void)
*** 1314,1328 
  			break;
  		case RTE_VALUES:
  			READ_NODE_FIELD(values_lists);
! 			READ_NODE_FIELD(values_collations);
  			break;
  		case RTE_

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Tom Lane

Alvaro Herrera  writes:
> Tom Lane wrote:
>> In HEAD, we could change the RTE data structure so that
>> transformValuesClause could save the typmod information in the RTE,
>> keeping the lookups cheap.

> Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> a bit about it at
> https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql

I dunno.  If your example there is correct that XMLTABLE can be called as
a plain function in a SELECT list, then I doubt that we want to tie
anything about it to the RTE data structure.  If anything, the case where
it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
case.

I've been trying to avoid getting involved in the XMLTABLE patch, mainly
because I know zip about XML, but maybe I need to take a look.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Alvaro Herrera

Tom Lane wrote:

> In HEAD, we could change the RTE data structure so that
> transformValuesClause could save the typmod information in the RTE,
> keeping the lookups cheap.

Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
a bit about it at
https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql

The patch has evolved quite a bit since then, but the tupdesc continues
to be a problem.  Latest patch is at
https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread David G. Johnston

On Wed, Dec 7, 2016 at 1:03 PM, Tom Lane  wrote:

> Still, things have been like this since 8.2 when we implemented multi-row
> VALUES, and nobody's noticed up to now.  Maybe the right answer is to
> change the data structure in HEAD and decree that we won't fix it in back
> branches.  I don't really like that answer though ...
>

The merit of "won't back-patch" is that at least you don't punish those
who are being lazy (in a good sense) but generating values in subsequent
lines that conform to the type specification of the first record.  We
already implicitly accept such behavior elsewhere - though probably with
better validation - so keeping it here is defense-able.  We dislike
changing query plans in back branches and this really isn't that different.

The concern, especially since this can propagate to a CREATE TABLE AS, is
whether there is some kind of fundamental storage risk being introduced
that we do not want to have happen no matter how rare.  /me feels deja-vu...

David J.

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Tom Lane

Kyotaro HORIGUCHI  writes:
> At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" 
>  wrote in 
> 
>> If every row passes you can retain the typemod.  That is arguably the
>> perfect solution.  The concern is that "scan every row" could be very
>> expensive - though in writing this I'm thinking that you'd quickly find a

> I found that it already scans the whole list. select_common_type
> does that. And it returns the type that is found to be applicable
> on all values. Handling typmod there has a small
> impact. (Though, I don't assert that we should do this.)

The problem is that there is not where the impact is.  It would be really
easy for transformValuesClause to check for a common typmod while it's
transforming the construct, but it has noplace to put the information.
The place where the info is needed is in expandRTE and
get_rte_attribute_type (which is called by make_var).  So basically,
parsing of each Var reference to a VALUES subquery would potentially have
to scan all rows of the VALUES list to identify the right typmod to give
to the Var.  The repetitiveness of that is what's bothering me.

In HEAD, we could change the RTE data structure so that
transformValuesClause could save the typmod information in the RTE,
keeping the lookups cheap.  But that option is not available to us in
released branches.

On the other hand, we might be worrying over nothing.  I think that
in typical use, expandRTE() will be done just once per query, and it
could amortize the cost by considering all the rows in one scan.
get_rte_attribute_type would be a problem except that I believe it's
rarely used for VALUES RTEs --- our code coverage report shows that that
switch branch isn't reached at all in the standard regression tests.
(The reason for the above statements is that we convert a bare VALUES to
SELECT * FROM VALUES, and the * is expanded by expandRTE, not by retail
applications of make_var.)

> Ok, I think all of the #1 to #5 are the change of behavior in
> this criteria. What we shold resolve here is the inconsistent
> table generated by create table as select from (values...

Well, that's one symptom, but not the only one; consider

# select x::varchar(4) from (values ('z'::varchar(4)), ('0123456789')) v(x);
 x  

 z
 0123456789
(2 rows)

The Var representing v.x is (mis)labeled as already having the right typmod
for varchar(4), so transformTypeCast concludes that no run-time work is
needed, and you get clearly-wrong answers out.

Still, things have been like this since 8.2 when we implemented multi-row
VALUES, and nobody's noticed up to now.  Maybe the right answer is to
change the data structure in HEAD and decree that we won't fix it in back
branches.  I don't really like that answer though ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Kyotaro HORIGUCHI

Hello,

At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" 
 wrote in 

> feel free to s/typemod/typmod/ ... my fingers don't want to drop the "e"
> 
> On Mon, Dec 5, 2016 at 9:17 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> 
> > Hello,
> >
> > At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" <
> > david.g.johns...@gmail.com> wrote in  > zrvf9qia0wwt_qzmauy...@mail.gmail.com>
> > > On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
> > > horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > >
> >
> > (It's not typo but my poor wording... Sorry.)
> > Mmm. I think the typemod of a row should not be applied onto
> > other rows, and the same can be said for types. But type of the
> > first row is applied to all of the rest rows, though... Does it
> > make sense?
> >
> 
> Yes.  All rows in a given relation must end up with the exact same type
> and it isn't friendly to fail when a implicit conversion from unknown is
> possible.

I'm not sure how clearly users are conscious of the type unkown,
my feeling is opposed to implicity applying of the type of the
first value in VALUES to all other values of the type "unknown".

On the other hand, such behavior doesn't harm anything when we
don't explicitly type the values in VALUES. (But it would be a
new feature as you wrote below)

Anyway the real behavior of the current parser for VALUES is
scanning whole the list and extract common-coerceable type, then
applying the type on the whole list. (transformValuesClause)

> > But what I wanted to say was not that but the something like the
> > following.
> >
> > select pg_typeof('12345'::varchar(1));
> >  pg_typeof
> > ---
> >  character varying
> >
> > A value seemingly doesn't have typmod. So it seems reasonable
> > that VALUES cannot handle typmod. It seems reasonable regardless
> > of how it is acutually implemented.
> >
> 
> This is an artifact of functions - the typemod associated with the value
> '12345' is lost when that value is passed into the function pg_typeof.

It seems to me what is occuring in VALUES.

> Thus it is impossible to write a SQL query the reports the typemod of a
> literal or column reference.  Nonetheless it is there in reality.  Just see
> the original CREATE TABLE AS example for proof.  The created table's column
> is shown (using direct catalog queries) to contain typemod value of 20 -
> which it could only have gotten from the first values rows which contained
> a casted literal.

Ok, I found myself have read the original problem wrongly.

=# create table t2 (a) as select * from (values ('abcdee'::varchar(3)), 
('defghij'::varchar(5))) x;
postgres=# \d t2;
 Table "public.t2"
 Column | Type | Modifiers 
+--+---
 a  | character varying(3) | 
postgres=# select * from t2;
   a   
---
 abc
 defgh
(2 rows)

The problem to be resolved here seems to be that CREATE TABLE AS
creates a broken in-a-sense table. Not a coercion of VALUES.

#6 - raise an error if a subquery's result doesn't fit the newly
 created table. Or, create a new table so that all the value
 given from subqery fit to it. (I havent' understand how the
 source typmod affects the new table and I dont'see how to do
 that, though)


> > But I undetstand what we want to solve here is how to change
> > VALUES's behavior to perfectly coerce the row values to the types
> > explicity applied in the first row. Right?
> >
> 
> Actually, no, since it is not possible to coerce "perfectly".  Since any
> attempt at doing so could fail it is only possible to scan every row and
> compare its typemod to the first row's typemod.  Ideally (but maybe not in
> reality) as soon as a discrepancy is found stop and discard the typemod.


> If every row passes you can retain the typemod.  That is arguably the
> perfect solution.  The concern is that "scan every row" could be very
> expensive - though in writing this I'm thinking that you'd quickly find a

I found that it already scans the whole list. select_common_type
does that. And it returns the type that is found to be applicable
on all values. Handling typmod there has a small
impact. (Though, I don't assert that we should do this.)

Even the value itself doesn't convey typmod, VALUES can take
expressions. (It is obvious because the first value was already a
coercing expression). They are scanned already.

> non-match even in a large dataset - and so a less perfect but still valid
> solution is to simply discard the typemod if there is more than one row.
> My thought was that if you are going to discard typemod in the n > 1 case
> for consistency you should discard the typemod in the n = 1 case as well.
> 
> That is, in a nutshell, options 1, 2, and 3 in order.
> 
> The "fault" in #1 that #4 attempted to fix was that VALUES are often hand
> entered and so the inexperienced would like to only type in a cast one the
> first row and have it will apply to all subsequent rows

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston

On Mon, Dec 5, 2016 at 9:54 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

>  The concern is that "scan every row" could be very expensive - though in
> writing this I'm thinking that you'd quickly find a non-match even in a
> large dataset - and so a less perfect but still valid solution is to simply
> discard the typemod if there is more than one row.
>

My folly here - and the actual question to ask - is if you are faced with
large dataset that does have consistent typmods - is the benefit of knowing
what it is and carrying it to the next layer worth the cost of scanning
every single row to prove it is consistent?

My vote would be no - and the only question to ask is whether n = 1 and n >
1 behaviors should differ - to which I'd say no as well - at least in
master.  In the back branches the current behavior would be retained if the
n = 1 behavior is kept different than the n > 1 behavior which is a worthy
trade-off.

David J.

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston

feel free to s/typemod/typmod/ ... my fingers don't want to drop the "e"

On Mon, Dec 5, 2016 at 9:17 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" <
> david.g.johns...@gmail.com> wrote in  zrvf9qia0wwt_qzmauy...@mail.gmail.com>
> > On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
> > horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >
>
> (It's not typo but my poor wording... Sorry.)
> Mmm. I think the typemod of a row should not be applied onto
> other rows, and the same can be said for types. But type of the
> first row is applied to all of the rest rows, though... Does it
> make sense?
>

Yes.  All rows in a given relation must end up with the exact same type
and it isn't friendly to fail when a implicit conversion from unknown is
possible.

> But what I wanted to say was not that but the something like the
> following.
>
> select pg_typeof('12345'::varchar(1));
>  pg_typeof
> ---
>  character varying
>
> A value seemingly doesn't have typmod. So it seems reasonable
> that VALUES cannot handle typmod. It seems reasonable regardless
> of how it is acutually implemented.
>

This is an artifact of functions - the typemod associated with the value
'12345' is lost when that value is passed into the function pg_typeof.
Thus it is impossible to write a SQL query the reports the typemod of a
literal or column reference.  Nonetheless it is there in reality.  Just see
the original CREATE TABLE AS example for proof.  The created table's column
is shown (using direct catalog queries) to contain typemod value of 20 -
which it could only have gotten from the first values rows which contained
a casted literal.

> > > Even though I'm not sure about SQL standard here but my
> > > feeling is something like the following.
> > >
> > > | FROM (
> > > |   VALUES (row 1), .. (row n))
> > > | AS foo (colname *type*, ..)
> > >
> > > for this case,
> > >
> > > | create temporary table product_codes  as select *
> > > | from (
> > > |   values
> > > |   ('abcdefg'),
> > > |   ('012345678901234567ABCDEFGHIJKLMN')
> > > | ) csv_data (product_code character varying(20));
> > >
> > > Myself have gotten errors for this false syntax several times:(
> > >
> >
> > Only the function in from form of this accepts a column definition in
> lieu
> > of a simple alias.  Regardless of the merits of this idea it would not be
> > backpatch-able and wouldn't resolve the current valid syntax problem.
>
> Yeah.. It wouldn't be back-patchable. I personally think that it
> is not necessary to be "solve"d, since a value doesn't rigged
> with typmod.
>
> But I undetstand what we want to solve here is how to change
> VALUES's behavior to perfectly coerce the row values to the types
> explicity applied in the first row. Right?
>

Actually, no, since it is not possible to coerce "perfectly".  Since any
attempt at doing so could fail it is only possible to scan every row and
compare its typemod to the first row's typemod.  Ideally (but maybe not in
reality) as soon as a discrepancy is found stop and discard the typemod.
If every row passes you can retain the typemod.  That is arguably the
perfect solution.  The concern is that "scan every row" could be very
expensive - though in writing this I'm thinking that you'd quickly find a
non-match even in a large dataset - and so a less perfect but still valid
solution is to simply discard the typemod if there is more than one row.
My thought was that if you are going to discard typemod in the n > 1 case
for consistency you should discard the typemod in the n = 1 case as well.

That is, in a nutshell, options 1, 2, and 3 in order.

The "fault" in #1 that #4 attempted to fix was that VALUES are often hand
entered and so the inexperienced would like to only type in a cast one the
first row and have it will apply to all subsequent rows. That would be a
feature request, though.  Your capability to add type information to any
FROM alias is likewise a feature request - solving the overall problem by
giving the author a place to specify the desired type explicitly without
having to pollute the query with excessive casts or subqueries.

David J.

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Kyotaro HORIGUCHI

Hello,

At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" 
 wrote in 

> On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> 
> > Hello,
> >
> > At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" <
> > david.g.johns...@gmail.com> wrote in  > v8nr0FsCFrQ=oo1dkp...@mail.gmail.com>
> > > On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane  wrote:
> > >
> > > > "David G. Johnston"  writes:
> > > > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
> > > > >> In order to fix this, we first have to decide what the semantics
> > ought
> > > > >> to be.  I think there are two plausible definitions:
> > > > >> 1. If all the expressions in the VALUES column share the same
> > typmod,
> > > > >> use that typmod, else use -1.
> > > > >> 2. Use -1 whenever there is more than one VALUES row.
> > > >
> > > > > Can we be precise enough to perform #2 if the top-level (or
> > immediate
> > > > > parent) command is an INSERT - the existing table is going to
> > enforce its
> > > > > own typemod anyway, otherwise go with #1?
> > > >
> > > > I dunno if that's "precise" or just "randomly inconsistent" ;-)
> > > >
> > >
> > > :)
> > >
> > > How does "targeted optimization" sound?
> >
> > (sorry I don't understand what the "targetted optimization" is..)
> >
> 
> I guess its a bit misleading depending on the implementation chosen.  The
> general thought is that we simply ignore typemod information in VALUES if
> we have been told otherwise what that typemod will be (in this case an
> insert into column will use the typemod of the existing column regardless
> of the input data's typemod).

I think I understand that. Fitting the source in VALUES into the
destination column types. That sounds reasonable.

> > FWIW, different from the UNION case, I don't see a reason that
> > every row in a VALUES clause shares anything common with any
> > other rows. Of course typmod and even type are not to be
> > shared. (Type is shared, though.)
> >
> 
> You have a typo here somewhere..."type not to be shared. (Type is shared,
> though)" doesn't work.

(It's not typo but my poor wording... Sorry.)
Mmm. I think the typemod of a row should not be applied onto
other rows, and the same can be said for types. But type of the
first row is applied to all of the rest rows, though... Does it
make sense?

> > On the other hand, if we make all values to be strictly typed (I
> > mean that every value brings its own type information along
> > with), values also can consider strict type. But currently the
> > following command is ignoring the type of the first value.
> >
> > =# select 'bar'::varchar(4) || '';
> >  ?column?
> > --
> >  bar
> >
> >
> Its not ignored - is discarded during coercion to make the || operator
> work on the same type (text).  
> 
> SELECT '12345'::varchar(3) || '678'

Hmm, sorry. Coercion seems happen in the direction to the more
"free" (or robust?) side. The results of both int + float and
float + int are double precision.

But what I wanted to say was not that but the something like the
following.

select pg_typeof('12345'::varchar(1));
 pg_typeof 
---
 character varying

or 

=# select pg_typeof('12345'::numeric(6, 1));
 pg_typeof 
---
 numeric

A value seemingly doesn't have typmod. So it seems reasonable
that VALUES cannot handle typmod. It seems reasonable regardless
of how it is acutually implemented.

> > Even though I'm not sure about SQL standard here but my
> > feeling is something like the following.
> >
> > | FROM (
> > |   VALUES (row 1), .. (row n))
> > | AS foo (colname *type*, ..)
> >
> > for this case,
> >
> > | create temporary table product_codes  as select *
> > | from (
> > |   values
> > |   ('abcdefg'),
> > |   ('012345678901234567ABCDEFGHIJKLMN')
> > | ) csv_data (product_code character varying(20));
> >
> > Myself have gotten errors for this false syntax several times:(
> >
> 
> Only the function in from form of this accepts a column definition in lieu
> of a simple alias.  Regardless of the merits of this idea it would not be
> backpatch-able and wouldn't resolve the current valid syntax problem.

Yeah.. It wouldn't be back-patchable. I personally think that it
is not necessary to be "solve"d, since a value doesn't rigged
with typmod.

But I undetstand what we want to solve here is how to change
VALUES's behavior to perfectly coerce the row values to the types
explicity applied in the first row. Right?

> I suppose my option #4 has the same problem...
> 
> 
> > > Unnecessary maybe, but wouldn't it be immaterial given we are only able
> > to
> > > be efficient when inserting exactly one row.
> > >
> > > There is also a #4 here to consider - if the first (or any) row is not
> > type
> > > unknown, and the remaining rows are all unknown, use the type and typemod
> > > of the known row AND attempt coerce all of the unknowns to that same
> > type.
> > > I'd suggest this is probably the most user-friendly

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston

On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" <
> david.g.johns...@gmail.com> wrote in  v8nr0FsCFrQ=oo1dkp...@mail.gmail.com>
> > On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane  wrote:
> >
> > > "David G. Johnston"  writes:
> > > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
> > > >> In order to fix this, we first have to decide what the semantics
> ought
> > > >> to be.  I think there are two plausible definitions:
> > > >> 1. If all the expressions in the VALUES column share the same
> typmod,
> > > >> use that typmod, else use -1.
> > > >> 2. Use -1 whenever there is more than one VALUES row.
> > >
> > > > Can we be precise enough to perform #2 if the top-level (or
> immediate
> > > > parent) command is an INSERT - the existing table is going to
> enforce its
> > > > own typemod anyway, otherwise go with #1?
> > >
> > > I dunno if that's "precise" or just "randomly inconsistent" ;-)
> > >
> >
> > :)
> >
> > How does "targeted optimization" sound?
>
> (sorry I don't understand what the "targetted optimization" is..)
>

I guess its a bit misleading depending on the implementation chosen.  The
general thought is that we simply ignore typemod information in VALUES if
we have been told otherwise what that typemod will be (in this case an
insert into column will use the typemod of the existing column regardless
of the input data's typemod).


> FWIW, different from the UNION case, I don't see a reason that
> every row in a VALUES clause shares anything common with any
> other rows. Of course typmod and even type are not to be
> shared. (Type is shared, though.)
>

You have a typo here somewhere..."type not to be shared. (Type is shared,
though)" doesn't work.

>
> On the other hand, if we make all values to be strictly typed (I
> mean that every value brings its own type information along
> with), values also can consider strict type. But currently the
> following command is ignoring the type of the first value.
>
> =# select 'bar'::varchar(4) || '';
>  ?column?
> --
>  bar
>
>
Its not ignored - is discarded during coercion to make the || operator
work on the same type (text).  

SELECT '12345'::varchar(3) || '678'


> Even though I'm not sure about SQL standard here but my
> feeling is something like the following.
>
> | FROM (
> |   VALUES (row 1), .. (row n))
> | AS foo (colname *type*, ..)
>
> for this case,
>
> | create temporary table product_codes  as select *
> | from (
> |   values
> |   ('abcdefg'),
> |   ('012345678901234567ABCDEFGHIJKLMN')
> | ) csv_data (product_code character varying(20));
>
> Myself have gotten errors for this false syntax several times:(
>

Only the function in from form of this accepts a column definition in lieu
of a simple alias.  Regardless of the merits of this idea it would not be
backpatch-able and wouldn't resolve the current valid syntax problem.

I suppose my option #4 has the same problem...


> > Unnecessary maybe, but wouldn't it be immaterial given we are only able
> to
> > be efficient when inserting exactly one row.
> >
> > There is also a #4 here to consider - if the first (or any) row is not
> type
> > unknown, and the remaining rows are all unknown, use the type and typemod
> > of the known row AND attempt coerce all of the unknowns to that same
> type.
> > I'd suggest this is probably the most user-friendly option (do as I mean,
> > not as I say).  The OP query would then fail since the second literal is
> > too long to fit in a varchar(20) - I would not want the value truncated
> so
> > an actual cast wouldn't work.
>
> 1 has a type of int,


true

1.0 has a type of float


false - its numeric
select pg_typeof(1.0) -> numeric


and '1' has a typ
> e 
> of text.


false - its unknown (but implicitly cast-able)
select pgtype_of('1') -> error, pg_typeof(unknown) does not exist

So I don't see a situation where only the first row is
> detectably typed. Or is it means that only the first row is
> explicitly typed?


I do indeed mean explicitly typed here.  Using 1 or 1.0 in a values would
be a form of explicit typing in this sense.

David J.

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Craig Ringer

On 6 December 2016 at 04:52, David G. Johnston
 wrote:

> Can we be precise enough to perform #2 if the top-level (or immediate
> parent) command is an INSERT - the existing table is going to enforce its
> own typemod anyway, otherwise go with #1?

We already routinely throw away typmod information in other places,
and can only truly rely on it when working directly with data in
tables. So it sounds sensible to me.

I see semi-regular complaints about this from JDBC users, who want to
be able to know things like "number of digits in this numeric" and
"can this column be null" even through various projections and
transformations of results. But I don't think your suggested change
will make the existing situation any worse.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Kyotaro HORIGUCHI

Hello,

At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" 
 wrote in 

> On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane  wrote:
> 
> > "David G. Johnston"  writes:
> > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
> > >> In order to fix this, we first have to decide what the semantics ought
> > >> to be.  I think there are two plausible definitions:
> > >> 1. If all the expressions in the VALUES column share the same typmod,
> > >> use that typmod, else use -1.
> > >> 2. Use -1 whenever there is more than one VALUES row.
> >
> > > Can we be precise enough to perform #2 if the top-level (or immediate
> > > parent) command is an INSERT - the existing table is going to enforce its
> > > own typemod anyway, otherwise go with #1?
> >
> > I dunno if that's "precise" or just "randomly inconsistent" ;-)
> >
> 
> :)
> 
> How does "targeted optimization" sound?

(sorry I don't understand what the "targetted optimization" is..)

FWIW, different from the UNION case, I don't see a reason that
every row in a VALUES clause shares anything common with any
other rows. Of course typmod and even type are not to be
shared. (Type is shared, though.)

On the other hand, if we make all values to be strictly typed (I
mean that every value brings its own type information along
with), values also can consider strict type. But currently the
following command is ignoring the type of the first value.

=# select 'bar'::varchar(4) || '';
 ?column? 
--
 bar


> > > Lacking that possibility I'd say that documenting that our treatment of
> > > typemod in VALUES is similar to our treatment of typemod in function
> > > arguments would be acceptable. This suggests a #3 - simply use "-1"
> > > regardless of the number of rows in the VALUES expression.
> >
> > I'm a bit concerned about whether that would introduce overhead that we
> > avoid today, in particular for something like
> >
> > insert into foo (varchar20col) values ('bar'::varchar(20));
> >
> > I think if we throw away the knowledge that the VALUES row produces the
> > right typmod already, we'd end up adding an unnecessary runtime coercion
> > step.

Is it means that something like this?

 insert into foo (varchar20col)
select a::varchar(20) from (values ('bar')) as a;


Even though I'm not sure about SQL standard here but my
feeling is something like the following.

| FROM (
|   VALUES (row 1), .. (row n))
| AS foo (colname *type*, ..)

for this case, 

| create temporary table product_codes  as select *
| from (
|   values
|   ('abcdefg'),
|   ('012345678901234567ABCDEFGHIJKLMN')
| ) csv_data (product_code character varying(20));

Myself have gotten errors for this false syntax several times:(

> Unnecessary maybe, but wouldn't it be immaterial given we are only able to
> be efficient when inserting exactly one row.
> 
> There is also a #4 here to consider - if the first (or any) row is not type
> unknown, and the remaining rows are all unknown, use the type and typemod
> of the known row AND attempt coerce all of the unknowns to that same type.
> I'd suggest this is probably the most user-friendly option (do as I mean,
> not as I say).  The OP query would then fail since the second literal is
> too long to fit in a varchar(20) - I would not want the value truncated so
> an actual cast wouldn't work.

1 has a type of int, 1.0 has a type of float and '1' has a type
of text. So I don't see a situation where only the first row is
detectably typed. Or is it means that only the first row is
explicitly typed? I agree that it would be an option but I prefer
the above syntax (#5?) instead for the same purpose.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston

On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
> >> In order to fix this, we first have to decide what the semantics ought
> >> to be.  I think there are two plausible definitions:
> >> 1. If all the expressions in the VALUES column share the same typmod,
> >> use that typmod, else use -1.
> >> 2. Use -1 whenever there is more than one VALUES row.
>
> > Can we be precise enough to perform #2 if the top-level (or immediate
> > parent) command is an INSERT - the existing table is going to enforce its
> > own typemod anyway, otherwise go with #1?
>
> I dunno if that's "precise" or just "randomly inconsistent" ;-)
>

:)

How does "targeted optimization" sound?

> > Lacking that possibility I'd say that documenting that our treatment of
> > typemod in VALUES is similar to our treatment of typemod in function
> > arguments would be acceptable. This suggests a #3 - simply use "-1"
> > regardless of the number of rows in the VALUES expression.
>
> I'm a bit concerned about whether that would introduce overhead that we
> avoid today, in particular for something like
>
> insert into foo (varchar20col) values ('bar'::varchar(20));
>
> I think if we throw away the knowledge that the VALUES row produces the
> right typmod already, we'd end up adding an unnecessary runtime coercion
> step.
>

Unnecessary maybe, but wouldn't it be immaterial given we are only able to
be efficient when inserting exactly one row.

There is also a #4 here to consider - if the first (or any) row is not type
unknown, and the remaining rows are all unknown, use the type and typemod
of the known row AND attempt coerce all of the unknowns to that same type.
I'd suggest this is probably the most user-friendly option (do as I mean,
not as I say).  The OP query would then fail since the second literal is
too long to fit in a varchar(20) - I would not want the value truncated so
an actual cast wouldn't work.

David J.

David J.

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Tom Lane

"David G. Johnston"  writes:
> On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
>> In order to fix this, we first have to decide what the semantics ought
>> to be.  I think there are two plausible definitions:
>> 1. If all the expressions in the VALUES column share the same typmod,
>> use that typmod, else use -1.
>> 2. Use -1 whenever there is more than one VALUES row.

> Can we be precise enough to perform #2 if the top-level (or immediate
> parent) command is an INSERT - the existing table is going to enforce its
> own typemod anyway, otherwise go with #1?

I dunno if that's "precise" or just "randomly inconsistent" ;-)

> Lacking that possibility I'd say that documenting that our treatment of
> typemod in VALUES is similar to our treatment of typemod in function
> arguments would be acceptable. This suggests a #3 - simply use "-1"
> regardless of the number of rows in the VALUES expression.

I'm a bit concerned about whether that would introduce overhead that we
avoid today, in particular for something like

insert into foo (varchar20col) values ('bar'::varchar(20));

I think if we throw away the knowledge that the VALUES row produces the
right typmod already, we'd end up adding an unnecessary runtime coercion
step.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston

On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:

> I looked into the issue reported in bug #14448,
> https://www.postgresql.org/message-id/20161205143037.
> 4377.60754%40wrigleys.postgresql.org
>
> The core of it seems to be that expandRTE() will report the type and
> typmod of a column of a VALUES construct as being exprType() and
> exprTypmod() of the corresponding expression in the first row of
> the VALUES.  It's okay to handle data type that way, because we've
> coerced all the expressions for the column to the same type; but
> we have *not* coerced them to the same typmod.  So some of the values
> from later rows may fail to meet the claimed typmod.  This is not good.
>
> In order to fix this, we first have to decide what the semantics ought
> to be.  I think there are two plausible definitions:
>
> 1. If all the expressions in the VALUES column share the same typmod,
> use that typmod, else use -1.
>
> 2. Use -1 whenever there is more than one VALUES row.
>
> #1 is what we do for some comparable cases such as UNION and CASE.
> However, it's potentially quite expensive for large VALUES constructs.
> #2 would be a lot cheaper, and given that this is the first complaint
> we've gotten in all the years we've had multi-row-VALUES support, it's
> not clear that deriving a precise typmod is really all that useful
> for VALUES.
>
> I have no strong preference between these two, but I think whatever
> we do needs to be back-patched.  The behavior described in the bug
> report is definitely broken.
>
> Thoughts?
>

Can we be precise enough to perform #2 if the top-level (or immediate
parent) command is an INSERT - the existing table is going to enforce its
own typemod anyway, otherwise go with #1?

Lacking that possibility I'd say that documenting that our treatment of
typemod in VALUES is similar to our treatment of typemod in function
arguments would be acceptable. This suggests a #3 - simply use "-1"
regardless of the number of rows in the VALUES expression.

David J.

[HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Tom Lane

I looked into the issue reported in bug #14448,
https://www.postgresql.org/message-id/20161205143037.4377.60754%40wrigleys.postgresql.org

The core of it seems to be that expandRTE() will report the type and
typmod of a column of a VALUES construct as being exprType() and
exprTypmod() of the corresponding expression in the first row of
the VALUES.  It's okay to handle data type that way, because we've
coerced all the expressions for the column to the same type; but
we have *not* coerced them to the same typmod.  So some of the values
from later rows may fail to meet the claimed typmod.  This is not good.

In order to fix this, we first have to decide what the semantics ought
to be.  I think there are two plausible definitions:

1. If all the expressions in the VALUES column share the same typmod,
use that typmod, else use -1.

2. Use -1 whenever there is more than one VALUES row.

#1 is what we do for some comparable cases such as UNION and CASE.
However, it's potentially quite expensive for large VALUES constructs.
#2 would be a lot cheaper, and given that this is the first complaint
we've gotten in all the years we've had multi-row-VALUES support, it's
not clear that deriving a precise typmod is really all that useful
for VALUES.

I have no strong preference between these two, but I think whatever
we do needs to be back-patched.  The behavior described in the bug
report is definitely broken.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

[HACKERS] Typmod associated with multi-row VALUES constructs

22 matches

Site Navigation

Mail list logo

Footer information