Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
> I found the following error when playing with jsonb and json_build_object:
> =# with jsonb_data as (select * from jsonb_each('{"aa" :
> "po"}'::jsonb)) select json_build_object(key,value) from jsonb_data;
> ERROR:  XX000: cache lookup failed for type 2147483650
> LOCATION:  lookup_type_cache, typcache.c:193

The immediate problem seems to be that add_json() did not get taught
that jsonb is of TYPCATEGORY_JSON; somebody missed updating that copy
of logic that's been copied and pasted several times too many, IMNSHO.

However, now that I look at this code, it seems like it's got more
problems than that:

* it will be fooled utterly by domains over types it's interested in.

* there is nothing stopping somebody from making user-defined types
with category 'j' or 'c', which will confuse it even more.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Andres Freund
Hi,

On 2014-05-09 21:40:07 +0900, Michael Paquier wrote:
> Hi all,
>
> I found the following error when playing with jsonb and json_build_object:
> =# with jsonb_data as (select * from jsonb_each('{"aa" :
> "po"}'::jsonb)) select json_build_object(key,value) from jsonb_data;
> ERROR:  XX000: cache lookup failed for type 2147483650
> LOCATION:  lookup_type_cache, typcache.c:193
>
> I would have expected the result to be the same as in the case of json:
> =# with json_data as (select * from json_each('{"aa" : "po"}'::json))
> select json_build_object(key,value) from json_data;
>  json_build_object
> -------------------
>  {"aa" : "po"}
> (1 row)

Whoa. There's two weird things here:
a) jsonb has a typcategory 'C'. Marking a composite type. json has
   'U'.

b) datum_to_json() thinks it's a good idea to use typcategory to decide
   how a type is output. Isn't that pretty fundamentally flawed? To
   detect composite types it really should look at typtype, no?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
> Whoa. There's two weird things here:
> a) jsonb has a typcategory 'C'. Marking a composite type. json has
>    'U'.

Yeah, that's flat out wrong.  I changed it before seeing your message.

> b) datum_to_json() thinks it's a good idea to use typcategory to decide
>    how a type is output. Isn't that pretty fundamentally flawed?

Indeed.  I think the bit that uses TYPCATEGORY_NUMERIC as a hint to decide
whether the value can be left unquoted (assuming it looks like a number)
might be all right, but the rest of this seems pretty bogus.
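
To make the "left unquoted only if it looks like a number" rule concrete, here is a minimal standalone sketch. It is not the code in json.c: looks_like_json_number() and emit_numeric() are hypothetical names, and the check follows the JSON number grammar from the JSON specification rather than whatever the backend's own lexer does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: true if s matches the JSON number grammar
 *   -?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?
 */
bool
looks_like_json_number(const char *s)
{
	if (*s == '-')
		s++;

	/* integer part: a lone zero, or a nonzero digit followed by digits */
	if (*s == '0')
		s++;
	else if (*s >= '1' && *s <= '9')
	{
		while (isdigit((unsigned char) *s))
			s++;
	}
	else
		return false;

	/* optional fraction: needs at least one digit after the point */
	if (*s == '.')
	{
		s++;
		if (!isdigit((unsigned char) *s))
			return false;
		while (isdigit((unsigned char) *s))
			s++;
	}

	/* optional exponent: needs at least one digit after the sign */
	if (*s == 'e' || *s == 'E')
	{
		s++;
		if (*s == '+' || *s == '-')
			s++;
		if (!isdigit((unsigned char) *s))
			return false;
		while (isdigit((unsigned char) *s))
			s++;
	}

	return *s == '\0';
}

/*
 * Hypothetical helper: copy a numeric type's output text into buf, adding
 * double quotes when the text is not a valid JSON number (for example NaN).
 * No string escaping is done here; a real implementation would need it.
 */
void
emit_numeric(const char *outputstr, char *buf, size_t buflen)
{
	if (looks_like_json_number(outputstr))
		snprintf(buf, buflen, "%s", outputstr);
	else
		snprintf(buf, buflen, "\"%s\"", outputstr);
}
```

With this sketch, emit_numeric("43", ...) yields 43 and emit_numeric("NaN", ...) yields "NaN" with the quotes, which is the distinction the hint is meant to draw.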

regards, tom lane




Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Tom Lane
I wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> b) datum_to_json() thinks it's a good idea to use typcategory to decide
>>    how a type is output. Isn't that pretty fundamentally flawed?
>
> Indeed.  I think the bit that uses TYPCATEGORY_NUMERIC as a hint to decide
> whether the value can be left unquoted (assuming it looks like a number)
> might be all right, but the rest of this seems pretty bogus.

Actually, that would be a security hole if it weren't that CREATE TYPE for
new base types is superuser-only.  Otherwise a user-defined type could
fool this logic with a malicious choice of typcategory.  jsonb itself was
darn close to being a malicious choice of typcategory --- it's entirely
accidental that Michael's example didn't lead to a crash or even more
interesting stuff, since the code was trying to process a jsonb as though
it were a regular composite type.  Other choices of typcategory could have
sent the code into the array path for something that's not an array, or
have allowed escaping to be bypassed for something that's not json, etc.

In short, there are defined ways to decide if a type is array or
composite, and this ain't how.

After further reflection I think we should lose the TYPCATEGORY_NUMERIC
business too.  ruleutils.c hard-wires the set of types it will consider
to be numeric, and I see no very good reason not to do likewise here.
That will remove the need to look up the typcategory at all.

So we need to:

1. Refactor so there's only one copy of the control logic.

2. Smash domains to their base types.

3. Identify boolean, numeric, and json types by direct tests of type OID.

4. Identify array and composite types using standard methods.
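
As a rough illustration of steps 2 through 4, the control logic could look something like the sketch below. This is a standalone mock-up, not the patch: MockType, mock_catalog, sketch_base_type() and sketch_categorize() are hypothetical stand-ins for the syscache and for helpers such as getBaseType(), the OIDs 90001 to 90003 are invented for the example, and the "explicit cast to JSON" case is left out. The built-in type OIDs are the ones assigned in pg_type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Built-in type OIDs, as assigned in pg_type */
#define BOOLOID		16
#define INT8OID		20
#define INT2OID		21
#define INT4OID		23
#define JSONOID		114
#define FLOAT4OID	700
#define FLOAT8OID	701
#define NUMERICOID	1700
#define JSONBOID	3802

typedef enum
{
	CAT_BOOL, CAT_NUMERIC, CAT_JSON, CAT_ARRAY, CAT_COMPOSITE, CAT_OTHER
} SketchCategory;

/* Hypothetical stand-in for a pg_type row */
typedef struct MockType
{
	Oid		oid;
	char	typtype;		/* 'b' base, 'c' composite, 'd' domain */
	Oid		typbasetype;	/* underlying type, for domains only */
	bool	is_array;		/* true if this is an array type */
} MockType;

static const MockType mock_catalog[] = {
	{INT4OID, 'b', 0, false},
	{JSONBOID, 'b', 0, false},
	{1007, 'b', 0, true},			/* int4[] */
	{90001, 'd', INT4OID, false},	/* invented: domain over int4 */
	{90002, 'c', 0, false},			/* invented: composite type */
	{90003, 'd', 90001, false}		/* invented: domain over a domain */
};

static const MockType *
mock_lookup(Oid typid)
{
	size_t	i;

	for (i = 0; i < sizeof(mock_catalog) / sizeof(mock_catalog[0]); i++)
		if (mock_catalog[i].oid == typid)
			return &mock_catalog[i];
	return NULL;
}

/* Step 2: smash domains (including domains over domains) to the base type */
Oid
sketch_base_type(Oid typid)
{
	for (;;)
	{
		const MockType *t = mock_lookup(typid);

		if (t == NULL || t->typtype != 'd')
			return typid;
		typid = t->typbasetype;
	}
}

SketchCategory
sketch_categorize(Oid typid)
{
	const MockType *t;

	typid = sketch_base_type(typid);

	/* Step 3: boolean, numeric and json types by direct OID tests */
	switch (typid)
	{
		case BOOLOID:
			return CAT_BOOL;
		case INT2OID: case INT4OID: case INT8OID:
		case FLOAT4OID: case FLOAT8OID: case NUMERICOID:
			return CAT_NUMERIC;
		case JSONOID: case JSONBOID:
			return CAT_JSON;
		default:
			break;
	}

	/* Step 4: arrays and composites from catalog facts, not typcategory */
	t = mock_lookup(typid);
	if (t != NULL && t->is_array)
		return CAT_ARRAY;
	if (t != NULL && t->typtype == 'c')
		return CAT_COMPOSITE;
	return CAT_OTHER;
}
```

The point of the ordering is that nothing a type's creator can set freely (typcategory) takes part in the decision: a domain over int4 comes out as numeric, and jsonb comes out as JSON rather than composite.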

Anybody see other problems to fix here?

regards, tom lane




Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Andrew Dunstan


On 05/09/2014 10:07 AM, Tom Lane wrote:

> After further reflection I think we should lose the TYPCATEGORY_NUMERIC
> business too.  ruleutils.c hard-wires the set of types it will consider
> to be numeric, and I see no very good reason not to do likewise here.
> That will remove the need to look up the typcategory at all.
>
> So we need to:
>
> 1. Refactor so there's only one copy of the control logic.
>
> 2. Smash domains to their base types.
>
> 3. Identify boolean, numeric, and json types by direct tests of type OID.
>
> 4. Identify array and composite types using standard methods.
>
> Anybody see other problems to fix here?



I guess this is my fault. I recall some discussions when some of this 
was first being written about the best way to make the type based 
decisions, not sure at this remove whether on list or off. The origin of 
it is in 9.2, so if you're going to adjust it you should probably go 
back that far.


I was aware of the domain problem, but in 2 years or so nobody has 
complained about it, so I guess nobody is defining domains over json.


cheers

andrew







Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Andres Freund
On 2014-05-09 10:07:10 -0400, Tom Lane wrote:
> I wrote:
>> Andres Freund and...@2ndquadrant.com writes:
>>> b) datum_to_json() thinks it's a good idea to use typcategory to decide
>>>    how a type is output. Isn't that pretty fundamentally flawed?
>
>> Indeed.  I think the bit that uses TYPCATEGORY_NUMERIC as a hint to decide
>> whether the value can be left unquoted (assuming it looks like a number)
>> might be all right, but the rest of this seems pretty bogus.
>
> Actually, that would be a security hole if it weren't that CREATE TYPE for
> new base types is superuser-only.

Yea. I actually wonder why CREATE TYPE seems to allow creating
TYPCATEGORY_COMPOSITE types - there really doesn't seem to be a use case.

> After further reflection I think we should lose the TYPCATEGORY_NUMERIC
> business too.  ruleutils.c hard-wires the set of types it will consider
> to be numeric, and I see no very good reason not to do likewise here.
> That will remove the need to look up the typcategory at all.

Maybe we should expose that list or the functionality in a neater way?
It's already been copied to test_decoding...

> So we need to:
>
> 1. Refactor so there's only one copy of the control logic.
>
> 2. Smash domains to their base types.
>
> 3. Identify boolean, numeric, and json types by direct tests of type OID.
>
> 4. Identify array and composite types using standard methods.
>
> Anybody see other problems to fix here?

Yea.
5)
	if (val_type > FirstNormalObjectId)
isn't fundamentally incorrect but imo should be replaced by something
like !IsCatalogType() (akin to IsCatalogRelation). At least if we decide
that hunk is safe from other POVs - I am not actually 100% sure yet.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
> I guess this is my fault. I recall some discussions when some of this
> was first being written about the best way to make the type based
> decisions, not sure at this remove whether on list or off. The origin of
> it is in 9.2, so if you're going to adjust it you should probably go
> back that far.

Right, will back-patch.

> I was aware of the domain problem, but in 2 years or so nobody has
> complained about it, so I guess nobody is defining domains over json.

Actually, I was more concerned about domains over other types.
For instance

regression=# create domain dd as int;
CREATE DOMAIN
regression=# select json_build_object('foo', 43);
 json_build_object 
-------------------
 {"foo" : 43}
(1 row)

regression=# select json_build_object('foo', 43::dd);
 json_build_object 
-------------------
 {"foo" : "43"}
(1 row)

With the attached patch, you get 43 without any quotes, which seems
right to me.  However, given the lack of complaints, it might be better
to not make this behavioral change in the back branches.  I'll omit
the getBaseType() call from the back-patches.

Draft HEAD patch attached.

regards, tom lane

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 22ef402..a7364f3 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef enum	/* contexts of JSON par
*** 48,53 ****
--- 48,65 ----
  	JSON_PARSE_END				/* saw the end of a document, expect nothing */
  } JsonParseContext;
  
+ typedef enum					/* type categories for datum_to_json */
+ {
+ 	JSONTYPE_NULL,				/* null, so we didn't bother to identify */
+ 	JSONTYPE_BOOL,				/* boolean (built-in types only) */
+ 	JSONTYPE_NUMERIC,			/* numeric (ditto) */
+ 	JSONTYPE_JSON,				/* JSON itself (and JSONB) */
+ 	JSONTYPE_ARRAY,				/* array */
+ 	JSONTYPE_COMPOSITE,			/* composite */
+ 	JSONTYPE_CAST,				/* something with an explicit cast to JSON */
+ 	JSONTYPE_OTHER				/* all else */
+ } JsonTypeCategory;
+ 
  static inline void json_lex(JsonLexContext *lex);
  static inline void json_lex_string(JsonLexContext *lex);
  static inline void json_lex_number(JsonLexContext *lex, char *s, bool *num_err);
*************** static void composite_to_json(Datum comp
*** 64,75 ****
    bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
    Datum *vals, bool *nulls, int *valcount,
!   TYPCATEGORY tcategory, Oid typoutputfunc,
    bool use_line_feeds);
  static void array_to_json_internal(Datum array, StringInfo result,
  	   bool use_line_feeds);
  static void datum_to_json(Datum val, bool is_null, StringInfo result,
! 			  TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar);
  static void add_json(Datum val, bool is_null, StringInfo result,
  		 Oid val_type, bool key_scalar);
  
--- 76,91 ----
    bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
    Datum *vals, bool *nulls, int *valcount,
!   JsonTypeCategory tcategory, Oid outfuncoid,
    bool use_line_feeds);
  static void array_to_json_internal(Datum array, StringInfo result,
  	   bool use_line_feeds);
+ static void json_categorize_type(Oid typoid,
+ 	 JsonTypeCategory *tcategory,
+ 	 Oid *outfuncoid);
  static void datum_to_json(Datum val, bool is_null, StringInfo result,
! 			  JsonTypeCategory tcategory, Oid outfuncoid,
! 			  bool key_scalar);
  static void add_json(Datum val, bool is_null, StringInfo result,
  		 Oid val_type, bool key_scalar);
  
*************** lex_expect(JsonParseContext ctx, JsonLex
*** 143,156 ****
  		report_parse_error(ctx, lex);;
  }
  
- /*
-  * All the defined	type categories are upper case , so use lower case here
-  * so we avoid any possible clash.
-  */
- /* fake type category for JSON so we can distinguish it in datum_to_json */
- #define TYPCATEGORY_JSON 'j'
- /* fake category for types that have a cast to json */
- #define TYPCATEGORY_JSON_CAST 'c'
  /* chars to consider as part of an alphanumeric token */
  #define JSON_ALPHANUMERIC_CHAR(c)  \
  	(((c) >= 'a' && (c) <= 'z') || \
--- 159,164 ----
*************** extract_mb_char(char *s)
*** 1219,1232 ****
  }
  
  /*
!  * Turn a scalar Datum into JSON, appending the string to result.
   *
!  * Hand off a non-scalar datum to composite_to_json or array_to_json_internal
!  * as appropriate.
   */
  static void
  datum_to_json(Datum val, bool is_null, StringInfo result,
! 			  TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar)
  {
  	char	   *outputstr;
  	text	   *jsontext;
--- 1227,1321 ----
  }
  
  /*
!  * Determine how we want to print values of a given type in datum_to_json.
   *
!  * Given the datatype OID, return its JsonTypeCategory, as well as the type's
!  * output function OID.  If the returned category is JSONTYPE_CAST, we
!  * return the OID of the type->JSON cast function instead.
!  */
! static void
! 

Re: [HACKERS] Cache lookup error when using jsonb, json_build_object and a WITH clause

2014-05-09 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
>> Anybody see other problems to fix here?

> Yea.
> 5)
> 	if (val_type > FirstNormalObjectId)
> isn't fundamentally incorrect but imo should be replaced by something
> like !IsCatalogType() (akin to IsCatalogRelation). At least if we decide
> that hunk is safe from other POVs - I am not actually 100% sure yet.

I didn't particularly like that either.  The test is off-by-one, for
one thing (a type created right at OID wraparound could have
FirstNormalObjectId).  However, it seems reasonable to avoid fruitless
syscache searches for builtin types, and I'm not seeing a lot of point
to wrapping this test in some macro.
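
To spell out the off-by-one, a minimal sketch of the two comparisons follows. The function names are hypothetical; FirstNormalObjectId is 16384, as defined in access/transam.h, and is the first OID that can be handed out to a user-created object.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

/* First OID available to user-created objects (access/transam.h) */
#define FirstNormalObjectId 16384

/* The test as quoted: skips a user type whose OID is exactly 16384 */
bool
may_be_user_type_strict(Oid val_type)
{
	return val_type > FirstNormalObjectId;
}

/* Inclusive variant: treats every OID from 16384 upward as possibly user-defined */
bool
may_be_user_type_inclusive(Oid val_type)
{
	return val_type >= FirstNormalObjectId;
}
```

The two only disagree for val_type == 16384, so the strict form skips the cast lookup for a type that happens to receive that OID, while built-in types such as jsonb (OID 3802) are skipped by both.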

regards, tom lane

