Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

Bernd Helmle Mon, 04 Jul 2011 04:11:05 -0700


--On 18. Juni 2011 12:29:38 +0200 Bernd Helmle <maili...@oopsware.de> wrote:

Similar problems occur with a couple other modules I tried (hstore,
intarray).


Hmm, works for me. Seems you have messed up your installation in some way
(build against current -HEAD but running against a 9.1?).

I'm going to review in the next couple of days again.


A bit later than expected, but here is my summary on the patch:

-- Patch Status

The patch applies cleanly. Since it's primarily targeted as an addition tocontrib/, it doesn't touch the backend source tree (besides documentation TOCentries), but adds a new subdirectory in contrib/json. The patch is in contextdiff as requested.


-- Functionality

The patch as is provides a basic implementation for a JSON datatype. Itincludes normalization and validation of JSON strings. It adds the datatypeimplementation as an extension.

The implementation focus on the datatype functionality only, there is noadditional support for special operators. Thus, i think the comments in thecontrol file (and docs) promises actually more than the contrib module willdeliver:


+comment = 'data type for storing and manipulating JSON content'

I'm not sure, if "manipulating" is a correct description. Maybe i missed it,but i didn't see functions to manipulate JSON strings directly, for example,adding an object to a JSON string, ...


-- Coding

The JSON datatype is implemented as a variable length character type, thusallows very large JSON strings to be stored. It transparently decides when toescape unicode code points depending on the selected client- andserver-encoding.

Validation is done through its own JSON parser. The parser itself is arecursive descent parser. It is noticable that the parser seems to define itsown is_space() and is_digit() functions, e.g.:


+#define is_space(c) ((c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == ' ')

Wouldn't it be more clean to use isspace() instead?

This code block in function json_escape_unicode() looks suspicious:

+   /* Convert to UTF-8, if necessary. */
+   {
+       const char *orig = json;
+       json = (const char *)
+           pg_do_encoding_conversion((unsigned char *) json, length,
+                                     GetDatabaseEncoding(), PG_UTF8);
+       if (json != orig)
+           length = strlen(json);
+   }

Seems it *always* wants to convert the string. Isn't there a "if" conditionmissing?

There's a commented code fragment missing, which should be removed (see lasttwo lines of the snippet below):


+static unsigned int
+read_hex16(const char *in)
+{
+       unsigned int i;
+       unsigned int tmp;
+       unsigned int ret = 0;
+
+       for (i = 0; i < 4; i++)
+       {
+               char c = *in++;
+
+               Assert(is_hex_digit(c));
+
+               if (c >= '0' && c <= '9')
+                       tmp = c - '0';
+               else if (c >= 'A' && c <= 'F')
+                       tmp = c - 'A' + 10;
+               else /* if (c >= 'a' && c <= 'f') */
+                       tmp = c - 'a' + 10;

json.c introduces new appendStringInfo* functions, e.g.
appendStringInfoEscape() and appendStringInfoUtf8(). Maybe it's better
to name them to something different, since it may occur someday that the backend
will introduce the same notion with a different meaning. They are static,

but why not naming them into something like jsonAppendStringInfoUtf8() orsimilar?


-- Regression Tests

The patch includes a fairly complete regression test suite, which covers much

of the formatting, validating and input functionality of the JSON datatype. Italso tests UNICODE sequences and input encoding with other server encoding thanUTF-8.



--
Thanks

        Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

Reply via email to