Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-02-02 Thread Robert Haas
On Sat, Jan 31, 2015 at 8:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I understand Andrew to be saying that if you take a 6-character string and convert it to a JSON string and then back to text, you will *usually* get back the same 6 characters you

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I understand Andrew to be saying that if you take a 6-character string and convert it to a JSON string and then back to text, you will *usually* get back the same 6 characters you started with ... unless the first character was \, the second u, and the

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-30 Thread Peter Geoghegan
On Thu, Jan 29, 2015 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: The point of JSONB is that we take a position on certain aspects like this. We're bridging a pointedly loosey goosey interchange format, JSON, with native PostgreSQL types. For example, we take a firm position on encoding.

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Robert Haas
On Wed, Jan 28, 2015 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: The cause of this bug is commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d, which I'm inclined to think we need to simply revert, not render even more squirrely. Yes, that commit looks broken. If you convert from text to JSON

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Noah Misch
On Wed, Jan 28, 2015 at 12:48:45PM -0500, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote: So at this point I propose that we reject \u when de-escaping JSON. I would have agreed on 2014-12-09, and this release is the last

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Bruce Momjian
On Wed, Jan 28, 2015 at 03:13:47PM -0600, Jim Nasby wrote: While I sympathize with Noah's sentiments, the only thing that makes sense to me is that a JSON text field is treated the same way as we treat text. Right now, that means NUL is not allowed, period. +1 -- Bruce Momjian

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Andrew Dunstan
On 01/29/2015 04:59 PM, Robert Haas wrote: On Thu, Jan 29, 2015 at 4:33 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/29/2015 12:10 PM, Robert Haas wrote: On Wed, Jan 28, 2015 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: The cause of this bug is commit

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Robert Haas
On Thu, Jan 29, 2015 at 4:33 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/29/2015 12:10 PM, Robert Haas wrote: On Wed, Jan 28, 2015 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: The cause of this bug is commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d, which I'm inclined to think we

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Andrew Dunstan
On 01/29/2015 12:10 PM, Robert Haas wrote: On Wed, Jan 28, 2015 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: The cause of this bug is commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d, which I'm inclined to think we need to simply revert, not render even more squirrely. Yes, that commit

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Andres Freund
On 2015-01-29 16:33:36 -0500, Andrew Dunstan wrote: On 01/29/2015 12:10 PM, Robert Haas wrote: On Wed, Jan 28, 2015 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: The cause of this bug is commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d, which I'm inclined to think we need to simply revert,

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 29, 2015 at 4:33 PM, Andrew Dunstan and...@dunslane.net wrote: I'm coming down more and more on the side of Tom's suggestion just to ban \u in jsonb. I have yet to understand what we fix by banning \u. How is different from

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I have yet to understand what we fix by banning \u. How is different from any other four-digit hexadecimal number that's not a valid character in the current encoding? What does banning that one particular value do? BTW, as to the point

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Tom Lane
Attached is a draft patch for this. It basically reverts commit 0ad1a816320a2b539a51628e2a0b1e83ff096b1d, adds a ban of \u if that would need to be converted to text (so it still works in the plain json type, so long as you don't do much processing), and adds some regression tests. I made

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Peter Geoghegan
On Thu, Jan 29, 2015 at 10:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: I made the \u error be errcode(ERRCODE_INVALID_TEXT_REPRESENTATION) and errmsg(invalid input syntax for type json), by analogy to what's thrown for non-ASCII Unicode escapes in non-UTF8 encoding. I'm not terribly happy

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes: I looked into it, and it turns out that MongoDB does not accept NUL in at least some contexts (for object keys). Apparently it wasn't always so. MongoDB previously had a security issue that was fixed by introducing this restriction. Their JSON-centric

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Andrew Dunstan
On 01/29/2015 05:39 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I have yet to understand what we fix by banning \u. How is different from any other four-digit hexadecimal number that's not a valid character in the current encoding? What does banning that one

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-29 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes: On Thu, Jan 29, 2015 at 10:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: I made the \u error be errcode(ERRCODE_INVALID_TEXT_REPRESENTATION) and errmsg(invalid input syntax for type json), by analogy to what's thrown for non-ASCII Unicode escapes in

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
Noah Misch n...@leadboat.com writes: On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote: So at this point I propose that we reject \u when de-escaping JSON. I would have agreed on 2014-12-09, and this release is the last chance to make such a change. It is a bold wager that could

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Andrew Dunstan
On 01/28/2015 12:50 AM, Noah Misch wrote: On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 02:28 PM, Tom Lane wrote: Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: It's not clear to me how we should represent a unicode null. i.e. given a json of '[foo\ubar]', I get that we'd store the element as 'foo\x00bar', but what is the result of (jsonb '[foo\ubar')-0 It's defined to be text so we can't

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes: While I sympathize with Noah's sentiments, the only thing that makes sense to me is that a JSON text field is treated the same way as we treat text. Right now, that means NUL is not allowed, period. If no one has bitched about this with text, is it

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Jim Nasby
On 1/28/15 11:36 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: It's not clear to me how we should represent a unicode null. i.e. given a json of '[foo\ubar]', I get that we'd store the element as 'foo\x00bar', but what is the result of (jsonb '[foo\ubar')-0

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Jeff Janes
On Wed, Jan 28, 2015 at 1:13 PM, Jim Nasby jim.na...@bluetreble.com wrote: My $0.01: While I sympathize with Noah's sentiments, the only thing that makes sense to me is that a JSON text field is treated the same way as we treat text. Right now, that means NUL is not allowed, period. If no

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes: Am I missing something or has there been no consideration in this forbid plan on whether users will be able to retrieve, even if partially incorrectly, any jsonb data that has already been stored? Data that's already been stored would look

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-28 Thread David G Johnston
Tom Lane-2 wrote Andrew Dunstan lt; andrew@ gt; writes: On 01/27/2015 02:28 PM, Tom Lane wrote: Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact solution I described, but I think we'll regret it if we don't change the

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 01:40 PM, Tom Lane wrote: In particular, I would like to suggest that the current representation of \u is fundamentally broken and that we have to change it, not try to band-aid around it. This will mean an on-disk incompatibility

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Andrew Dunstan
On 01/27/2015 02:28 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 01:40 PM, Tom Lane wrote: In particular, I would like to suggest that the current representation of \u is fundamentally broken and that we have to change it, not try to band-aid around it.

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 02:28 PM, Tom Lane wrote: Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact solution I described, but I think we'll regret it if we don't change the representation. The only

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes: On Tue, Jan 27, 2015 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: In particular, I would like to suggest that the current representation of \u is fundamentally broken and that we have to change it, not try to band-aid around it. This will mean

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 12:23 PM, Tom Lane wrote: I think coding anything is premature until we decide how we're going to deal with the fundamental ambiguity. The input \\uabcd will be stored correctly as \uabcd, but this will in turn be rendered as \uabcd,

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Andrew Dunstan
On 01/27/2015 01:40 PM, Tom Lane wrote: In particular, I would like to suggest that the current representation of \u is fundamentally broken and that we have to change it, not try to band-aid around it. This will mean an on-disk incompatibility for jsonb data containing U+, but

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Merlin Moncure
On Tue, Jan 27, 2015 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 12:23 PM, Tom Lane wrote: I think coding anything is premature until we decide how we're going to deal with the fundamental ambiguity. The input \\uabcd will be

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Noah Misch
On Tue, Jan 27, 2015 at 03:56:22PM -0500, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 02:28 PM, Tom Lane wrote: Well, we can either fix it now or suffer with a broken representation forever. I'm not wedded to the exact solution I described, but I think we'll

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Andrew Dunstan
On 01/27/2015 12:24 AM, Noah Misch wrote: For the moment, maybe I could commit the fix for the non U+ case for escape_json, and we could think some more about detecting and warning about the problem strings. +1 for splitting development that way. Fixing the use of escape_json() is

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 12:24 AM, Noah Misch wrote: +1 for splitting development that way. Fixing the use of escape_json() is objective, but the choices around the warning are more subtle. OK, so something like this patch? I'm mildly concerned that you and I

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-27 Thread Andrew Dunstan
On 01/27/2015 12:23 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 01/27/2015 12:24 AM, Noah Misch wrote: +1 for splitting development that way. Fixing the use of escape_json() is objective, but the choices around the warning are more subtle. OK, so something like this

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-26 Thread Andrew Dunstan
On 01/23/2015 02:18 AM, Noah Misch wrote: On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote: The following case has just been brought to my attention (look at the differing number of backslashes): andrew=# select jsonb '\\u'; jsonb -- \u

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-26 Thread Noah Misch
On Mon, Jan 26, 2015 at 09:20:54AM -0500, Andrew Dunstan wrote: On 01/23/2015 02:18 AM, Noah Misch wrote: On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote: We could probably fix this fairly easily for non- U+ cases by having jsonb_to_cstring use a different escape_json

Re: [HACKERS] jsonb, unicode escapes and escaped backslashes

2015-01-22 Thread Noah Misch
On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote: The following case has just been brought to my attention (look at the differing number of backslashes): andrew=# select jsonb '\\u'; jsonb -- \u (1 row) andrew=# select jsonb