jsonpath Time and Timestamp Special Cases

2024-04-29 Thread David E. Wheeler
Hello hackers,

I noticed that the jsonpath date/time functions (.time() and timestamp(), et 
al.) don’t support some valid but special-case PostgreSQL values, notably 
`infinity`, `-infinity`, and, for times, '24:00:00`:

❯ psql
psql (17devel)
Type "help" for help.

david=# select jsonb_path_query(to_jsonb('24:00:00'::time), '$.time()');
ERROR:  time format is not recognized: "24:00:00"

david=# select jsonb_path_query(to_jsonb('infinity'::timestamptz), 
'$.timestamp_tz()');
ERROR:  timestamp_tz format is not recognized: "infinity"

I assume this is because the standard doesn’t support these, or references 
JavaScript-only values or some such. Is that right?

Best,

David





Re: New committers: Melanie Plageman, Richard Guo

2024-04-26 Thread David E. Wheeler
On Apr 26, 2024, at 07:54, Jonathan S. Katz  wrote:

> The Core Team would like to extend our congratulations to Melanie Plageman 
> and Richard Guo, who have accepted invitations to become our newest 
> PostgreSQL committers.
> 
> Please join us in wishing them much success and few reverts!

Great news, congratulations!

Best,

David





Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 3:22 PM, Erik Wienhold  wrote:

> Thanks Peter!  But what is the definition of the entire path expression?
> Perhaps something like:
> 
> ::=  { "."  }
> 
> That would imply that "$.$foo" is a valid path that accesses a variable
> member (but I guess the path evaluation is also specified somewhere).

I read it as “if it starts with a dollar sign, it’s a variable and not a path 
identifier”, and I assume any `.foo` expression is a path identifier.

> What bugs me about this description, after reading it a couple of times,
> is that it's not clear what is meant by ."$varname".  It could mean two
> things: (1) the double-quoting masks $varname in order to not interpret
> those characters as a variable or (2) an interpolated string that
> resolves $varname and yields a dynamic member accessor.

My understanding is that if it’s in double quotes it’s never anything other 
than a string (whether a string literal or a path identifier string literal). 
IOW, variables don’t interpolate inside strings.

> Under case (2) I'd expect that query to return 456 (because $foo
> resolves to "bar").  (Similar to how psql would resolve :'foo' to
> 'bar'.)

Yes, I suspect this is the correct interpretation, but agree the wording could 
use some massaging, especially since path identifiers cannot start with a 
dollar sign anyway. Perhaps:

"If the key name starts with $ or does not meet the JavaScript rules for an 
identifier, it must be enclosed in double quotes to make it a string literal."

> Variables already work in array accessors and table 8.25 says that "The
> specified index can be an integer, as well as an expression returning a
> single numeric value [...]".  A variable is such an expression.
> 
>=> select jsonb_path_query('[2,3,5]', '$[$i]', '{"i":1}');
> jsonb_path_query
>--
> 3
>(1 row)
> 
> So I'd expect a similar behavior for member accessors as well when
> seeing ."$varname" in the same table.

Oh, interesting point! Now I wonder if the standard has this inconsistency (and 
is aware of it).

> Yes, I think so.  That would be case C in the spec excerpt provided by
> Peter.  So it's just a key name that happens to contain (but not start
> with) the dollar sign.

Exactly. It also matches the doc you quote above. Something would have to 
change in src/backend/utils/adt/jsonpath_scan.l to fix that, but that file 
makes my eyes water, so I’m not gonna take a stab at it. :-)

D





Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 05:46, Peter Eisentraut  wrote:

> I have committed this patch, and backpatched it, as a bug fix, because the 
> existing description was wrong.  To keep the patch minimal for backpatching, 
> I didn't do the conversion to a list.  I'm not sure I like that anyway, 
> because it tends to draw more attention to that part over the surrounding 
> parts, which didn't seem appropriate in this case.  But anyway, if you have 
> any more non-bug-fix editing in this area, which would then target PG18, 
> please send more patches.

Makes sense, that level of detail gets into the weeks so maybe doesn’t need to 
be quite so prominent as a list. Thank you!

David





Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 05:51, Peter Eisentraut  wrote:

>A  is classified as follows.
> 
>Case:
> 
>a) A  that is a  is acontext variable>.
> 
>b) A  that begins with  is a
>   .
> 
>c) Otherwise, a  is a .
> 
> Does this help?  I wasn't following all the discussion to see if there is 
> anything wrong with the implementation.

Yes, it does, as it ties the special meaning of the dollar sign to the 
*beginning* of an expression. So it makes sense that this would be an error:

david=# select '$.$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
   ^

But I’m less sure when a dollar sign is used in the *middle* (or end) of a json 
path identifier:

david=# select '$.xx$foo'::jsonpath;
ERROR:  syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.xx$foo'::jsonpath;
   ^

Perhaps that should be valid?

Best,

David





Re: RFC: Additional Directory for Extensions

2024-04-11 Thread David E. Wheeler
On Apr 4, 2024, at 1:20 PM, David E. Wheeler  wrote:

> I’ve added some docs based on your GUC description; updated patch attached.

Here’s a rebase.

I realize this probably isn’t going to happen for 17, given the freeze, but I 
would very much welcome feedback and pointers to address concerns about 
providing a second directory for extensions and DSOs. Quite a few people have 
talked about the need for this in the Extension Mini Summits[1], so I’m sure I 
could get some collaborators to make improvements or look at a different 
approach.

Best,

David

[1] 
https://justatheory.com/2024/02/extension-ecosystem-summit/#extension-ecosystem-mini-summit




v3-0001-Add-extension_directory-GUC.patch
Description: Binary data




Re: ❓ JSON Path Dot Precedence

2024-04-10 Thread David E. Wheeler
On Apr 10, 2024, at 10:29, Peter Eisentraut  wrote:

> So the whole thing is
> 
>   
> 
> The syntax of  and  is then punted to 
> ECMAScript 5.1.
> 
> 0x2 is a HexIntegerLiteral.  (There can be no dots in that.)
> 
> p10 is an Identifier.
> 
> So I think this is all correct.

That makes sense, thanks. It’s just a little odd to me that the resulting path 
isn’t a query at all. To Erik’s point: what path can `'0x2.p10` even select?

Best,

David





Re: ❓ JSON Path Dot Precedence

2024-04-07 Thread David E. Wheeler
On Apr 7, 2024, at 15:46, Erik Wienhold  wrote:

> I guess jsonpath assumes that hex, octal, and binary literals are
> integers.  So there's no ambiguity about any fractional part that might
> follow.

Yeah, that’s what the comment in the flex file says:

https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105


> Also, is there even a use case for path "0x2.p10"?  The path has to
> start with "$" or ("@" in case of a filter expression), doesn't it?  And
> it that case it doesn't parse:
> 
>test=# select '$.0x2.p10'::jsonpath;
>ERROR:  trailing junk after numeric literal at or near ".0x" of jsonpath 
> input
>LINE 1: select '$.0x2.p10'::jsonpath;
> 
> Even with extra whitespace:
> 
>test=# select '$ . 0x2 . p10'::jsonpath;
>ERROR:  syntax error at or near "0x2" of jsonpath input
>LINE 1: select '$ . 0x2 . p10'::jsonpath;
> 
> Or should it behave like an array accessor?  Similar to:
> 
>test=# select jsonb_path_query('[0,1,{"p10":42},3]', 
> '$[0x2].p10'::jsonpath);
> jsonb_path_query
>--
> 42
>(1 row)

I too am curious why these parse successfully, but don’t appear to be useful.

Best,

David





❓ JSON Path Dot Precedence

2024-04-07 Thread David E. Wheeler
Hello Hackers,

A question about the behavior of the JSON Path parser. The docs[1] have this to 
say about numbers:

>  Numeric literals in SQL/JSON path expressions follow JavaScript rules, which 
> are different from both SQL and JSON in some minor details. For example, 
> SQL/JSON path allows .1 and 1., which are invalid in JSON.

In other words, this is valid:

david=# select '2.'::jsonpath;
 jsonpath 
--
 2

But this feature creates a bit of a conflict with the use of a dot for path 
expressions. Consider `0x2.p10`. How should that be parsed? As an invalid 
decimal expression ("trailing junk after numeric literal”), or as a valid 
integer 2 followed by the path segment “p10”? Here’s the parser’s answer:

david=# select '0x2.p10'::jsonpath;
 jsonpath  
---
 (2)."p10"

So it would seem that, other things being equal, a path key expression (`.foo`) 
is slightly higher precedence than a decimal expression. Is that 
intentional/correct?

Discovered while writing my Go lexer and throwing all of Go’s floating point 
literal examples[2] at it and comparing to the Postgres path parser. Curiously, 
this is only an issue for 0x/0o/0b numeric expressions; a decimal expression 
does not behave in the same way:

david=# select '2.p10'::jsonpath;
ERROR:  trailing junk after numeric literal at or near "2.p" of jsonpath input
LINE 1: select '2.p10'::jsonpath;

Which maybe seems a bit inconsistent.

Thoughts on what the “correct” behavior should be?

Best,

David

  [1]: 
https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
  [2]: https://tip.golang.org/ref/spec#Floating-point_literals



Re: RFC: Additional Directory for Extensions

2024-04-04 Thread David E. Wheeler
On Apr 3, 2024, at 12:46 PM, Christoph Berg  wrote:

> Thanks for bringing this up, I should have submitted this years ago.
> (The patch is originally from September 2020.)

That’s okay, it’s still 2020 in some ways. 

> I designed the patch to require a superuser to set it, so it doesn't
> matter very much by which mechanism it gets updated. There should be
> little reason to vary it at run-time, so I'd be fine with requiring a
> restart, but otoh, why restrict the superuser from reloading it if
> they know what they are doing?

I think that’s fair. I’ll keep it requiring a restart now, on the theory it 
would be easier to loosen it later than have to tighten it later.

> I'm not sure the concept of "core libraries" exists. PG happens to
> dlopen things at run time, and it doesn't know/care if they were
> installed by users or by the original PG server. Also, an exploited
> libpgtypes library is not worse than any other untrusted "user"
> library, so you really don't want to allow users to provide their own
> .so files, no matter by what mechanism.

Yes, I guess my concern is whether it could be used to “shadow” core libraries. 
Maybe it’s no different, really.

>> This would also allow the lookup of extension libraries prefixed by the 
>> directory field from the control file, which would enable much tidier 
>> extension installation: The control file, SQL scripts, and DSOs could all be 
>> in a single directory for an extension.
> 
> Nice idea, but that would mean installing .so files into PGSHAREDIR.
> Perhaps the whole extension stuff would have to move to PKGLIBDIR
> instead.

Yes, I was just poking around the code, and realized that, when extension 
functions are created they may or may not not use `MODULE_PATHNAME`, but in any 
event, there is nothing different about loading an extension DSO than any other 
DSO. I was hoping to find a path where it knows it’s opening a DSO for the 
purpose of an extension, so we could limit the lookup there. But that does not 
(currently) exist.

Maybe we could add an `$extensiondir` variable to complement `$libdir`?

Or is PGKLIBDIR is the way to go? I’m not familiar with it. It looks like 
extension JIT files are put there already.

> Fwiw, I wrote this patch to solve the problem of testing extensions at
> build-time where the build process does not have write access to
> PGSHAREDIR. It solves that problem quite well, almost all PG
> extensions have build-time test coverage now (where there was
> basically 0 before).

Yeah, good additional use case.

On Apr 3, 2024, at 1:03 PM, Christoph Berg  wrote:

> Also, it's called extension "destdir" because it behaves like DESTDIR
> in Makefiles: It prepends the given path to the path that PG is trying
> to open when set. So it doesn't allow arbitrary new locations as of
> now, just /home/build/foo-1/debian/foo/usr/share/postgresql/17/extension
> in addition to /usr/share/postgresql/17/extension. (That is what the
> Debian package build process needs, so that restriction/design choice
> made sense.

Right, this makes perfect sense, in that you don’t have to copy all the 
extension files from the destdir to the SHAREDIR to test them, which I imagine 
could be a PITA.

> That's also included in the current GUC description:
> 
>   This directory is prepended to paths when loading extensions
>   (control and SQL files), and to the '$libdir' directive when
>   loading modules that back functions. The location is made
>   configurable to allow build-time testing of extensions that do not
>   have been installed to their proper location yet.
> 
> Perhaps I should have included a more verbose "NOT FOR PRODUCTION"
> there.

The use cases I described upthread are very much production use cases. Do you 
think it’s not for production just because we need to really think it through?

I’ve added some docs based on your GUC description; updated patch attached.

Best,

David



v2-0001-Add-extension_directory-GUC.patch
Description: Binary data


Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 8:54 AM, David E. Wheeler  wrote:

> Yes, I like the suggestion to make it require a restart, which lets the 
> sysadmin control it and not limited to whatever the person who compiled it 
> thought would make sense.

Here’s a revision of the Debian patch that requires a server start.

However, in studying the patch, it appears that the `extension_directory` is 
searched for *all* shared libraries, not just those being loaded for an 
extension. Am I reading the `expand_dynamic_library_name()` function right?

If so, this seems like a good way for a bad actor to muck with things, by 
putting an exploited libpgtypes library into the extension directory, where it 
would be loaded in preference to the core libpgtypes library, if they couldn’t 
exploit the original.

I’m thinking it would be better to have the dynamic library lookup for 
extension libraries (and LOAD libraries?) separate, so that the 
`extension_directory` would not be used for core libraries.

This would also allow the lookup of extension libraries prefixed by the 
directory field from the control file, which would enable much tidier extension 
installation: The control file, SQL scripts, and DSOs could all be in a single 
directory for an extension.

Thoughts?

Best,

David




v1-0001-Add-extension_directory-GUC.patch
Description: Binary data


Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 8:54 AM, David E. Wheeler  wrote:

> Yes, I like the suggestion to make it require a restart, which lets the 
> sysadmin control it and not limited to whatever the person who compiled it 
> thought would make sense.

Or SIGHUP?

D



Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 3:57 AM, walt...@technowledgy.de wrote:

> I can also imagine that it would be very helpful in a container setup to be 
> able to set an environment variable with this path instead of having to 
> recompile all of postgres to change it.

Yes, I like the suggestion to make it require a restart, which lets the 
sysadmin control it and not limited to whatever the person who compiled it 
thought would make sense.

Best,

David





RFC: Additional Directory for Extensions

2024-04-02 Thread David E. Wheeler
Hackers,

In the Security lessons from liblzma thread[1], walther broached the subject of 
an extension directory path[1]:

> Also a configurable directoy to look up extensions, possibly even to be 
> changed at run-time like [2]. The patch says this:
> 
>> This directory is prepended to paths when loading extensions (control and 
>> SQL files), and to the '$libdir' directive when loading modules that back 
>> functions. The location is made configurable to allow build-time testing of 
>> extensions that do not have been installed to their proper location yet.
> 
> This seems like a great thing to have. This might also be relevant in 
> light of recent discussions in the ecosystem around extension management.


That quotation comes from this Debian patch[2] maintained by Christoph Berg. 
I’d like to formally propose integrating this patch into the core. And not only 
because it’s overhead for package maintainers like Christoph, but because a 
number of use cases have emerged since we originally discussed something like 
this back in 2013[3]:

Docker Immutability
---

Docker images must be immutable. In order for users of a Docker image to 
install extensions that persist, they must create a persistent volume, map it 
to SHAREDIR/extensions, and copy over all the core extensions (or muck with 
symlink magic[4]). This makes upgrades trickier, because the core extensions 
are mixed in with third party extensions. 

By supporting a second directory pretended to the list of directories to 
search, as the Debian patch does, users of Docker images can keep extensions 
they install separate from core extensions, in a directory mounted to a 
persistent volume with none of the core extensions. Along with tweaking 
dynamic_library_path to support additional directories for shared object 
libraries, which can also be mounted to a separate path, we can have a 
persistent and clean separation of immutable core extensions and extensions 
installed at runtime.

Postgres.app


The Postgres.app project also supports installing extensions. However, because 
they must go into the SHAREDIR/extensions, once a user installs one the package 
has been modified and the Apple bundle signature will be broken. The OS will no 
longer be able to validate that the app is legit.

If the core supported an additional extension (and PKGLIBDIR), it would allow 
an immutable PostgreSQL base package and still allow extensions to be installed 
into directories outside the app bundle, and thus preserve bundle signing on 
macOS (and presumably other systems --- is this the nix issue, too?)

RFC
---

I know there was some objection to changes like this in the past, but the 
support I’m seeing in the liblzma thread for making pkglibdir configurable  me 
optimistic that this might be the right time to support additional 
configuration for the extension directory, as well, starting with the Debian 
patch, perhaps.

Thoughts?

I would be happy to submit a clean version of the Debian patch[2].

Best,

David

[1] 
https://www.postgresql.org/message-id/99c41b46-616e-49d0-9ffd-a29432cec818%40technowledgy.de
[2] 
https://salsa.debian.org/postgresql/postgresql/-/blob/17/debian/patches/extension_destdir?ref_type=heads
[3] https://www.postgresql.org/message-id/flat/51ae0845.8010...@ocharles.org.uk
[4] https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14





Re: Security lessons from liblzma

2024-04-01 Thread David E. Wheeler
On Apr 1, 2024, at 06:55, walt...@technowledgy.de wrote:

> Also a configurable directoy to look up extensions, possibly even to be 
> changed at run-time like [2]. The patch says this:
> 
>> This directory is prepended to paths when loading extensions (control and 
>> SQL files), and to the '$libdir' directive when loading modules that back 
>> functions. The location is made configurable to allow build-time testing of 
>> extensions that do not have been installed to their proper location yet.
> 
> This seems like a great thing to have. This might also be relevant in light 
> of recent discussions in the ecosystem around extension management.
> 
> All the path-related issues have in common, that while it's easy to move 
> files around to their proper locations later, they all need to adjust 
> pg_config's output.

Funny timing, I was planning to resurrect this old patch[1] and propose that 
patch this week. One of motivators is the increasing use of Docker images in 
Kubernetes to run Postgres, where there’s a desire to keep the core service and 
extensions immutable, and to have a second directory mounted to a persistent 
volume into which other extensions can be installed and preserved independently 
of the Docker image.

The current approach involves symlinking shenanigans[2] that complicate things 
pretty substantially, making it more difficult to administer. A second 
directory fit for purpose would be far better.

There are some other motivators, so I’ll do some additional diligence and start 
a separate thread (or reply to the original[3]).

Best,

David

[1] https://commitfest.postgresql.org/5/170/
[2] https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14
[3] 
https://www.postgresql.org/message-id/flat/51AE0845.8010600%40ocharles.org.uk





Re: Patch: Add parse_type Function

2024-03-20 Thread David E. Wheeler
On Mar 20, 2024, at 17:23, Tom Lane  wrote:

> Pushed with some editorialization.  Mostly, I whacked the
> documentation around pretty heavily: we have a convention for what
> examples in function descriptions should look like, and this wasn't
> it.  Not entirely your fault, since some nearby entries in that
> table hadn't gotten the word either.

Ah, great, and your wording on the parser error issue is much better, thank you!

Best,

David





Re: Q: Escapes in jsonpath Idents

2024-03-19 Thread David E. Wheeler
On Mar 17, 2024, at 20:09, Erik Wienhold  wrote:
> 
> On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
>> On Mar 17, 2024, at 15:12, Erik Wienhold  wrote:
>>> So I think it makes sense to reword the entire backslash part of the
>>> paragraph and remove references to JSON entirely.  The attached patch
>>> does that and also formats the backslash escapes as a bulleted list for
>>> readability.
>> 
>> Ah, it’s JavaScript format, not JSON! This does clarify things quite
>> nicely, thank you. Happy to add my review once it’s in a commit fest.
> 
> Thanks.  https://commitfest.postgresql.org/48/4899/

Applies cleanly, `make -C doc/src/sgml check` runs without error. Doc 
improvement welcome and much clearer than before.

> I had the same reasoning while writing my first reply but scrapped that
> part because I found it obvious:  That jsonpath is parsed before calling
> jsonb_path_exists() and therefore the parser has no context about any
> variables, which might not even be hardcoded but may result from a
> query.

Right, there’s a chicken/egg problem.

> Unfortunately, I don't have access to that part of the SQL spec.  So I
> don't know how the jsonpath grammar is specified.

Seems quite logical; I think it should be documented, but I’d also be 
interested to know what the 2016 and 2023 standards say, exactly.

> Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
> insights but haven't found any yet.

Everybody’s taking shortcuts relative to the standard, AFAICT. For example, 
jsonpath_scan.l matches unqouted identifiers with these two regular expressions:

{other}+
\/\*
\\.

Plus the backslash escapes. {other} is defined as:

/* "other" means anything that's not special, blank, or '\' or '"' */
other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]

Which is wy more liberal than the ECMA standard[1], by my reading, but the 
MSDN[2] description is quite succinct (thanks for the links!):

> In JavaScript, identifiers are commonly made of alphanumeric characters, 
> underscores (_), and dollar signs ($). Identifiers are not allowed to start 
> with numbers. However, JavaScript identifiers are not only limited to ASCII — 
> many Unicode code points are allowed as well. Namely, any character in the 
> ID_Start category can start an identifier, while any character in the 
> ID_Continue category can appear after the first character.


ID_Start[3] and ID_Continue[4] point to the unicode standard codes lister, 
nether of which reference Emoji. Sure enough, in Safari:

> x = {"": true}
< {: true}
> x.
< SyntaxError: Invalid character '\ud83c’

But in Postgres jsonpath:

david=# select '$.'::jsonpath;
 jsonpath 
--
 $.""

If the MSDN references to ID_Start and ID_Continue are correct, then the 
Postgres path parser is being overly-liberal. Maybe that’s totally fine? Not 
sure what should be documented and what’s not worth it.

Aside: I’m only digging into these details because I’m busy porting the path 
parser, so trying to figure out where to be compatible and where not to. So far 
I’m rejecting '$' (but allowing '\$' and '\u0024') but taking advantage of the 
unicode support in Go to specifically validate against ID_Start and ID_Continue.

Best,

David

[1] https://262.ecma-international.org/#sec-identifier-names
[2] 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers
[3] 
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D
[4] 
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D











Re: Q: Escapes in jsonpath Idents

2024-03-17 Thread David E. Wheeler
On Mar 17, 2024, at 15:12, Erik Wienhold  wrote:

> Hi David,

Hey Erik. Thanks for the detailed reply and patch!

> So I think it makes sense to reword the entire backslash part of the
> paragraph and remove references to JSON entirely.  The attached patch
> does that and also formats the backslash escapes as a bulleted list for
> readability.

Ah, it’s JavaScript format, not JSON! This does clarify things quite nicely, 
thank you. Happy to add my review once it’s in a commit fest.

> The first case ($.$foo) is in line with the restriction on member
> accessors that you quoted first.

Huh, that’s now how I read it. Here it is again:

>> Member accessor that returns an object member with the specified
>> key. If the key name matches some named variable starting with $ or
>> does not meet the JavaScript rules for an identifier, it must be
>> enclosed in double quotes to make it a string literal.


Note that in my example `$foo` does not match a variable. I mean it looks like 
a variable, but none is used here. I guess it’s being conservative because it 
might be used in one of the functions, like jsonb_path_exists(), to which 
variables might be passed.

> The error message 'syntax error at or near "$oo" of jsonpath input' for
> the second case ($.f$oo), however, looks as if the scanner identifies
> '$oo' as a variable instead of contiuing the scan of identifier (f$oo)
> for the member accessor.  Looks like a bug to me because a variable
> doesn't even make sense in that place.

Right. Maybe the docs should be updated to say that a literal dollar sign isn’t 
supported in identifiers, unlike in JavaScript, except through escapes like 
this:

> What works though, besides double quoting, is escaping the dollar sign:
> 
> regress=# select '$.\u0024foo'::jsonpath;
> jsonpath
> --
> $."$foo"
> (1 row)
> 
> And we've come full circle :)



Best,

David






Re: Q: Escapes in jsonpath Idents

2024-03-16 Thread David E. Wheeler
On Mar 16, 2024, at 14:39, David E. Wheeler  wrote:

> I went looking for the JavaScript rules for an identifier and found this in 
> the MDN docs[2]:
> 
>> In JavaScript, identifiers can contain Unicode letters, $, _, and digits 
>> (0-9), but may not start with a digit. An identifier differs from a string 
>> in that a string is data, while an identifier is part of the code. In 
>> JavaScript, there is no way to convert identifiers to strings, but sometimes 
>> it is possible to parse strings into identifiers.

Coda: Dollar signs don’t work at all outside double-quoted string identifiers:

david=# select '$.$foo'::jsonpath;
ERROR:  syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
   ^

david=# select '$.f$oo'::jsonpath;
ERROR:  syntax error at or near "$oo" of jsonpath input
LINE 1: select '$.f$oo'::jsonpath;
   ^

david=# select '$."$foo"'::jsonpath;
 jsonpath 
--
 $."$foo"

This, too, contradicts the MDM definition an identifier (and some quick browser 
tests).

Best,

David





Q: Escapes in jsonpath Idents

2024-03-16 Thread David E. Wheeler
Hackers,

The jsonpath doc[1] has an excellent description of the format of strings, but 
for unquoted path keys, it simply says:

> Member accessor that returns an object member with the specified key. If the 
> key name matches some named variable starting with $ or does not meet the 
> JavaScript rules for an identifier, it must be enclosed in double quotes to 
> make it a string literal.

I went looking for the JavaScript rules for an identifier and found this in the 
MDN docs[2]:

> In JavaScript, identifiers can contain Unicode letters, $, _, and digits 
> (0-9), but may not start with a digit. An identifier differs from a string in 
> that a string is data, while an identifier is part of the code. In 
> JavaScript, there is no way to convert identifiers to strings, but sometimes 
> it is possible to parse strings into identifiers.


However, the Postgres parsing of jsonpath keys appears to follow the same rules 
as strings, allowing backslash escapes:

david=# select '$.fo\u00f8 == $x'::jsonpath;
 jsonpath   ---
 ($."foø" == $"x")

This would seem to contradict the documentation. Is this behavior required by 
the SQL standard? Do the docs need updating? Or should the code actually follow 
the JSON identifier behavior?

Thanks,

David

PS: Those excellent docs on strings mentions support for \v, but the grammar in 
the right nav of https://www.json.org/json-en.html does not. Another bonus 
feature?

[1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH
[2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier





Re: Patch: Add parse_type Function

2024-03-08 Thread David E. Wheeler
On Mar 7, 2024, at 23:39, Erik Wienhold  wrote:

> I think you need to swap the examples.  The text mentions the error case
> first and the NULL case second.

臘‍♂️

Thanks, fixed in the attached patch.

David



v11-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-03-07 Thread David E. Wheeler
Hello Hackers,

On Feb 25, 2024, at 13:00, David E. Wheeler  wrote:

>> postgres=# SELECT to_regtypemod('timestamp(-4)');
>> ERROR:  syntax error at or near "-"
>> LINE 1: SELECT to_regtypemod('timestamp(-4)');
>>  ^
>> CONTEXT:  invalid type name "timestamp(-4)"
>> 
>> postgres=# SELECT to_regtypemod('text(-4)');
>> ERROR:  type modifier is not allowed for type "text"
> 
> Yeah, there was quite a bit of discussion of this issue back in September[1].
> 
>> This behaviour is mentioned in the documentation, so I'd say it is ok.
> 
> This is my attempt to make it clearer that it can return an error, but I 
> don’t love the wording TBH.

I’ve rebased the patch and, in an attempt to clarify this behavior, added a 
couple of examples to the docs for to_regtype. Updated patch attached.

Best,

David



v10-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: An improved README experience for PostgreSQL

2024-02-28 Thread David E. Wheeler
On Feb 28, 2024, at 1:51 PM, Alvaro Herrera  wrote:
 
> *IF* people don't go overboard, yes.  I agree, but let's keep an eye so
> that it doesn't become an unreadable mess.  I've seen some really
> horrible markdown files that I'm sure most of you would object to.

Markdown++

IME the keys to decent-looking Markdown are:

1. Wrapping lines to a legible width (76-80 chars)
2. Link references rather than inline links

I try to follow these for my blog; posts end up looking like this:

https://justatheory.com/2024/02/extension-metadata-typology.text

(Append `.text` to any post to see the raw(ish) Markdown.

Best,

David





Re: Patch: Add parse_type Function

2024-02-25 Thread David E. Wheeler
On Feb 24, 2024, at 19:11, Jim Jones  wrote:

>> What’s the protocol for marking a patch ready for committer?
> 
> I guess after the review of the last assigned reviewer

Oh, I didn’t realize someone was assigned. :-)

> The fact that a completely invalid type returns NULL ..
> 
> SELECT to_regtypemod('foo');
>  to_regtypemod
> ---
>   
> (1 row)
> 
> 
> .. but a "partially" valid one returns an error might be confusing
> 
> postgres=# SELECT to_regtypemod('timestamp(-4)');
> ERROR:  syntax error at or near "-"
> LINE 1: SELECT to_regtypemod('timestamp(-4)');
>   ^
> CONTEXT:  invalid type name "timestamp(-4)"
> 
> postgres=# SELECT to_regtypemod('text(-4)');
> ERROR:  type modifier is not allowed for type "text"

Yeah, there was quite a bit of discussion of this issue back in September[1].

> This behaviour is mentioned in the documentation, so I'd say it is ok.

This is my attempt to make it clearer that it can return an error, but I don’t 
love the wording TBH.

> I would personally prefer either NULL or an error in both cases, but I
> can totally live with the current design.

SAME.

Best,

David

[1] 
https://www.postgresql.org/message-id/flat/09f9cad6-5096-43cc-b6a7-685703e47...@justatheory.com





Re: Patch: Add parse_type Function

2024-02-24 Thread David E. Wheeler
On Feb 21, 2024, at 19:13, David E. Wheeler  wrote:

> Thanks. Anyone else? Or is it ready for committer?

What’s the protocol for marking a patch ready for committer?

Thanks,

David





Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 17:19, Erik Wienhold  wrote:

> Thanks David!  LGTM.

Thanks. Anyone else? Or is it ready for committer?

Best,

David





Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 16:43, Erik Wienhold  wrote:

> The docs still state that to_regtypemod() has a named parameter, which
> is not the case per pg_proc.dat.

Bah, I cleaned it up before but somehow put it back. Thanks for the catch. 
Fixed.

Best,

David



v9-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: to_regtype() Raises Error

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 11:54 AM, David Wheeler  wrote:

> Merged this change into the [to_regtypemod 
> patch](https://commitfest.postgresql.org/47/4807/), which has exactly the 
> same issue.
> 
> The new status of this patch is: Needs review

Bah, withdrawn.

D





Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 11:18, Erik Wienhold  wrote:

> Thanks.  But it's an applefile again, not a patch :P

WTF. I still have that feature disabled.

Oh, I think I deleted the file after dragged it into Mail but before sending, 
because it’s empty everywhere I look. 臘‍♂️ Let’s try that again.

Best,

David


v8-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 20, 2024, at 21:09, Erik Wienhold  wrote:

> The references are printed as "???" right now.  Can be fixed with
> xreflabel="format_type" and xreflabel="to_regtype" on those 
> elements.

Thanks.

> The docs show parameter name "type" but pg_proc.data does not define
> proargnames.  So the to_regtypemod() cannot be called using named
> notation:
> 
> test=> select to_regtypemod(type => 'int'::text);
> ERROR:  function to_regtypemod(type => text) does not exist

Yes, this format is identical to that of to_regtype():

david=# select to_regtype(type => 'int'::text);
ERROR:  function to_regtype(type => text) does not exist

> +Parses a string of text, extracts a potential type name from it, and
>> +translates its typmod, the modifier for the type, if any. Failure to
> 
> s/typmod, the modifier of the type/type modifier/
> 
> Because format_type() already uses "type modifier" and I think it helps
> searchability to use the same wording.

Yes, definitely better wording, thanks. V8 attached.

Best,

David



v8-0001-Add-to_regtypemod-SQL-function.patch
Description: application/applefile




Re: to_regtype() Raises Error

2024-02-20 Thread David E. Wheeler


On Feb 5, 2024, at 09:01, David E. Wheeler  wrote:

> Ah, thank you. Updated patch attached.

I’ve moved this patch into the to_regtype patch thread[1], since it exhibits 
the same behavior.

Best,

David

[1] 
https://www.postgresql.org/message-id/60ef4e11-bc1c-4034-b37b-695662d28...@justatheory.com





Re: Patch: Add parse_type Function

2024-02-20 Thread David E. Wheeler
On Feb 20, 2024, at 01:30, jian he  wrote:

> the second hint `-- grammar error expected` seems to contradict with
> the results?

Quite right, thank you, that’s actually a trapped error. I’ve tweaked the 
comments and their order in v7, attached.

This goes back to the discussion of the error raising of to_regtype[1], so I’ve 
incorporated the patch from that thread into this patch, and set up the docs 
for to_regtypemod() with similar information. The wording is still a little 
opaque for my taste, though, written more for someone who knows a bit about the 
internals, but it’s a start.

I’ve also fixed the wayward parameter in the function signature in the docs, 
and added a note about why I’ve also patched genbki.pl.

Best,

David

[1] 
https://www.postgresql.org/message-id/flat/57E1FDDC-5A38-452D-82D7-A44DA2E13862%40justatheory.com#1ae0b11634bc33c7ad3cd728e43d504e



v7-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-19 Thread David E. Wheeler
On Feb 19, 2024, at 21:58, Erik Wienhold  wrote:

> See the patch I wrote for my benchmarks.  But it's pretty easy anyway to
> cut down parse_type() ;)

LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached.

> But you don't actually need reformat_type() in pgTAP.  You can just get
> the type OID and modifier of the want_type and have_type and compare
> those.  Then use format_type() for the error message.  Looks a bit
> cleaner to me than doing the string comparison.

Fair.

> On second thought, I guess comparing the reformatted type names is
> necessary in order to have a uniform API on older Postgres releases
> where pgTAP has to provide its own to_regtypmod() based on typmodin
> functions.

Maybe. Worth playing with.

>> For the latter, it could easily be an example in the docs.
> 
> Can be mentioned right under format_type().

Well I included it in the to_regtypemod docs here, but could so either.

Best,

David



v6-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-19 Thread David E. Wheeler
On Feb 19, 2024, at 15:47, Tom Lane  wrote:

>> 1. Add a to_regtypmod() for those who just want the typemod.
> 
> Seems like there's a good case for doing that.

I’ll work on that.

> I'm less thrilled about that, mainly because I can't think of
> a good name for it ("format_type_string" is certainly not that).
> Is the use-case for this functionality really strong enough that
> we need to provide it as a single function rather than something
> assembled out of spare parts?

Not essential for pgTAP, no, as we can just use the parts. It was the typmod 
bit that was missing.

On Feb 19, 2024, at 17:13, Tom Lane  wrote:

> After some time ruminating, a couple of possibilities occurred to
> me:
> reformat_type(text)
> canonical_type_name(text)

I was just thinking about hitting the thesaurus entry for “canonical”, but I 
quite like reformat_type. It’s super clear and draws the parallel to 
format_type() more clearly. Will likely steal the name for pgTAP if we don’t 
add it to core. :-)

> I'm still unconvinced about that, though.

I guess the questions are:

* What are the other use cases for it?
* How obvious is it how to do it?

For the latter, it could easily be an example in the docs.

Best,

David





Re: Patch: Add parse_type Function

2024-02-19 Thread David E. Wheeler
On Feb 18, 2024, at 15:55, Erik Wienhold  wrote:

>> The overhead of parse_type_and_format can be related to higher planning
>> time. PL/pgSQL can assign composite without usage FROM clause.
> 
> Thanks, didn't know that this makes a difference.  In that case both
> variants are on par.

Presumably this is a side-effect of the use of a RECORD here, which requires a 
FROM clause in pure SQL, yes?

The only way I can think of to avoid that is to:

1. Add a to_regtypmod() for those who just want the typemod.

2. Add a format_type_string() function that returns a string, the equivalent of 
this function but in C:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
   SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

We could also keep the record-returning function for users who want both the 
regtype and the regytypemod at once, but with the other two I consider it less 
essential.

Thoughts?

Best,

David





Re: Patch: Add parse_type Function

2024-02-13 Thread David E. Wheeler
On Feb 12, 2024, at 15:55, David E. Wheeler  wrote:

> Happy to move it wherever makes the most sense.

Update with a bunch of the suggested changes. Some questions still open from 
the previous post, though. 

Best,

David



v5-0001-Add-parse_type-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-12 Thread David E. Wheeler
On Feb 12, 2024, at 12:53 PM, Tom Lane  wrote:

> "David E. Wheeler"  writes:
>> [ v4-0001-Add-parse_type-SQL-function.patch ]
> 
> It strikes me that this is basically to_regtype() with the additional
> option to return the typmod.  That leads to some questions:

Huh. I saw it more on a par with format_type(), at least in the output I expect.

> * Should we choose a name related to to_regtype?  I don't have any
> immediate suggestions, but since you didn't seem entirely set on
> parse_type, maybe it's worth thinking along those lines.  OTOH,
> to the extent that people would use this with format_type, maybe
> parse_type is fine.

Yeah, and the `to_*` functions return a single OID, not two fields.

> * Perhaps the type OID output argument should be declared regtype
> not plain OID?  It wouldn't make any difference for passing it to
> format_type, but in other use-cases perhaps regtype would be more
> readable.  It's not a deal-breaker either way of course, since
> you can cast oid to regtype or vice versa.

Sure. I used `oid` because that’s exactly what the argument to format_type() is 
called, so thought the parity there more appropriate. Maybe its argument should 
be renamed to regtype?

> * Maybe the code should be in adt/regproc.c not format_type.c.

Happy to move it wherever makes the most sense.

> * Experience with to_regtype suggests strongly that people would
> prefer "return NULL" to failure for an unrecognized type name.

Could do, but as with to_regtype() there’s an issue with error handling when it 
descends into the SQL parser.

> Skimming the patch, I notice that the manual addition to
> builtins.h should be unnecessary: the pg_proc.dat entry
> should be enough to create an extern in fmgrprotos.h.

Confirmed, will remove in next patch.

> Also, I'm pretty sure that reformat_dat_file.pl will
> think your pg_proc.dat entry is overly verbose.  See
> https://www.postgresql.org/docs/devel/system-catalog-initial-data.html

There are a bunch of things it reformats:

❯ git diff --name-only   
src/include/catalog/pg_aggregate.dat
src/include/catalog/pg_database.dat
src/include/catalog/pg_proc.dat
src/include/catalog/pg_type.dat
src/include/utils/builtins.h

And even in pg_proc.dat there are 13 blocks it reformats. I can submit with 
just my block formatted if you’d prefer.

> BTW, another way that this problem could be approached is to use
> to_regtype() as-is, with a separate function to obtain the typmod:
> 
> select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)'));

Oh interesting.

> This is intellectually ugly, since it implies parsing the same
> typename string twice.  But on the other hand it avoids the notational
> pain and runtime overhead involved in using a record-returning
> function.  So I think it might be roughly a wash for performance.
> Question to think about is which way is easier to use.  I don't
> have an opinion particularly; just throwing the idea out there.

Honestly I haven’t cared for the fact that parse_type() returns a record; it 
makes it a bit clunky. But yeah, so does this to pass the same value to two 
function calls.

Perhaps it makes sense to add to_regtypmod() as you suggest, but then also 
something like:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

Best,

David





Re: Patch: Add parse_type Function

2024-02-12 Thread David E. Wheeler
On Feb 10, 2024, at 20:52, Erik Wienhold  wrote:
> 
> Let me comment on some issues since I wrote the very first version of
> parse_type() on which David's patch is based.

Thanks Erik.

>> On 2024-02-01 01:00 +0100, jian he  wrote:
>> if you are wondering around other code deal with NULL, ErrorSaveContext.
>> NULL would be more correct?
>> `(void) parseTypeString(type, , , NULL);`

Fixed.

>> also parseTypeString already explained the third argument, our
>> comments can be simplified as:
>> /*
>> * Parse type-name argument to obtain type OID and encoded typmod. We don't
>> * need to handle parseTypeString failure, just let the error be
>> * raised.
>> */


Thanks, adopted that language.

>> cosmetic issue. Looking around other error handling code, the format
>> should be something like:
>> `
>> if (get_call_result_type(fcinfo, NULL, ) != TYPEFUNC_COMPOSITE)
>>ereport(ERROR,
>>  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>errmsg("function returning record called in"
>> "context that cannot accept type record")));
>> `
> 
> +1

I poked around and found some other examples, yes. I went with a single long 
line for errmsg following the example in adminpack.c[1]

>> `PG_FUNCTION_INFO_V1(parse_type);`
>> is unnecessary?
>> based on the error information:  https://cirrus-ci.com/task/5899457502380032
>> not sure why it only fails on windows.
> 
> Yeah, it's not necessary for internal functions per [1].  It's a
> leftover from when this function was part of the pgtap extension.

Removed.

>> +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */
>> +#undef PARSE_TYPE_STRING_COLS
>> Are these necessary?
>> given that the comments already say return the OID and type modifier.
> 
> Not sure what the convention here is but I found this in a couple of
> places, maybe even a tutorial on writing C functions.  See
> `git grep '_COLS\s\+[1-9]'` for those instances.  It's in line with my
> habit of avoiding magic constants.

Left in place for now.

> 
>> +( typid oid,
>> +  typmod int4 )
>> here `int4` should be `integer`?
> 
> +1

Fixed.

> Yes, the sentence is rather handwavy.  What is meant here is that this
> function also resolves types whose typmod is determined by the SQL
> parser or some step after that.  There are types whose typmod is
> whatever value is found inside the parenthesis, e.g. bit(13) has typmod
> 13.  Our first idea before coming up with that function was to do some
> string manipulation and extract the typmod from inside the parenthesis.
> But we soon found out that other typmods don't translate one to one,
> e.g.  varchar(13) has typmod 17.  The interval type is also special
> because the typmod is some bitmask encoding of e.g. 'second(0)'.  Hence
> the mention of the SQL grammar.

I adopted some of your language here --- and fixed the spelling errors :-)

>> can be simplified:
>> +   
>> +Parses a string representing an SQL data type, optionally
>> schema-qualified.
>> +Returns a record with two fields, typid and
>> +typmod, representing the OID and
>> modifier for the
>> +type. These correspond to the parameters to pass to the
>> +format_type
>> function.
>> +   
>> (I don't have a strong opinion though)
> 
> Sounds better.  The CREATE TABLE reference is superfluous.

Done.

Best,

David
[1] 
https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/contrib/adminpack/adminpack.c#L508-L511



v4-0001-Add-parse_type-SQL-function.patch
Description: Binary data




Re: Patch: Add parse_type Function

2024-02-05 Thread David E. Wheeler
On Feb 5, 2024, at 10:47 AM, Jim Jones  wrote:

> The patch applies cleanly and it no longer crashes with NULL parameters.
> Docs render fine and regression tests pass. I have nothing more to add.
> Let's now wait for Pavel's review.

Much obliged!

D





Re: Patch: Add parse_type Function

2024-02-05 Thread David E. Wheeler
On Feb 5, 2024, at 09:49, Jim Jones  wrote:

> Yes, if the function was strict (which in the current design is not the
> case) there is no need to check for nulls.

Right, done, and cleaned up the redundant comments.

Best,

David



v3-0001-Add-parse_type-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-05 Thread David E. Wheeler
On Feb 5, 2024, at 08:02, Jim Jones  wrote:

> There are a few issues though:

Excellent, thank you very much! Updated patch attached.

Best,

David




v2-0001-Add-parse_type-SQL-function.patch
Description: Binary data


Re: to_regtype() Raises Error

2024-02-05 Thread David E. Wheeler
On Feb 4, 2024, at 19:11, Erik Wienhold  wrote:

> Here "extracted names" should be "extracted name" (singular).
> Otherwise, the text looks good.

Ah, thank you. Updated patch attached.

Best,

David


v2-0001-Update-to_regtype-docs-regarding-error-condition.patch
Description: Binary data


Re: to_regtype() Raises Error

2024-02-04 Thread David E. Wheeler
On Feb 2, 2024, at 15:33, David E. Wheeler  wrote:

> Anyway, I’m happy to submit a documentation patch along the lines you 
> suggested.

How’s this?

--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25460,11 +25460,12 @@ SELECT collation for ('foo' COLLATE "de_DE");
 regtype


-Translates a textual type name to its OID.  A similar result is
+Parses a string of text, extracts a potential type name from it, and
+translates that name into an OID. A similar result is
 obtained by casting the string to type regtype (see
-); however, this function will return
-NULL rather than throwing an error if the name is
-not found.
+). Failure to extract a valid potential
+type name results in an error; however, if the extracted names is not
+known to the system, this function will return NULL.

   
  

Does similar wording need to apply to other `to_reg*` functions?

Best,

David




v1-0001-Update-to_regtype-docs-regarding-error-condition.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-04 Thread David E. Wheeler
On Feb 4, 2024, at 13:52, Pavel Stehule  wrote:

> not yet, but I'll do it

Nice, thank you. I put it into the Commitfest.

  https://commitfest.postgresql.org/47/4807/

Best,

David





Re: Patch: Add parse_type Function

2024-02-04 Thread David E. Wheeler
On Feb 4, 2024, at 13:02, Pavel Stehule  wrote:

> I thinks so proposed functionality can be useful 

Great, thank you!

Is that a review? :-)

D






Patch: Add parse_type Function

2024-02-04 Thread David E. Wheeler
Hackers,

Attached is a patch to add a new function, `parse_type()`. It parses a type 
string and returns a record with the typid and typmod for the type, or raises 
an error if the type is invalid. It’s effectively a thin layer over the 
parser’s parseTypeString() function.

The purpose of this function is to allow uses to resolve any type name, 
including aliases, to its formal name as rendered by, for example, pg_dump. An 
example:

david=# WITH s(s) AS (
SELECT * FROM unnest(ARRAY[
'timestamp(4)',
'interval(0)',
'interval second(0)',
'timestamptz',
'timestamptz(6)',
'varchar',
'varchar(128)' 
]) 
),
p(type, typid, typmod) AS (
SELECT s, ((parse_type(s)).*)
  FROM s 
)   
SELECT type, format_type(typid, typmod) FROM p;
type|  format_type   
+
 timestamp(4)   | timestamp(4) without time zone
 interval(0)| interval(0)
 interval second(0) | interval second(0)
 timestamptz| timestamp with time zone
 timestamptz(6) | timestamp(6) with time zone
 varchar| character varying
 varchar(128)   | character varying(128)
(7 rows)


The use case leading to this patch was pgTAP column data type assertions. pgTAP 
traditionally required full type names for testing data types, but recently 
added support for aliases. This broke for some types, because it was difficult 
to extract the typmod correctly, and even when we did, it failed for types such 
as `interval second(0)`, where `second (0)` is the typmod, and there was no 
sensible way to access the bit mask to generate the proper typmod to pass to 
`format_type()`.

Erik Wienhold wrote this function to solve the problem, and I volunteered to 
submit a patch.

Potential issues and questions for this patch:

* Is `parse_type()` the right name to use? I can see arguments for 
`parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`. 
Whatever the consensus is works for me.

* The function returns a record, which is a little fussy. I could see adding a 
`format_type_string()` function that just contains `SELECT 
format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted 
type. But I think it might also be useful to have access to the typmod and 
typid directly via this method, since they’re already exposed as 
`format_type()`’s arguments.

* Is format_type.c the right home for the function? I put it there because I 
closely associate it with format_type(), but happy to move it to a more 
appropriate location.

* I tried to link to `format_type` from the docs entry for `parse_type`, and 
added an ID for the former, but I’m not sure it resolves.

Best,

David



v1-0001-Add-parse_type-SQL-function.patch
Description: Binary data


Re: to_regtype() Raises Error

2024-02-02 Thread David E. Wheeler
On Feb 2, 2024, at 3:23 PM, David G. Johnston  
wrote:

> Seems like most just want to leave well enough alone and deal with the rare 
> question for oddball input on the mailing list.  If you are interested enough 
> to come back after 4 months I'd suggest you write up and submit a patch.  I'm 
> happy to review it and see if that is enough to get a committer to respond.

LOL, “interested enough” is less the right term than “triaging email backlog 
and following up on a surprisingly controversial question.” I also just like to 
see decisions made and issues closed one way or another.

Anyway, I’m happy to submit a documentation patch along the lines you suggested.

D





Re: to_regtype() Raises Error

2024-01-29 Thread David E. Wheeler
Hey there, coming back to this. I poked at the logs in the master branch and 
saw no mention of to_regtype; did I miss it?

On Sep 17, 2023, at 10:58 PM, David G. Johnston  
wrote:

> Parses a string of text, extracts a potential type name from it, and 
> translates that name into an OID.  Failure to extract a valid potential type 
> name results in an error while a failure to determine that the extracted name 
> is known to the system results in a null output.
> 
> I take specific exception to describing your example as a “textual type name”.

More docs seem like a reasonable compromise. Perhaps it’d be useful to also 
describe when an error is likely and when it’s not.

Best,

David





Re: Bug: The "directory" control parameter does not work

2024-01-26 Thread David E. Wheeler
On Jan 26, 2024, at 5:05 PM, David E. Wheeler  wrote:

> I made a simpler test case here:
> 
>  https://github.com/theory/test-extension-directory
> 
> Not the difference between actual and expected output.

Bah! Need to also set `MODULEDIR = extension/click` and then it works.

D





Re: Bug: The "directory" control parameter does not work

2024-01-26 Thread David E. Wheeler
On Jan 26, 2024, at 4:40 PM, David E. Wheeler  wrote:

> But it doesn’t seem to work. I tried this experiment:

I made a simpler test case here:

  https://github.com/theory/test-extension-directory

Not the difference between actual and expected output.

Best,

David






Bug: The "directory" control parameter does not work

2024-01-26 Thread David E. Wheeler
Hackers,

I wanted to try to customize the subdirectory for an extension. The docs[1] say:

> directory (string)
> 
> The directory containing the extension's SQL script file(s). Unless an 
> absolute path is given, the name is relative to the installation's SHAREDIR 
> directory. The default behavior is equivalent to specifying directory = 
> 'extension'.


  [1]: https://www.postgresql.org/docs/16/extend-extensions.html

But it doesn’t seem to work. I tried this experiment:

❯ pg_config --version
PostgreSQL 16.1

❯ git clone g...@github.com:theory/kv-pair.git
❯ cd kv-pair
❯ echo directory = 'blah' >> pair.control
❯ cat pair.control
# pair extension
comment = 'A key/value pair data type'
default_version = '0.1.2'
module_pathname = '$libdir/pair'
relocatable = true
directory = blah

❯ make install
cp sql/pair.sql sql/pair--0.1.2.sql
/opt/homebrew/bin/gmkdir -p '/Users/david/.pgenv/pgsql-16.1/share/extension'
/opt/homebrew/bin/gmkdir -p '/Users/david/.pgenv/pgsql-16.1/share/extension'
/opt/homebrew/bin/gmkdir -p '/Users/david/.pgenv/pgsql-16.1/share/doc/extension'
/opt/homebrew/bin/ginstall -c -m 644 .//pair.control 
'/Users/david/.pgenv/pgsql-16.1/share/extension/'
/opt/homebrew/bin/ginstall -c -m 644 .//sql/pair--unpackaged--0.1.2.sql  
'/Users/david/.pgenv/pgsql-16.1/share/extension/'
/opt/homebrew/bin/ginstall -c -m 644 .//doc/pair.md 
'/Users/david/.pgenv/pgsql-16.1/share/doc/extension/‘

❯ find ~/.pgenv/pgsql-16.1/ -name '*pair*.*'
/Users/david/.pgenv/pgsql-16.1//include/server/lib/pairingheap.h
/Users/david/.pgenv/pgsql-16.1//share/extension/pair--unpackaged--0.1.2.sql
/Users/david/.pgenv/pgsql-16.1//share/extension/pair.control
/Users/david/.pgenv/pgsql-16.1//share/doc/extension/pair.md

I see two problems:

1. pair--0.1.2.sql is not installed. It does install without `directory` set.
2. pair--unpackaged--0.1.2.sql is installed, but not in a `blah` directory. 
Shouldn’t it be?

Am I missing something?

Best,

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-26 Thread David E. Wheeler
On Jan 25, 2024, at 11:03, Tom Lane  wrote:

> I changed the preceding para to say "... check expressions are
> required in ...", which I thought was sufficient to cover that.
> Also, the tabular description of the operator tells you not to do it.

Yeah, that’s good. I was perhaps leaning into being over-explicit after it took 
me a while to even figure out that there was a difference, let alone where 
matters.

>> What do you think of also dropping the article from all the references to 
>> “the strict mode” or “the lax mode”, to make them “strict mode” and “lax 
>> mode”, respectively?
> 
> Certainly most of 'em don't need it.  I'll make it so.

Nice, thanks!

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-24 Thread David E. Wheeler
On Jan 24, 2024, at 16:32, Tom Lane  wrote:

> "David E. Wheeler"  writes:
> 
>> In any event, something to do with @@, perhaps to have some compatibility 
>> with `jsonb @> jsonb`? I don’t know why @@ was important to have.
> 
> Yeah, that's certainly under-explained.  But it seems like I'm not
> getting traction for the idea of changing the behavior, so let's
> go back to just documenting it.

Curious about those discussions. On the one hand I find the distinction between 
the two behaviors to be odd, and to produce unexpected results when they’re not 
used in the proper context.

It’s reminds me of the Perl idea of context, where functions behave differently 
in scalar and list context, and if you expect list behavior on scalar context 
you’re gonna get a surprise. This is a bit of a challenge for those new to the 
language, as they’re not necessarily aware of the context.

> I spent some time going over your
> text and also cleaning up nearby shaky English, and ended with v8
> attached.  I'd be content to commit this if it looks good to you.

This looks very nice, thank you. A couple of comments.

> +  
> +   Predicate check expressions are required in the
> +   @@ operator (and the
> +   jsonb_path_match function), and should not be 
> used
> +   with the @? operator (or the
> +   jsonb_path_exists function).
> +  
> + 
> +

I had this bit here:

  
   Conversely, non-predicate jsonpath expressions should not be
   used with the @@ operator (or the
   jsonb_path_match function).
  

I think it’s important to let people know what the difference is in the 
behavior of the two forms, in every spot it’s likely to come up. SQL-standard 
JSON Path expressions should never be used in contexts (functions, operators) 
only designed to work with predicate check expressions, and the docs should say 
so IMO.

>
> -The lax mode facilitates matching of a JSON document structure and path
> -expression if the JSON data does not conform to the expected schema.
> +The lax mode facilitates matching of a JSON document and path
> +expression when the JSON data does not conform to the expected schema.


What do you think of also dropping the article from all the references to “the 
strict mode” or “the lax mode”, to make them “strict mode” and “lax mode”, 
respectively?

Thanks for the review!

Best,

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 21, 2024, at 14:58, David E. Wheeler  wrote:

> I make this interpretation based on this bit of the docs:

Sorry, that’s from my branch. Here it is in master:

   

A path expression can be a Boolean predicate, although the SQL/JSON
standard allows predicates only in filters. This is necessary for
implementation of the @@ operator. For example,
the following jsonpath expression is valid in
PostgreSQL:

$.track.segments[*].HR  70




In any event, something to do with @@, perhaps to have some compatibility with 
`jsonb @> jsonb`? I don’t know why @@ was important to have.

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 21, 2024, at 14:52, David E. Wheeler  wrote:

> This is the only way the different behaviors make sense to me. @? expects a 
> set, not a boolean, sees there is an item in the set, so returns true:

I make this interpretation based on this bit of the docs:

   
PostgreSQL's implementation of the SQL/JSON path
language has the following deviations from the SQL/JSON standard.



Boolean Predicate Check Expressions

As an extension to the SQL standard, a PostgreSQL
path expression can be a Boolean predicate, whereas the SQL standard allows
predicates only in filters. Where SQL standard path expressions return the
relevant contents of the queried JSON value, predicate check expressions
return the three-valued result of the predicate: true,
false, or unknown. Compare this
filter jsonpath expression:

= select jsonb_path_query(:'json', 
'$.track.segments ?(@[*].HR  130)');
jsonb_path_query
-
{"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}

To a predicate expression, which returns true

= select jsonb_path_query(:'json', 
'$.track.segments[*].HR  130');
jsonb_path_query
--
true



Best,

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 21, 2024, at 14:43, Tom Lane  wrote:

> I don't entirely buy this argument --- if that is the interpretation,
> of what use are predicate check expressions?  It seems to me that we
> have to consider them as being a shorthand notation for filter
> expressions, or else they simply do not make sense as jsonpath.

I believe it becomes pretty apparent when using jsonb_path_query(). The filter 
expression returns a set (using the previous \gset example):

david=# select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ > 10)');
 jsonb_path_query 
--
 73
 135
(2 rows)

The predicate check returns a boolean:

david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 10');
 jsonb_path_query 
--
 true
(1 row)

This is the only way the different behaviors make sense to me. @? expects a 
set, not a boolean, sees there is an item in the set, so returns true:

david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 1000');
 jsonb_path_query 
--
 false
(1 row)

david=# select :'json'::jsonb @? '$.track.segments[*].HR > 1000';
 ?column? 
--
 t
(1 row)

Best,

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 20, 2024, at 12:34, Tom Lane  wrote:

> It will take a predicate, but seems to always return true:
> 
> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] < 5' ;
> ?column? 
> --
> t
> (1 row)
> 
> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 5' ;
> ?column? 
> --
> t
> (1 row)

Just for the sake of clarity, this return value is “correct,” because @? and 
other functions and operators that expect SQL standard statements evaluate the 
SET returned by the JSONPath statement, but predicate check expressions don’t 
return a set, but a always a single scalar value (true, false, or null). From 
the POV of the code expecting SQL standard JSONPath results, that’s a set of 
one. @? sees that the set is not empty so returns true.

Best,

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E . Wheeler
On Jan 20, 2024, at 11:45, Tom Lane  wrote:

> You sure about that?  It would surprise me if we could effectively use
> a not-equal condition with an index.  If it is only == that works,
> then the preceding statement seems sufficient.

I’m not! I just assumed it in the same way creating an SQL = operator 
automatically respects NOT syntax (or so I recall). In fiddling a bit, I can’t 
get it to use an index:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s 
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json 
| jq -c ".[]" | sed "s|||g"';
create index on movies using gin (movie);
analyze movies;

david=# explain analyze select id from movies where movie @? '$ ?(@.genre[*] != 
"Teen")';
QUERY PLAN  
 
-
Seq Scan on movies  (cost=0.00..3741.41 rows=4 width=4) (actual 
time=19.222..19.223 rows=0 loops=1)
  Filter: (movie @? '$?(@."genre"[*] != "Teen")'::jsonpath)
  Rows Removed by Filter: 36273
Planning Time: 1.242 ms
Execution Time: 19.247 ms
(5 rows)

But that might be because the planner knows that the query is going to fetch 
most records, anyway. If I set most records to a single value:

david=# update movies set movie =  jsonb_set(movie, '{year}', '2020'::jsonb) 
where id < 3600;
UPDATE 3599
david=# analyze movies;
ANALYZE
david=# explain analyze select id from movies where movie @? '$ ?(@.year != 
2020)';
QUERY PLAN  


Seq Scan on movies  (cost=0.00..3884.41 rows=32609 width=4) (actual 
time=0.065..43.730 rows=32399 loops=1)
  Filter: (movie @? '$?(@."year" != 2020)'::jsonpath)
  Rows Removed by Filter: 3874
Planning Time: 1.759 ms
Execution Time: 45.368 ms
(5 rows)

Looks like it still doesn’t use the index with !=. Pity.

Best,

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 20, 2024, at 12:34, Tom Lane  wrote:

> Surely we're not helping anybody by leaving that behavior in place.
> Making it do something useful, throwing an error, or returning NULL
> all seem superior to this.  I observe that @@ returns NULL for the
> path type it doesn't like, so maybe that's what to do here.

I agree it would be far better for the behavior to be consistent, but frankly 
would like to see them raise an error. Ideally the hit would suggest the proper 
alternative operator or function to use, and maybe link to the docs that 
describe the difference between SQL-standard JSONPath and "predicate check 
expressions”, and how they have separate operators and functions.

I think of them as practically different data types (and wish they were, TBH). 
It makes sense that passing a JSON containment expression[1] would raise an 
error; so should passing the wrong flavor of JSONPath.

> BTW, jsonb_path_query_array and jsonb_path_query_first seem to
> take both types of path, like jsonb_path_query, so ISTM they need
> docs changes too.

Happy to update the patch, either to add those docs or, if we change the 
behavior to return a NULL or raise an error, then with that information, 
instead.

Best,

David

  [1]: 
https://www.postgresql.org/docs/current/datatype-json.html#JSON-CONTAINMENT





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-20 Thread David E. Wheeler
On Jan 19, 2024, at 21:46, Erik Wienhold  wrote:

> Interesting... copy-pasting the entire \set command works for me with
> psql 16.1 in gnome-terminal and tmux.  Typing it out manually gives me
> the "unterminated quoted string" error.  Maybe has to do with my stty
> settings.

Yes, same on macOS Terminal.app and 16.1 compiled with readline. I didn’t 
realize that \set didn’t support newlines, because it works fine when you paste 
something with newlines. Curious.

>> I experimented with
>> 
>> SELECT '
>>  ... multiline json value ...
>> ' AS json
>> \gexec
>> 
>> but that didn't seem to work either.  Anybody have a better idea?
> 
> Fine with me (the \gset variant).

Much cleaner TBH.

david=# select '{ 
  "track": {
"segments": [
  {
"location":   [ 47.763, 13.4034 ],
"start time": "2018-10-14 10:05:14",
"HR": 73
  },
  {
"location":   [ 47.706, 13.2635 ],
"start time": "2018-10-14 10:39:21",
"HR": 135
  }
]
  }
}'::jsonb as json;

json
 

 {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start 
time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], 
"start time": "2018-10-14 10:39:21"}]}}
(1 row)

david=# \gset

david=# select :'json'::jsonb;

   jsonb
 

 {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start 
time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], 
"start time": "2018-10-14 10:39:21"}]}}
(1 row)

So great!

While you’re in there, Tom, would it make sense to fold in something like [this 
patch][1] I posted last month to clarify which JSONPath comparison operators 
can take advantage of a index?

--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ 
'$.tags[*] == "qui"';

For these operators, a GIN index extracts clauses of the form
accessors_chain
-= constant out of
+== constant out of
the jsonpath pattern, and does the index search based on
the keys and values mentioned in these clauses.  The accessors chain
may include .key,
@@ -522,6 +522,9 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ 
'$.tags[*] == "qui"';
The jsonb_ops operator class also
supports .* and .** accessors,
but the jsonb_path_ops operator class does not.
+Only the == and != SQL/JSON Path Operators
+can use the index.
  

  

Best,

David

  [1]: 
https://www.postgresql.org/message-id/0ece6b9c-cdde-4b65-be5a-49d737204...@justatheory.com



Re: GIN-Indexable JSON Patterns

2023-12-21 Thread David E. Wheeler
On Dec 17, 2023, at 13:10, David E. Wheeler  wrote:

> Quick follow-up to my slew of questions back in [September][1]. I wanted to 
> update [my patch][2] to note that only JSON Path equality operators are 
> supported by indexes, as [previously discussed][3].

Should I just add it to the patch and let the reviews fall where they may? :-)

Best,

David

 [1]: 
https://www.postgresql.org/message-id/15dd78a5-b5c4-4332-acfe-55723259c...@justatheory.com
 [2]: https://commitfest.postgresql.org/45/4624/
 [3]: 
https://www.postgresql.org/message-id/973d6495-cf28-4d06-7d46-758bd2615...@xs4all.nl





Re: JSON Path and GIN Questions

2023-12-17 Thread David E. Wheeler
On Dec 17, 2023, at 16:08, Tom Lane  wrote:

> I'd waited because the discussion was still active, and then it
> kind of slipped off the radar.  I'll take another look and push
> some form of what I suggested.

Right on.

> That doesn't really address the
> jsonpath oddities you were on about, though.

No, I attempted to address those in [a patch][1].

  [1]: https://commitfest.postgresql.org/45/4624/

Best,

David





GIN-Indexable JSON Patterns

2023-12-17 Thread David E. Wheeler
Hey Hackers,

Quick follow-up to my slew of questions back in [September][1]. I wanted to 
update [my patch][2] to note that only JSON Path equality operators are 
supported by indexes, as [previously discussed][3]. I thought perhaps adding a 
note to this bit of the docs would be useful:

> For these operators, a GIN index extracts clauses of the form accessors_chain 
> = constant out of the jsonpath pattern, and does the index search based on 
> the keys and values mentioned in these clauses. The accessors chain may 
> include .key, [*], and [index] accessors. The jsonb_ops operator class also 
> supports .* and .** accessors, but the jsonb_path_ops operator class does not.

But perhaps that’s what `accessors_chain = constant` is supposed to mean? I’m 
not super clear on it, though, since the operator is `==` and not `=` (and I 
would presume that `!=` would use the index, as well. Is that correct?

If so, how would you feel about something like this?

--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ 
'$.tags[*] == "qui"';
 
 For these operators, a GIN index extracts clauses of the form
 accessors_chain
-= constant out of
+== constant out of
 the jsonpath pattern, and does the index search based on
 the keys and values mentioned in these clauses.  The accessors chain
 may include .key,
@@ -522,6 +522,9 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ 
'$.tags[*] == "qui"';
 The jsonb_ops operator class also
 supports .* and .** accessors,
 but the jsonb_path_ops operator class does not.
+Only the == and != SQL/JSON Path Operators
+can use the index.
   
 
   

Best,

David

  [1]: 
https://www.postgresql.org/message-id/15dd78a5-b5c4-4332-acfe-55723259c...@justatheory.com
  [2]: https://commitfest.postgresql.org/45/4624/
  [3]: 
https://www.postgresql.org/message-id/973d6495-cf28-4d06-7d46-758bd2615...@xs4all.nl
  [4]: https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING





Re: JSON Path and GIN Questions

2023-12-17 Thread David E. Wheeler
On Sep 17, 2023, at 18:09, David E. Wheeler  wrote:

> I think this is useful, but also that it’s worth calling out explicitly that 
> functions do not count as indexable operators. True by definition, of course, 
> but I at least had assumed that since an operator is, in a sense, syntax 
> sugar for a function call, they are in some sense the same thing.
> 
> A header might be useful, something like “What Counts as an indexable 
> expression”.

Hey Tom, are you still thinking about adding this bit to the docs? I took a 
quick look at master and didn’t see it there.

Thanks,

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-24 Thread David E. Wheeler
On Oct 23, 2023, at 20:20, Erik Wienhold  wrote:

> I thought that you may have missed that one because I saw this change
> that removes the article:
> 
>> -In the strict mode, the specified path must exactly match the structure 
>> of
>> +In strict mode, the specified path must exactly match the structure of

Oh, didn’t realize. Fixed.

> LGTM.  Would you create a commitfest entry?  I'll set the status to RfC.

Done. 

  https://commitfest.postgresql.org/45/4624/

Best,

David



v7-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: Binary data


Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-23 Thread David E. Wheeler
On Oct 22, 2023, at 20:36, Erik Wienhold  wrote:

> That's an AppleSingle file according to [1][2].  It only contains the
> resource fork and file name but no data fork.

Ah, I had “Send large attachments with Mail Drop” enabled. To me 20K is not big 
but whatever. Let’s see if turning it off fixes the issue.

> Any reason for calling it "predicate check expressions" (e.g. the link
> text) and sometimes "predicate path expressions" (e.g. the linked
> section title)?  I think it should be named consistently to avoid
> confusion and also to simplify searching.

I think "predicate path expressions” is more descriptive, but "predicate check 
expressions” is what was in the docs before, so let’s stick with that.

> Linking the same section twice in the same paragraph seems excessive.

Fair. Will link the second one.

>> += select jsonb_path_query(:'json', 
>> '$.track.segments');
>> +select jsonb_path_query(:'json', '$.track.segments');
> 
> Please remove the second SELECT.

Done.

>> += select jsonb_path_query(:'json', 'strict 
>> $.track.segments[0].location');
>> + jsonb_path_query
>> +---
>> + [47.763, 13.4034]
> 
> Strict mode is unnecessary to get that result and I'd omit it because
> the different modes are not introduced yet at this point.

Yep, pasto.

> Strict mode is unnecessary here as well.

Fixed.

>> + using the lax mode. To avoid surprising results, we recommend using
>> + the .** accessor only in the strict mode. The
> 
> Please change to "in strict mode" (without "the").

Hrm, I prefer it without the article, too, but it is consistently used that way 
elsewhere, like here:

 https://github.com/postgres/postgres/blob/5b36e8f/doc/src/sgml/func.sgml#L17401

I’d be happy to change them all, but was keeping it consistent for now.

Updated patch attached, thank you!

David



v6-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: Binary data


Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-19 Thread David E. Wheeler
On Oct 19, 2023, at 10:49 PM, Erik Wienhold  wrote:

> Just wanted to take a look at v5.  But it's an applefile again :P

I don’t get it. It was the other times too! Are you able to save it with a 
.patch suffix?

D






Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-19 Thread David E. Wheeler
On Oct 19, 2023, at 01:22, jian he  wrote:

> "Do not use with non-predicate", double negative is not easy to
> comprehend. Maybe we can simplify it.
> 
> 16933: value. Use only SQL-standard JSON path expressions, not not
> there are two "not".
> 
> 15842: SQL-standard JSON path expressions, not not
> there are two "not”.


Thank you, jian. Updated patch attached and also on GitHub.

  
https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs

Best,

David




v5-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: application/applefile


Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-16 Thread David E. Wheeler
On Oct 16, 2023, at 18:07, Erik Wienhold  wrote:

>> Okay, added, let’s just put all our cards on the table. :-)
> 
> I'll have a look but the attached v3 is not a patch but some applefile.

Weird, should be no different from previous attachments. I believe Apple Mail 
always uses application/octet-stream for attachments it doesn’t recognize, 
which includes .patch and .diff files, sadly.

> One of the added  is invalid by the looks of it.  Maybe 
> is missing because it says "got (para para )" at the end.

Oh, I thought it would report issues from the files they were found in. You’re 
right, I forgot a title. Fixed in v4.

David



v4-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: Binary data





Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-16 Thread David E. Wheeler
On Oct 15, 2023, at 23:03, Erik Wienhold  wrote:

> Your call but I'm not against including it in this patch because it
> already touches the modes section.

Okay, added, let’s just put all our cards on the table. :-)

>> Agreed. Would be good if we could teach these functions and operators
>> to reject path expressions they don’t support.
> 
> Right, you mentioned that idea in [1] (separate types).  Not sure what
> the best strategy here is but it's likely to break existing queries.
> Maybe deprecating unsupported path expressions in the next major release
> and changing that to an error in the major release after that.

Well if the functions have a JsonPathItem struct, they can check its type 
attribute and reject those with a root type that’s a predicate in @? and reject 
it if it’s not a predicate in @@. Example of checking type here:

https://github.com/postgres/postgres/blob/54b208f90963cb8b48b9794a5392b2fae4b40a98/src/backend/utils/adt/jsonpath_exec.c#L622

>>> This can be checked with `make -C doc/src/sgml check`.
>> 
>> Thanks. That produces a bunch of warnings for postgres.sgml and
>> legal.sgml (and a failure to load the docbook DTD), but func.sgml is
>> clean now.
> 
> Hmm... I get no warnings on 1f89b73c4e.  Did you install all tools as
> described in [2]?  The DTD needs to be installed as well.

Thanks, got it down to one:

postgres.sgml:112: element sect4: validity error : Element sect4 content does 
not follow the DTD, expecting (sect4info? , (title , subtitle? , titleabbrev?) 
, (toc | lot | index | glossary | bibliography)* , (((calloutlist | glosslist | 
bibliolist | itemizedlist | orderedlist | segmentedlist | simplelist | 
variablelist | caution | important | note | tip | warning | literallayout | 
programlisting | programlistingco | screen | screenco | screenshot | synopsis | 
cmdsynopsis | funcsynopsis | classsynopsis | fieldsynopsis | 
constructorsynopsis | destructorsynopsis | methodsynopsis | formalpara | para | 
simpara | address | blockquote | graphic | graphicco | mediaobject | 
mediaobjectco | informalequation | informalexample | informalfigure | 
informaltable | equation | example | figure | table | msgset | procedure | 
sidebar | qandaset | task | anchor | bridgehead | remark | highlights | 
abstract | authorblurb | epigraph | indexterm | beginpage)+ , (refentry* | 
sect5* | simplesect*)) | refentry+ | sect5+ | simplesect+) , (toc | lot | index 
| glossary | bibliography)*), got (para para )
  

David



v3-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: application/applefile




Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-15 Thread David E. Wheeler
On Oct 14, 2023, at 19:51, Erik Wienhold  wrote:

> Thanks for putting this together.  See my review at the end.

Appreciate the speedy review!

> Nice.  This really does help to make some sense of it.  I checked all
> queries and they do work out except for two queries where the path
> expression string is not properly quoted (but the intended output is
> still correct).

臘‍♂️

>> Follow-ups I’d like to make:
>> 
>> 1. Expand the modes section to show how the types of results can vary
>> depending on the mode, thanks to the flattening. Examples:
>> 
>> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
>> jsonb_path_query 
>> --
>> 3
>> 4
>> 5
>> (3 rows)
>> 
>> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}',  'strict $.a ?(@[*] > 
>> 2)');
>> jsonb_path_query  
>> --
>> [1, 2, 3, 4, 5]
>> 
>> 2. Improve the descriptions and examples for @?/jsonb_path_exists()
>> and @@/jsonb_path_match().
> 
> +1

I planned to submit these changes in a separate patch, based on Tom Lane’s 
suggestion[1]. Would it be preferred to add them to this patch?

> Perhaps make it explicit that the reader must run this in psql in order
> to use \set and :'json' in the ensuing samples?  Some of the existing
> examples already use psql output but they do not rely on any psql
> features.

Good call, done.

> This should use , , and  if it shows
> a psql session, e.g.:
> 
> 
> select jsonb_path_query(:'json', '$.track.segments');
> 
> 
> jsonb_path_query
> ---
> [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 
> 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": 
> "2018-10-14 10:39:21"}]
> 
> 

I pokwds around, and it appears the computeroutput bit is used for function 
output. So I followed the precedent in queries.sgml[2] and omitted the 
computeroutput tags but added prompt, e.g.,


= select jsonb_path_query(:'json', 'strict 
$.**.HR');
jsonb_path_query
--
73
135


> Also the cast to jsonb is not necessary and only adds clutter IMO.

Right, removed them all in function calls.

>> + 
>> +  Predicate-only path expressions are necessary for implementation of 
>> the
>> +  @@ operator (and the
>> +  jsonb_path_match function), and should not be 
>> used
>> +  with the @? operator (or
>> +  jsonb_path_exists function).
>> + 
>> +
>> + 
>> +  Conversely, non-predicate jsonpath expressions should 
>> not be
>> +  used with the @@ operator (or the
>> +  jsonb_path_match function).
>> + 
>> +
> 
> Both paras should be wrapped in a single  so that they stand out
> from the rest of the text.  Maybe even , but  is already
> used on this page for things that I'd consider warnings.

Agreed. Would be good if we could teach these functions and operators to reject 
path expressions they don’t support.

>> +
>> +Regular Expression Interpretation
>> + 
>> +  There are minor differences in the interpretation of regular
>> +  expression patterns used in like_regex filters, as
>> +  described in .
>> + 
>> +
> 
>  should be closed here,
> otherwise the docs won't build.  This can be checked with
> `make -C doc/src/sgml check`.

Thanks. That produces a bunch of warnings for postgres.sgml and legal.sgml (and 
a failure to load the docbook DTD), but func.sgml is clean now.

> `git diff --check` shows a couple of lines with trailing whitespace
> (mostly psql output).

I must’ve cleaned those after I sent the patch, good now. Updated patch 
attached, this time created by `git format-patch -v2`.

Best,

David

[1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-JOIN




v2-0001-Improve-boolean-predicate-JSON-Path-docs.patch
Description: Binary data


Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-14 Thread David E. Wheeler
On Oct 14, 2023, at 16:40, David E. Wheeler  wrote:

> Following up from a suggestion from Tom Lane[1] to improve the documentation 
> of boolean predicate JSON path expressions, please find enclosed a draft 
> patch to do so.

And now I see I can’t spell “Deviations”. Will fix along with any other 
requested revisions. GitHub diff here if you’re into that sort of thing:

  
https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs

Best,

David





Patch: Improve Boolean Predicate JSON Path Docs

2023-10-14 Thread David E. Wheeler
Hackers,

Following up from a suggestion from Tom Lane[1] to improve the documentation of 
boolean predicate JSON path expressions, please find enclosed a draft patch to 
do so. It does three things:

1. Converts all of the example path queries to use jsonb_path_query() and show 
the results, to make it clearer what the behaviors are.

2. Replaces the list of deviations from the standards with a new subsection, 
with each deviation in its own sub-subsection. The regex section is unchanged, 
but I’ve greatly expanded the boolean expression JSON path section with 
examples comparing standard filter expressions and nonstandard boolean 
predicates. I’ve also added an exhortation not use boolean expressions with @? 
or standard path expressions with @@.

3. While converting the modes section to use jsonb_path_query() and show the 
results, I also added an example of strict mode returning an error.

Follow-ups I’d like to make:

1. Expand the modes section to show how the types of results can vary depending 
on the mode, thanks to the flattening. Examples:

david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query 
--
3
4
5
(3 rows)

david=# select jsonb_path_query('{"a":[1,2,3,4,5]}',  'strict $.a ?(@[*] > 2)');
jsonb_path_query  
--
[1, 2, 3, 4, 5]

2. Improve the descriptions and examples for @?/jsonb_path_exists() and 
@@/jsonb_path_match().

Best,

David

[1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us



jsonpath-pred-docs.patch
Description: Binary data


Re: JSON Path and GIN Questions

2023-10-14 Thread David E. Wheeler
Thanks for the reply, Erik. Have appreciated collaborating with you on a few 
different things lately!

> On Oct 13, 2023, at 22:50, Erik Wienhold  wrote:

>> Hi, finally getting back to this, still fiddling to figure out the
>> differences. From the thread you reference [1], is the point that @@
>> and jsonb_path_match() can only be properly used with a JSON Path
>> expression that’s a predicate check?
> 
> I think so.  That's also supported by the existing docs which only
> mention "JSON path predicate" for @@ and jsonb_path_match().

Okay, good.

>> If so, as far as I can tell, only exists() around the entire path
>> query, or the deviation from the SQL standard that allows an
>> expression to be a predicate?
> 
> Looks like that.  But note that exists() is also a filter expression.
> So wrapping the entire jsonpath in exists() is also a deviation from the
> SQL standard which only allows predicates in filter expressions, i.e.
> ' ? ()'.

Yeah. I’m starting to get the sense that the Postgres extension of the standard 
to allow predicates without filters is almost a different thing, like there are 
two Pg SQL/JSON Path languages:

1. SQL Standard path language for selecting values and includes predicates. 
Returns the selected value(s). Supported by `@?` and jsonb_path_exists().

2. The Postgres predicate path language which returns a boolean, akin to a 
WHERE expression. Supported by `@@` and jsonb_path_match()

Both are supported by jsonb_path_query(), but if you use a standard path you 
get the values and if you use a predicate path you get a boolean. This feels a 
big overloaded to me, TBH; I find myself wanting them to be separate types 
since the behaviors vary quite a bit!

>> This suggest to me that the "Only the first item of the result is
>> taken into account” bit from the docs may not be quite right.
> 
> Yes, this was also the issue in the referenced thread[1].  I think my
> suggesstion in [2] explains it (as far as I understand it).

Yeah, lax vs. strict mode stuff definitely creates some added complexity. I see 
now I missed the rest of that thread; seeing the entire thread on one page[1] 
really helps. I’d like to take a stab at the doc improvements Tom suggests[2].

>> jsonb_path_match(), it turns out, only wants a single result. But
>> furthermore perhaps the use of a filter predicate rather than a
>> predicate expression for the entire path query is an error?
> 
> Yes, I think @@ and jsonb_path_match() should not be used with filter
> expressions because the jsonpath returns whatever the path expression
> yields (which may be an actual boolean value in the jsonb).  The filter
> expression only filters (as the name suggests) what the path expression
> yields.

Agreed. It only gets worse with a filter expression that selects a single value:

david=# select jsonb_path_match('{"a":[false,true]}',  '$.a ?(@[*] == false)');
 jsonb_path_match  
--
 f

Presumably it returns false because the value selected is JSON `false`:

david=# select jsonb_path_query('{"a":[false,true]}',  '$.a ?(@[*] == false)');
 jsonb_path_query  
--
 false

Which seems misleading, frankly. Would it be possible to update 
jsonb_path_match and @@ to raise an error when the path expression is not a 
predicate?


>> Curiously, @@ seems okay with it:
>> 
>> david=#  select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
>> ?column? 
>> --
>> t
>> 
>> Not a predicate query, and somehow returns true even though the first
>> item of the result is false? Is that how it should be?
> 
> Your example does a text search equivalent to:
> 
> select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? 
> (@[*] == true)')
> 
> You forgot the cast to jsonb.  

Oh good grief 臘‍♂️

> jsonb @@ jsonpath actually returns null:
> 
> test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)';
>  ?column?
> --
>  
> (1 row)

Yes, much better, though see the result above that returns a single `false` and 
confuses things.

> This matches the note right after the docs for @@:

Yeah, that makes sense. But here’s a bit about lax mode[3] that confuses me:

> The lax mode facilitates matching of a JSON document structure and path 
> expression if the JSON data does not conform to the expected schema. If an 
> operand does not match the requirements of a particular operation, it can be 
> automatically wrapped as an SQL/JSON array or unwrapped by converting its 
> elements into an SQL/JSON sequence before performing this operation. Besides, 
> comparison operators automatically unwrap their operands in the lax mode, so 
> you can compare SQL/JSON arrays out-of-the-box.

This automatic flattening in lax mode seems odd, because it means you get 
different results in strict and lax mode where there are no errors. In lax 
mode, you get a set:

david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query 
--
3
4
5
(3 rows)

But 

Re: JSON Path and GIN Questions

2023-10-08 Thread David E. Wheeler
On Sep 12, 2023, at 21:00, Erik Wienhold  wrote:

>> I posted this question on Stack Overflow 
>> (https://stackoverflow.com/q/77046554/79202),
>> and from the suggestion I got there, it seems that @@ expects a boolean to be
>> returned by the path query, while @? wraps it in an implicit exists(). Is 
>> that
>> right?
> 
> That's also my understanding.  We had a discussion about the docs on @@, @?, 
> and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.

Hi, finally getting back to this, still fiddling to figure out the differences. 
From the thread you reference [1], is the point that @@ and jsonb_path_match() 
can only be properly used with a JSON Path expression that’s a predicate check?

If so, as far as I can tell, only exists() around the entire path query, or the 
deviation from the SQL standard that allows an expression to be a predicate?

This suggest to me that the "Only the first item of the result is taken into 
account” bit from the docs may not be quite right. Consider this example:

david=#  select jsonb_path_query('{"a":[false,true,false]}',  '$.a ?(@[*] == 
false)');
 jsonb_path_query
--
 false
 false
(2 rows)

david=#  select jsonb_path_match('{"a":[false,true,false]}',  '$.a ?(@[*] == 
false)');
ERROR:  single boolean result is expected

jsonb_path_match(), it turns out, only wants a single result. But furthermore 
perhaps the use of a filter predicate rather than a predicate expression for 
the entire path query is an error?

Curiously, @@ seems okay with it:

david=#  select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)';
 ?column? 
--
 t

Not a predicate query, and somehow returns true even though the first item of 
the result is false? Is that how it should be?

Best,

David

[1] 
https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com




Re: to_regtype() Raises Error

2023-09-17 Thread David E. Wheeler
On Sep 17, 2023, at 19:28, Tom Lane  wrote:

>> No, that is precisely the point.  The result should be null instead of
>> an error.
> 
> Yeah, ideally so, but the cost/benefit of making it happen seems
> pretty unattractive for now.  See the soft-errors thread at [1],
> particularly [2] (but searching in that thread for references to
> regclassin, regtypein, and to_reg* will find additional detail).

For my purposes I’m wrapping it in an exception-catching PL/pgSQL function, but 
it might be worth noting the condition in which it *will* raise an error on the 
docs.

Best,

David



signature.asc
Description: Message signed with OpenPGP


to_regtype() Raises Error

2023-09-17 Thread David E. Wheeler
The docs for `to_regtype()` say, “this function will return NULL rather than 
throwing an error if the name is not found.” And it’s true most of the time:

david=# select to_regtype('foo'), to_regtype('clam');
 to_regtype | to_regtype
+
 [null] | [null]

But not others:

david=# select to_regtype('inteval second');
ERROR:  syntax error at or near "second"
LINE 1: select to_regtype('inteval second');
^
CONTEXT:  invalid type name "inteval second”

I presume this has something to do with not catching errors from the parser?

david=# select to_regtype('clam bake');
ERROR:  syntax error at or near "bake"
LINE 1: select to_regtype('clam bake');
 ^
CONTEXT:  invalid type name "clam bake"

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-17 Thread David E. Wheeler
On Sep 17, 2023, at 12:20, Tom Lane  wrote:

> After thinking about it for awhile, I think we need some more
> discursive explanation of what's allowed, perhaps along the lines
> of the attached.  (I still can't shake the feeling that this is
> duplicative; but I can't find anything comparable until you get
> into the weeds in Section V.)
> 
> I put the new text at the end of section 11.1, but perhaps it
> belongs a little further up in that section; it seems more
> important than some of the preceding paras.

I think this is useful, but also that it’s worth calling out explicitly that 
functions do not count as indexable operators. True by definition, of course, 
but I at least had assumed that since an operator is, in a sense, syntax sugar 
for a function call, they are in some sense the same thing.

A header might be useful, something like “What Counts as an indexable 
expression”.

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 16, 2023, at 18:13, Erik Wienhold  wrote:

> Looks like the effect of lax mode which may unwrap arrays when necessary [1].
> The array unwrapping looks like the result of jsonb_array_elements().
> 
> It kinda works in strict mode:
> 
> SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] 
> < 14)');
> 
>jsonb_path_query
> ---
>  [47.763, 13.4034]
>  [47.706, 13.2635]
> (2 rows)
> 
> But it does not remove elements from the matching arrays.  Which I don't even
> expect here because the path specifies the location array as the object to be
> returned.  The filter expression then only decides whether to return the
> location array or not.  Nowhere in the docs does it say that the filter
> expression itself removes any elements from a matched array.

Yes, this is what I expected. It means “select the location array if any of its 
contents is less that 14.”

I don’t understand why it’s different in lax mode, though, as `@[*]` is not a 
structural error; it confirms to the schema, as the docs say. The flattening in 
this case seems weird.

Ah, here’s why:, from the docs:

"Besides, comparison operators automatically unwrap their operands in the lax 
mode, so you can compare SQL/JSON arrays out-of-the-box.”

There follow some discussion of the need to specify `[*]` on segments in strict 
mode, but since that’s exactly what my example does (and the same for the 
locations array inside the filter), it doesn’t seem right to me that it would 
be unwrapped here.

> Here's a query that filter's out individual array elements.  It's quite a
> mouthful (especially to preserve the order of array elements):

Wow fun, and yeah, it makes sense to take things apart in SQL for this sort of 
thing!

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 16, 2023, at 16:50, Erik Wienhold  wrote:

> "For these operators, a GIN index extracts clauses of the form
> **accessors_chain = constant** out of the jsonpath pattern, and does the
> index search based on the keys and values mentioned in these clauses."
> 
> I don't know if this is a general limitation of GIN indexes or just how these
> operators are implemented right now.
> 
> [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING


The detail that jumps out at me is this one on jsonb_path_ops:

“Basically, each jsonb_path_ops index item is a hash of the value and the 
key(s) leading to it”

Because jsonb_path_ops indexes hashes, I would assume it would only support 
path equality. But it’s not clear to me from these docs that jsonb_ops also 
indexes hashes. Does it?

Best,

D



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 12, 2023, at 21:00, Erik Wienhold  wrote:

>> If so, I’d like to submit a patch to the docs talking about this, and
>> suggesting the use of jsonb_path_query() to test paths to see if they return
>> a boolean or not.
> 
> +1

I’ve started work on this; there’s so much to learn! Here’s a new example that 
surprised me a bit. Using the GPS tracker example from the docs [1] loaded into 
a `:json` psql variable, this output of this query makes perfect sense to me:

david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 
14)');
 jsonb_path_query
--
 13.4034
 13.2635

Because `[*]` selects all the values. This, however, I did not expect:

david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 
14)');
 jsonb_path_query
--
 13.4034
 13.2635
(2 rows)

I had expected it to return two single-value arrays, instead:

 [13.4034]
 [13.2635]

It appears that the filter expression is doing some sub-selection, too. Is that 
expected?

Best,

David

  [1]: 
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH


signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 15, 2023, at 23:59, Erik Rijkers  wrote:

> movie @? '$ ?($.year >= 2023)'
> 
> I believe it is indeed not possible to have such a unequality-search use the 
> GIN index.  It is another weakness of JSON that can be unexpected to those 
> not in the fullness of Knowledge of the manual. Yes, this too would be good 
> to explain in the doc where JSON indexes are explained.

Is that a limitation of GIN indexes in general? Or could there be opclass 
improvements in the future that would enable such comparisons?

Thanks,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-16 Thread David E. Wheeler
On Sep 15, 2023, at 20:36, Tom Lane  wrote:

> I think that that indicates that you're putting the info in the
> wrong place.  Perhaps the right answer is to insert something
> more explicit in section 11.2, which is the first place where
> we really spend any effort discussing what can be indexed.

Fair enough. How ’bout this?

--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -120,7 +120,7 @@ CREATE INDEX test1_id_index ON test1 (id);
B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension bloom.
Each index type uses a different
-   algorithm that is best suited to different types of queries.
+   algorithm that is best suited to different types of queries and operators.
By default, the CREATE
INDEX command creates
B-tree indexes, which fit the most common situations.
@@ -132,6 +132,14 @@ CREATE INDEX name ON 
table
 
   

+  
+
+Only operators on indexed columns are considered as index qualifications.
+Functions never qualify for index usage, aside from
+indexes on expressions.
+
+  
+
   
B-Tree




signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-15 Thread David E. Wheeler
On Sep 12, 2023, at 21:00, Erik Wienhold  wrote:

> That's also my understanding.  We had a discussion about the docs on @@, @?, 
> and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.

Okay, I’ll take a pass at expanding the docs on this. I think a little 
mini-tutorial on these two operators would be useful.

Meanwhile, I’d like to re-up this question about the index qualification of 
non-equality JSON Path operators.

On Sep 12, 2023, at 20:16, David E. Wheeler  wrote:

> Issue 3: Index Use for Comparison
> -
> 
> From the docs 
> (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I 
> had assumed any JSON Path query would be able to use the GIN index. However 
> while the use of the == JSON Path operator is able to take advantage of the 
> GIN index, apparently the >= operator cannot:
> 
> david=# explain analyze select id from movies where movie @? '$ ?($.year >= 
> 2023)';
>   QUERY PLAN  
> 
> -
> Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual 
> time=34.815..36.259 rows=192 loops=1)
>   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
>   Rows Removed by Filter: 36081
> Planning Time: 1.864 ms
> Execution Time: 36.338 ms
> (5 rows)
> 
> Is this expected? Originally I tried with json_path_ops, which I can 
> understand not working, since it stores hashes of paths, which would allow 
> only exact matches. But a plain old GIN index doesn’t appear to work, either. 
> Should it? Is there perhaps some other op class that would allow it to work? 
> Or would I have to create a separate BTREE index on `movie -> 'year'`?

Thanks,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-15 Thread David E. Wheeler
On Sep 14, 2023, at 00:41, Tom Lane  wrote:

> As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
> is pretty clear about what is or is not supported.

How do you feel about this note, then?

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index b6c2ddbf55..7dda727f0d 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -413,6 +413,13 @@ SELECT doc-'site_name' FROM websites
 Two GIN operator classes are provided, offering different
 performance and flexibility trade-offs.
   
+  
+
+As with all indexes, only operators on indexed columns are considered as
+index qualifications. In other words, only jsonb operators can
+take advantage of GIN indexes; jsonb functions cannot.
+
+  
   
 The default GIN operator class for jsonb supports queries with
 the key-exists operators ?, ?|


Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-13 Thread David E. Wheeler
On Sep 13, 2023, at 01:11, Erik Rijkers  wrote:

> "All use of json*() functions preclude index usage."
> 
> That sentence is missing from the documentation.

Where did that come from? Why wouldn’t JSON* functions use indexes? I see that 
the docs only mention operators; why would the corresponding functions behave 
the same?

D



signature.asc
Description: Message signed with OpenPGP


JSON Path and GIN Questions

2023-09-12 Thread David E. Wheeler
Greetings Hackers,

Been a while! I’m working on some experiments with JSONB columns and GIN 
indexes, and have operated on the assumption that JSON Path operations would 
take advantage of GIN indexes, with json_path_ops as a nice optimization. But 
I’ve run into what appear to be some inconsistencies and oddities I’m hoping to 
figure out with your help.

For the examples in this email, I’m using this simple table:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s 
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json 
| jq -c ".[]" | sed "s|||g"';
create index on movies using gin (movie);
analyze movies;

That gives me a simple table with around 3600 rows. Not a lot of data, but 
hopefully enough to demonstrate the issues.

Issue 1: @@ vs @?
-

I have been confused as to the difference between @@ vs @?: Why do these return 
different results?

david=# select id from movies where movie @@ '$ ?(@.title == "New Life 
Rescue")';
 id

(0 rows)

david=# select id from movies where movie @? '$ ?(@.title == "New Life 
Rescue")';
 id

 10
(1 row)

I posted this question on Stack Overflow 
(https://stackoverflow.com/q/77046554/79202), and from the suggestion I got 
there, it seems that @@ expects a boolean to be returned by the path query, 
while @? wraps it in an implicit exists(). Is that right?

If so, I’d like to submit a patch to the docs talking about this, and 
suggesting the use of jsonb_path_query() to test paths to see if they return a 
boolean or not.


Issue 2: @? Index Use
-

From Oleg’s (happy belated birthday!) notes 
(https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators):


> Operators @? and @@ are interchangeable:
> 
> js @? '$.a' <=> js @@ 'exists($.a)’
> js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’

For the purposes of the above example, this appears to hold true: if I wrap the 
path query in exists(), @@ returns a result:

david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life 
Rescue"))';
 id

 10
(1 row)

Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN 
index while @? does not.

Or, no, fiddling with it again just now, I think I have still been confusing 
these operators! @@ was using the index with an an explicit exists(), but @? 
was not…because I was still using an explicit exists.

In other words:

* @@ 'exists($ ?($.year == 1944))'  Uses the index
* @? '$ ?(@.year == 1944)'  Uses the index
* @? 'exists($ ?($.year == 1944))'  Does not use the index

That last one presumably doesn’t work, because there is an implicit exists() 
around the exists(), making it `exists(exists($ ?($.year == 1944)))`, which 
returns true for every row  (true and false both exists)! 臘‍♂️.

Anyway, if I have this right, I’d like to flesh out the docs a bit.

Issue 3: Index Use for Comparison
-

From the docs 
(https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I 
had assumed any JSON Path query would be able to use the GIN index. However 
while the use of the == JSON Path operator is able to take advantage of the GIN 
index, apparently the >= operator cannot:

david=# explain analyze select id from movies where movie @? '$ ?($.year >= 
2023)';
   QUERY PLAN   
  
-
 Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual 
time=34.815..36.259 rows=192 loops=1)
   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
   Rows Removed by Filter: 36081
 Planning Time: 1.864 ms
 Execution Time: 36.338 ms
(5 rows)

Is this expected? Originally I tried with json_path_ops, which I can understand 
not working, since it stores hashes of paths, which would allow only exact 
matches. But a plain old GIN index doesn’t appear to work, either. Should it? 
Is there perhaps some other op class that would allow it to work? Or would I 
have to create a separate BTREE index on `movie -> 'year'`?

Thanks your your patience with my questions!

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSONPath Child Operator?

2023-01-30 Thread David E . Wheeler
On Jan 30, 2023, at 08:17, Filipp Krylov  wrote:

>> My question: Are there plans to support square bracket syntax for JSON 
>> object field name strings like this? Or to update to follow the standard as 
>> it’s finalized?
> 
> This syntax is a part of "jsonpath syntax extensions" patchset: 
> https://www.postgresql.org/message-id/e0fe4f7b-da0b-471c-b3da-d8adaf314357%40postgrespro.ru

Nice, thanks. I learned since sending this email that SQL/JSON Path is not at 
all the same as plain JSON Path, so now I’m less concerned bout it. I like the 
new object subscript syntax, though, I’ve been thinking about this myself.

D





JSONPath Child Operator?

2022-11-10 Thread David E. Wheeler
Greetings! Long time no see, I know. How are you, Hackers?

I notice from the docs in the Postgres JSONPath type, brackets are described as:

> • Square brackets ([]) are used for array access.


  https://www.postgresql.org/docs/current/datatype-json.html#DATATYPE-JSONPATH

Notably they are not used for object field path specifications, in contrast to 
the original JSON Path design, which says:

> JSONPath expressions can use the dot–notation
> 
> $.store.book[0].title
> 
> or the bracket–notation
> 
> $['store']['book'][0]['title']

  https://goessner.net/articles/JsonPath/index.html#e2

Similarly, the current IETF RFC Draft says:

> JSONPath expressions use the _bracket notation_, for example:
> 
>$['store']['book'][0]['title']
> 
>or the more compact _dot notation_, for example:
> 
>$.store.book[0].title

  https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/


My question: Are there plans to support square bracket syntax for JSON object 
field name strings like this? Or to update to follow the standard as it’s 
finalized?

Thanks,

David





Re: When is int32 not an int32?

2021-09-26 Thread David E. Wheeler
On Sep 26, 2021, at 19:25, Tom Lane  wrote:

> More to the point, you should be checking whether strtol reports overflow.
> Having now seen your code, I'll opine that the failing platforms have
> 32-bit long.

Thanks for the pointer, Tom. I believe this fixes that particular issue.

  https://github.com/theory/pg-semver/commit/4d79dcc

Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: When is int32 not an int32?

2021-09-26 Thread David E. Wheeler
On Sep 26, 2021, at 18:31, Tom Lane  wrote:

> I'd bet more along the lines of "your overflow check is less portable than
> you thought”.

Oh well now that you mention it and I look past things, I see we’re using 
INT_MAX, but should probably use INT32_MAX.

  
https://github.com/theory/pg-semver/blob/87cc30cbe80aa3992a4af6f19a35a944a86c/src/semver.c#L145-L149

And also that the destination value we’re storing it in is an int parts[], not 
int32 parts[]. Which we do so we can parse numbers up to int size. But to 
Fetter’s point, we’re not properly handling something greater than int (usually 
int64, presumably). Not sure what changes are required to improve memory safety 
over and above using INT32_MAX instead of INT_MAX.

Thanks,

Daavid



signature.asc
Description: Message signed with OpenPGP


When is int32 not an int32?

2021-09-26 Thread David E. Wheeler
Hell Hackers, long time no email!

I got a bug report for the semver extension:

  https://github.com/theory/pg-semver/issues/58

It claims that a test unexpected passes. That is, Test #31 is expected to fail, 
because it intentionally tests a version in which its parts overflow the 
int32[3] they’re stored in, with the expectation that one day we can refactor 
the type to handle larger version parts.

I can’t imagine there would be any circumstance under which int32 would somehow 
be larger than a signed 32-bit integer, but perhaps there is?

Scroll to the bottom of these pages to see the unexpected passes on i386 and 
armhf:

  
https://ci.debian.net/data/autopkgtest/unstable/i386/p/postgresql-semver/15208658/log.gz
  
https://ci.debian.net/data/autopkgtest/unstable/armhf/p/postgresql-semver/15208657/log.gz

Here’s the Postgres build output for those two platforms, as well, though 
nothing jumps out at me:

  
https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=i386=13.4-3=1630408269=0
  
https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=armhf=13.4-3=1630412028=0


Thanks,

David



signature.asc
Description: Message signed with OpenPGP


Re: [HACKERS] Look-behind regular expressions

2020-07-07 Thread David E. Wheeler
On Jul 7, 2020, at 21:32, Andreas Karlsson  wrote:

> On 7/7/20 6:51 PM, Thom Brown wrote:
> 
>> 10 years later, and I've noticed that both look-behind and negative
>> look-behind have been implemented.
>> Thanks to whomever did this.
> 
> I think that it is Tom Lane that you should thank for this.
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=12c9a04008870c283931d6b3b648ee21bbc2cfda

In 2015! Thank you, Tom. I’ll have to find a use for it one of these days.

D




signature.asc
Description: Message signed with OpenPGP