jsonpath Time and Timestamp Special Cases
Hello hackers, I noticed that the jsonpath date/time functions (.time() and timestamp(), et al.) don’t support some valid but special-case PostgreSQL values, notably `infinity`, `-infinity`, and, for times, '24:00:00`: ❯ psql psql (17devel) Type "help" for help. david=# select jsonb_path_query(to_jsonb('24:00:00'::time), '$.time()'); ERROR: time format is not recognized: "24:00:00" david=# select jsonb_path_query(to_jsonb('infinity'::timestamptz), '$.timestamp_tz()'); ERROR: timestamp_tz format is not recognized: "infinity" I assume this is because the standard doesn’t support these, or references JavaScript-only values or some such. Is that right? Best, David
Re: New committers: Melanie Plageman, Richard Guo
On Apr 26, 2024, at 07:54, Jonathan S. Katz wrote: > The Core Team would like to extend our congratulations to Melanie Plageman > and Richard Guo, who have accepted invitations to become our newest > PostgreSQL committers. > > Please join us in wishing them much success and few reverts! Great news, congratulations! Best, David
Re: Q: Escapes in jsonpath Idents
On Apr 24, 2024, at 3:22 PM, Erik Wienhold wrote: > Thanks Peter! But what is the definition of the entire path expression? > Perhaps something like: > > ::= { "." } > > That would imply that "$.$foo" is a valid path that accesses a variable > member (but I guess the path evaluation is also specified somewhere). I read it as “if it starts with a dollar sign, it’s a variable and not a path identifier”, and I assume any `.foo` expression is a path identifier. > What bugs me about this description, after reading it a couple of times, > is that it's not clear what is meant by ."$varname". It could mean two > things: (1) the double-quoting masks $varname in order to not interpret > those characters as a variable or (2) an interpolated string that > resolves $varname and yields a dynamic member accessor. My understanding is that if it’s in double quotes it’s never anything other than a string (whether a string literal or a path identifier string literal). IOW, variables don’t interpolate inside strings. > Under case (2) I'd expect that query to return 456 (because $foo > resolves to "bar"). (Similar to how psql would resolve :'foo' to > 'bar'.) Yes, I suspect this is the correct interpretation, but agree the wording could use some massaging, especially since path identifiers cannot start with a dollar sign anyway. Perhaps: "If the key name starts with $ or does not meet the JavaScript rules for an identifier, it must be enclosed in double quotes to make it a string literal." > Variables already work in array accessors and table 8.25 says that "The > specified index can be an integer, as well as an expression returning a > single numeric value [...]". A variable is such an expression. > >=> select jsonb_path_query('[2,3,5]', '$[$i]', '{"i":1}'); > jsonb_path_query >-- > 3 >(1 row) > > So I'd expect a similar behavior for member accessors as well when > seeing ."$varname" in the same table. Oh, interesting point! Now I wonder if the standard has this inconsistency (and is aware of it). > Yes, I think so. That would be case C in the spec excerpt provided by > Peter. So it's just a key name that happens to contain (but not start > with) the dollar sign. Exactly. It also matches the doc you quote above. Something would have to change in src/backend/utils/adt/jsonpath_scan.l to fix that, but that file makes my eyes water, so I’m not gonna take a stab at it. :-) D
Re: Q: Escapes in jsonpath Idents
On Apr 24, 2024, at 05:46, Peter Eisentraut wrote: > I have committed this patch, and backpatched it, as a bug fix, because the > existing description was wrong. To keep the patch minimal for backpatching, > I didn't do the conversion to a list. I'm not sure I like that anyway, > because it tends to draw more attention to that part over the surrounding > parts, which didn't seem appropriate in this case. But anyway, if you have > any more non-bug-fix editing in this area, which would then target PG18, > please send more patches. Makes sense, that level of detail gets into the weeks so maybe doesn’t need to be quite so prominent as a list. Thank you! David
Re: Q: Escapes in jsonpath Idents
On Apr 24, 2024, at 05:51, Peter Eisentraut wrote: >A is classified as follows. > >Case: > >a) A that is a is acontext variable>. > >b) A that begins with is a > . > >c) Otherwise, a is a . > > Does this help? I wasn't following all the discussion to see if there is > anything wrong with the implementation. Yes, it does, as it ties the special meaning of the dollar sign to the *beginning* of an expression. So it makes sense that this would be an error: david=# select '$.$foo'::jsonpath; ERROR: syntax error at or near "$foo" of jsonpath input LINE 1: select '$.$foo'::jsonpath; ^ But I’m less sure when a dollar sign is used in the *middle* (or end) of a json path identifier: david=# select '$.xx$foo'::jsonpath; ERROR: syntax error at or near "$foo" of jsonpath input LINE 1: select '$.xx$foo'::jsonpath; ^ Perhaps that should be valid? Best, David
Re: RFC: Additional Directory for Extensions
On Apr 4, 2024, at 1:20 PM, David E. Wheeler wrote: > I’ve added some docs based on your GUC description; updated patch attached. Here’s a rebase. I realize this probably isn’t going to happen for 17, given the freeze, but I would very much welcome feedback and pointers to address concerns about providing a second directory for extensions and DSOs. Quite a few people have talked about the need for this in the Extension Mini Summits[1], so I’m sure I could get some collaborators to make improvements or look at a different approach. Best, David [1] https://justatheory.com/2024/02/extension-ecosystem-summit/#extension-ecosystem-mini-summit v3-0001-Add-extension_directory-GUC.patch Description: Binary data
Re: ❓ JSON Path Dot Precedence
On Apr 10, 2024, at 10:29, Peter Eisentraut wrote: > So the whole thing is > > > > The syntax of and is then punted to > ECMAScript 5.1. > > 0x2 is a HexIntegerLiteral. (There can be no dots in that.) > > p10 is an Identifier. > > So I think this is all correct. That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: what path can `'0x2.p10` even select? Best, David
Re: ❓ JSON Path Dot Precedence
On Apr 7, 2024, at 15:46, Erik Wienhold wrote: > I guess jsonpath assumes that hex, octal, and binary literals are > integers. So there's no ambiguity about any fractional part that might > follow. Yeah, that’s what the comment in the flex file says: https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105 > Also, is there even a use case for path "0x2.p10"? The path has to > start with "$" or ("@" in case of a filter expression), doesn't it? And > it that case it doesn't parse: > >test=# select '$.0x2.p10'::jsonpath; >ERROR: trailing junk after numeric literal at or near ".0x" of jsonpath > input >LINE 1: select '$.0x2.p10'::jsonpath; > > Even with extra whitespace: > >test=# select '$ . 0x2 . p10'::jsonpath; >ERROR: syntax error at or near "0x2" of jsonpath input >LINE 1: select '$ . 0x2 . p10'::jsonpath; > > Or should it behave like an array accessor? Similar to: > >test=# select jsonb_path_query('[0,1,{"p10":42},3]', > '$[0x2].p10'::jsonpath); > jsonb_path_query >-- > 42 >(1 row) I too am curious why these parse successfully, but don’t appear to be useful. Best, David
❓ JSON Path Dot Precedence
Hello Hackers, A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers: > Numeric literals in SQL/JSON path expressions follow JavaScript rules, which > are different from both SQL and JSON in some minor details. For example, > SQL/JSON path allows .1 and 1., which are invalid in JSON. In other words, this is valid: david=# select '2.'::jsonpath; jsonpath -- 2 But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How should that be parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2 followed by the path segment “p10”? Here’s the parser’s answer: david=# select '0x2.p10'::jsonpath; jsonpath --- (2)."p10" So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a decimal expression. Is that intentional/correct? Discovered while writing my Go lexer and throwing all of Go’s floating point literal examples[2] at it and comparing to the Postgres path parser. Curiously, this is only an issue for 0x/0o/0b numeric expressions; a decimal expression does not behave in the same way: david=# select '2.p10'::jsonpath; ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input LINE 1: select '2.p10'::jsonpath; Which maybe seems a bit inconsistent. Thoughts on what the “correct” behavior should be? Best, David [1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH [2]: https://tip.golang.org/ref/spec#Floating-point_literals
Re: RFC: Additional Directory for Extensions
On Apr 3, 2024, at 12:46 PM, Christoph Berg wrote: > Thanks for bringing this up, I should have submitted this years ago. > (The patch is originally from September 2020.) That’s okay, it’s still 2020 in some ways. > I designed the patch to require a superuser to set it, so it doesn't > matter very much by which mechanism it gets updated. There should be > little reason to vary it at run-time, so I'd be fine with requiring a > restart, but otoh, why restrict the superuser from reloading it if > they know what they are doing? I think that’s fair. I’ll keep it requiring a restart now, on the theory it would be easier to loosen it later than have to tighten it later. > I'm not sure the concept of "core libraries" exists. PG happens to > dlopen things at run time, and it doesn't know/care if they were > installed by users or by the original PG server. Also, an exploited > libpgtypes library is not worse than any other untrusted "user" > library, so you really don't want to allow users to provide their own > .so files, no matter by what mechanism. Yes, I guess my concern is whether it could be used to “shadow” core libraries. Maybe it’s no different, really. >> This would also allow the lookup of extension libraries prefixed by the >> directory field from the control file, which would enable much tidier >> extension installation: The control file, SQL scripts, and DSOs could all be >> in a single directory for an extension. > > Nice idea, but that would mean installing .so files into PGSHAREDIR. > Perhaps the whole extension stuff would have to move to PKGLIBDIR > instead. Yes, I was just poking around the code, and realized that, when extension functions are created they may or may not not use `MODULE_PATHNAME`, but in any event, there is nothing different about loading an extension DSO than any other DSO. I was hoping to find a path where it knows it’s opening a DSO for the purpose of an extension, so we could limit the lookup there. But that does not (currently) exist. Maybe we could add an `$extensiondir` variable to complement `$libdir`? Or is PGKLIBDIR is the way to go? I’m not familiar with it. It looks like extension JIT files are put there already. > Fwiw, I wrote this patch to solve the problem of testing extensions at > build-time where the build process does not have write access to > PGSHAREDIR. It solves that problem quite well, almost all PG > extensions have build-time test coverage now (where there was > basically 0 before). Yeah, good additional use case. On Apr 3, 2024, at 1:03 PM, Christoph Berg wrote: > Also, it's called extension "destdir" because it behaves like DESTDIR > in Makefiles: It prepends the given path to the path that PG is trying > to open when set. So it doesn't allow arbitrary new locations as of > now, just /home/build/foo-1/debian/foo/usr/share/postgresql/17/extension > in addition to /usr/share/postgresql/17/extension. (That is what the > Debian package build process needs, so that restriction/design choice > made sense. Right, this makes perfect sense, in that you don’t have to copy all the extension files from the destdir to the SHAREDIR to test them, which I imagine could be a PITA. > That's also included in the current GUC description: > > This directory is prepended to paths when loading extensions > (control and SQL files), and to the '$libdir' directive when > loading modules that back functions. The location is made > configurable to allow build-time testing of extensions that do not > have been installed to their proper location yet. > > Perhaps I should have included a more verbose "NOT FOR PRODUCTION" > there. The use cases I described upthread are very much production use cases. Do you think it’s not for production just because we need to really think it through? I’ve added some docs based on your GUC description; updated patch attached. Best, David v2-0001-Add-extension_directory-GUC.patch Description: Binary data
Re: RFC: Additional Directory for Extensions
On Apr 3, 2024, at 8:54 AM, David E. Wheeler wrote: > Yes, I like the suggestion to make it require a restart, which lets the > sysadmin control it and not limited to whatever the person who compiled it > thought would make sense. Here’s a revision of the Debian patch that requires a server start. However, in studying the patch, it appears that the `extension_directory` is searched for *all* shared libraries, not just those being loaded for an extension. Am I reading the `expand_dynamic_library_name()` function right? If so, this seems like a good way for a bad actor to muck with things, by putting an exploited libpgtypes library into the extension directory, where it would be loaded in preference to the core libpgtypes library, if they couldn’t exploit the original. I’m thinking it would be better to have the dynamic library lookup for extension libraries (and LOAD libraries?) separate, so that the `extension_directory` would not be used for core libraries. This would also allow the lookup of extension libraries prefixed by the directory field from the control file, which would enable much tidier extension installation: The control file, SQL scripts, and DSOs could all be in a single directory for an extension. Thoughts? Best, David v1-0001-Add-extension_directory-GUC.patch Description: Binary data
Re: RFC: Additional Directory for Extensions
On Apr 3, 2024, at 8:54 AM, David E. Wheeler wrote: > Yes, I like the suggestion to make it require a restart, which lets the > sysadmin control it and not limited to whatever the person who compiled it > thought would make sense. Or SIGHUP? D
Re: RFC: Additional Directory for Extensions
On Apr 3, 2024, at 3:57 AM, walt...@technowledgy.de wrote: > I can also imagine that it would be very helpful in a container setup to be > able to set an environment variable with this path instead of having to > recompile all of postgres to change it. Yes, I like the suggestion to make it require a restart, which lets the sysadmin control it and not limited to whatever the person who compiled it thought would make sense. Best, David
RFC: Additional Directory for Extensions
Hackers, In the Security lessons from liblzma thread[1], walther broached the subject of an extension directory path[1]: > Also a configurable directoy to look up extensions, possibly even to be > changed at run-time like [2]. The patch says this: > >> This directory is prepended to paths when loading extensions (control and >> SQL files), and to the '$libdir' directive when loading modules that back >> functions. The location is made configurable to allow build-time testing of >> extensions that do not have been installed to their proper location yet. > > This seems like a great thing to have. This might also be relevant in > light of recent discussions in the ecosystem around extension management. That quotation comes from this Debian patch[2] maintained by Christoph Berg. I’d like to formally propose integrating this patch into the core. And not only because it’s overhead for package maintainers like Christoph, but because a number of use cases have emerged since we originally discussed something like this back in 2013[3]: Docker Immutability --- Docker images must be immutable. In order for users of a Docker image to install extensions that persist, they must create a persistent volume, map it to SHAREDIR/extensions, and copy over all the core extensions (or muck with symlink magic[4]). This makes upgrades trickier, because the core extensions are mixed in with third party extensions. By supporting a second directory pretended to the list of directories to search, as the Debian patch does, users of Docker images can keep extensions they install separate from core extensions, in a directory mounted to a persistent volume with none of the core extensions. Along with tweaking dynamic_library_path to support additional directories for shared object libraries, which can also be mounted to a separate path, we can have a persistent and clean separation of immutable core extensions and extensions installed at runtime. Postgres.app The Postgres.app project also supports installing extensions. However, because they must go into the SHAREDIR/extensions, once a user installs one the package has been modified and the Apple bundle signature will be broken. The OS will no longer be able to validate that the app is legit. If the core supported an additional extension (and PKGLIBDIR), it would allow an immutable PostgreSQL base package and still allow extensions to be installed into directories outside the app bundle, and thus preserve bundle signing on macOS (and presumably other systems --- is this the nix issue, too?) RFC --- I know there was some objection to changes like this in the past, but the support I’m seeing in the liblzma thread for making pkglibdir configurable me optimistic that this might be the right time to support additional configuration for the extension directory, as well, starting with the Debian patch, perhaps. Thoughts? I would be happy to submit a clean version of the Debian patch[2]. Best, David [1] https://www.postgresql.org/message-id/99c41b46-616e-49d0-9ffd-a29432cec818%40technowledgy.de [2] https://salsa.debian.org/postgresql/postgresql/-/blob/17/debian/patches/extension_destdir?ref_type=heads [3] https://www.postgresql.org/message-id/flat/51ae0845.8010...@ocharles.org.uk [4] https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14
Re: Security lessons from liblzma
On Apr 1, 2024, at 06:55, walt...@technowledgy.de wrote: > Also a configurable directoy to look up extensions, possibly even to be > changed at run-time like [2]. The patch says this: > >> This directory is prepended to paths when loading extensions (control and >> SQL files), and to the '$libdir' directive when loading modules that back >> functions. The location is made configurable to allow build-time testing of >> extensions that do not have been installed to their proper location yet. > > This seems like a great thing to have. This might also be relevant in light > of recent discussions in the ecosystem around extension management. > > All the path-related issues have in common, that while it's easy to move > files around to their proper locations later, they all need to adjust > pg_config's output. Funny timing, I was planning to resurrect this old patch[1] and propose that patch this week. One of motivators is the increasing use of Docker images in Kubernetes to run Postgres, where there’s a desire to keep the core service and extensions immutable, and to have a second directory mounted to a persistent volume into which other extensions can be installed and preserved independently of the Docker image. The current approach involves symlinking shenanigans[2] that complicate things pretty substantially, making it more difficult to administer. A second directory fit for purpose would be far better. There are some other motivators, so I’ll do some additional diligence and start a separate thread (or reply to the original[3]). Best, David [1] https://commitfest.postgresql.org/5/170/ [2] https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14 [3] https://www.postgresql.org/message-id/flat/51AE0845.8010600%40ocharles.org.uk
Re: Patch: Add parse_type Function
On Mar 20, 2024, at 17:23, Tom Lane wrote: > Pushed with some editorialization. Mostly, I whacked the > documentation around pretty heavily: we have a convention for what > examples in function descriptions should look like, and this wasn't > it. Not entirely your fault, since some nearby entries in that > table hadn't gotten the word either. Ah, great, and your wording on the parser error issue is much better, thank you! Best, David
Re: Q: Escapes in jsonpath Idents
On Mar 17, 2024, at 20:09, Erik Wienhold wrote: > > On 2024-03-17 20:50 +0100, David E. Wheeler wrote: >> On Mar 17, 2024, at 15:12, Erik Wienhold wrote: >>> So I think it makes sense to reword the entire backslash part of the >>> paragraph and remove references to JSON entirely. The attached patch >>> does that and also formats the backslash escapes as a bulleted list for >>> readability. >> >> Ah, it’s JavaScript format, not JSON! This does clarify things quite >> nicely, thank you. Happy to add my review once it’s in a commit fest. > > Thanks. https://commitfest.postgresql.org/48/4899/ Applies cleanly, `make -C doc/src/sgml check` runs without error. Doc improvement welcome and much clearer than before. > I had the same reasoning while writing my first reply but scrapped that > part because I found it obvious: That jsonpath is parsed before calling > jsonb_path_exists() and therefore the parser has no context about any > variables, which might not even be hardcoded but may result from a > query. Right, there’s a chicken/egg problem. > Unfortunately, I don't have access to that part of the SQL spec. So I > don't know how the jsonpath grammar is specified. Seems quite logical; I think it should be documented, but I’d also be interested to know what the 2016 and 2023 standards say, exactly. > Also checked git log src/backend/utils/adt/jsonpath_scan.l for some > insights but haven't found any yet. Everybody’s taking shortcuts relative to the standard, AFAICT. For example, jsonpath_scan.l matches unqouted identifiers with these two regular expressions: {other}+ \/\* \\. Plus the backslash escapes. {other} is defined as: /* "other" means anything that's not special, blank, or '\' or '"' */ other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f] Which is wy more liberal than the ECMA standard[1], by my reading, but the MSDN[2] description is quite succinct (thanks for the links!): > In JavaScript, identifiers are commonly made of alphanumeric characters, > underscores (_), and dollar signs ($). Identifiers are not allowed to start > with numbers. However, JavaScript identifiers are not only limited to ASCII — > many Unicode code points are allowed as well. Namely, any character in the > ID_Start category can start an identifier, while any character in the > ID_Continue category can appear after the first character. ID_Start[3] and ID_Continue[4] point to the unicode standard codes lister, nether of which reference Emoji. Sure enough, in Safari: > x = {"": true} < {: true} > x. < SyntaxError: Invalid character '\ud83c’ But in Postgres jsonpath: david=# select '$.'::jsonpath; jsonpath -- $."" If the MSDN references to ID_Start and ID_Continue are correct, then the Postgres path parser is being overly-liberal. Maybe that’s totally fine? Not sure what should be documented and what’s not worth it. Aside: I’m only digging into these details because I’m busy porting the path parser, so trying to figure out where to be compatible and where not to. So far I’m rejecting '$' (but allowing '\$' and '\u0024') but taking advantage of the unicode support in Go to specifically validate against ID_Start and ID_Continue. Best, David [1] https://262.ecma-international.org/#sec-identifier-names [2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers [3] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D [4] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D
Re: Q: Escapes in jsonpath Idents
On Mar 17, 2024, at 15:12, Erik Wienhold wrote: > Hi David, Hey Erik. Thanks for the detailed reply and patch! > So I think it makes sense to reword the entire backslash part of the > paragraph and remove references to JSON entirely. The attached patch > does that and also formats the backslash escapes as a bulleted list for > readability. Ah, it’s JavaScript format, not JSON! This does clarify things quite nicely, thank you. Happy to add my review once it’s in a commit fest. > The first case ($.$foo) is in line with the restriction on member > accessors that you quoted first. Huh, that’s now how I read it. Here it is again: >> Member accessor that returns an object member with the specified >> key. If the key name matches some named variable starting with $ or >> does not meet the JavaScript rules for an identifier, it must be >> enclosed in double quotes to make it a string literal. Note that in my example `$foo` does not match a variable. I mean it looks like a variable, but none is used here. I guess it’s being conservative because it might be used in one of the functions, like jsonb_path_exists(), to which variables might be passed. > The error message 'syntax error at or near "$oo" of jsonpath input' for > the second case ($.f$oo), however, looks as if the scanner identifies > '$oo' as a variable instead of contiuing the scan of identifier (f$oo) > for the member accessor. Looks like a bug to me because a variable > doesn't even make sense in that place. Right. Maybe the docs should be updated to say that a literal dollar sign isn’t supported in identifiers, unlike in JavaScript, except through escapes like this: > What works though, besides double quoting, is escaping the dollar sign: > > regress=# select '$.\u0024foo'::jsonpath; > jsonpath > -- > $."$foo" > (1 row) > > And we've come full circle :) Best, David
Re: Q: Escapes in jsonpath Idents
On Mar 16, 2024, at 14:39, David E. Wheeler wrote: > I went looking for the JavaScript rules for an identifier and found this in > the MDN docs[2]: > >> In JavaScript, identifiers can contain Unicode letters, $, _, and digits >> (0-9), but may not start with a digit. An identifier differs from a string >> in that a string is data, while an identifier is part of the code. In >> JavaScript, there is no way to convert identifiers to strings, but sometimes >> it is possible to parse strings into identifiers. Coda: Dollar signs don’t work at all outside double-quoted string identifiers: david=# select '$.$foo'::jsonpath; ERROR: syntax error at or near "$foo" of jsonpath input LINE 1: select '$.$foo'::jsonpath; ^ david=# select '$.f$oo'::jsonpath; ERROR: syntax error at or near "$oo" of jsonpath input LINE 1: select '$.f$oo'::jsonpath; ^ david=# select '$."$foo"'::jsonpath; jsonpath -- $."$foo" This, too, contradicts the MDM definition an identifier (and some quick browser tests). Best, David
Q: Escapes in jsonpath Idents
Hackers, The jsonpath doc[1] has an excellent description of the format of strings, but for unquoted path keys, it simply says: > Member accessor that returns an object member with the specified key. If the > key name matches some named variable starting with $ or does not meet the > JavaScript rules for an identifier, it must be enclosed in double quotes to > make it a string literal. I went looking for the JavaScript rules for an identifier and found this in the MDN docs[2]: > In JavaScript, identifiers can contain Unicode letters, $, _, and digits > (0-9), but may not start with a digit. An identifier differs from a string in > that a string is data, while an identifier is part of the code. In > JavaScript, there is no way to convert identifiers to strings, but sometimes > it is possible to parse strings into identifiers. However, the Postgres parsing of jsonpath keys appears to follow the same rules as strings, allowing backslash escapes: david=# select '$.fo\u00f8 == $x'::jsonpath; jsonpath --- ($."foø" == $"x") This would seem to contradict the documentation. Is this behavior required by the SQL standard? Do the docs need updating? Or should the code actually follow the JSON identifier behavior? Thanks, David PS: Those excellent docs on strings mentions support for \v, but the grammar in the right nav of https://www.json.org/json-en.html does not. Another bonus feature? [1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH [2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier
Re: Patch: Add parse_type Function
On Mar 7, 2024, at 23:39, Erik Wienhold wrote: > I think you need to swap the examples. The text mentions the error case > first and the NULL case second. 臘♂️ Thanks, fixed in the attached patch. David v11-0001-Add-to_regtypemod-SQL-function.patch Description: Binary data
Re: Patch: Add parse_type Function
Hello Hackers, On Feb 25, 2024, at 13:00, David E. Wheeler wrote: >> postgres=# SELECT to_regtypemod('timestamp(-4)'); >> ERROR: syntax error at or near "-" >> LINE 1: SELECT to_regtypemod('timestamp(-4)'); >> ^ >> CONTEXT: invalid type name "timestamp(-4)" >> >> postgres=# SELECT to_regtypemod('text(-4)'); >> ERROR: type modifier is not allowed for type "text" > > Yeah, there was quite a bit of discussion of this issue back in September[1]. > >> This behaviour is mentioned in the documentation, so I'd say it is ok. > > This is my attempt to make it clearer that it can return an error, but I > don’t love the wording TBH. I’ve rebased the patch and, in an attempt to clarify this behavior, added a couple of examples to the docs for to_regtype. Updated patch attached. Best, David v10-0001-Add-to_regtypemod-SQL-function.patch Description: Binary data
Re: An improved README experience for PostgreSQL
On Feb 28, 2024, at 1:51 PM, Alvaro Herrera wrote: > *IF* people don't go overboard, yes. I agree, but let's keep an eye so > that it doesn't become an unreadable mess. I've seen some really > horrible markdown files that I'm sure most of you would object to. Markdown++ IME the keys to decent-looking Markdown are: 1. Wrapping lines to a legible width (76-80 chars) 2. Link references rather than inline links I try to follow these for my blog; posts end up looking like this: https://justatheory.com/2024/02/extension-metadata-typology.text (Append `.text` to any post to see the raw(ish) Markdown. Best, David
Re: Patch: Add parse_type Function
On Feb 24, 2024, at 19:11, Jim Jones wrote: >> What’s the protocol for marking a patch ready for committer? > > I guess after the review of the last assigned reviewer Oh, I didn’t realize someone was assigned. :-) > The fact that a completely invalid type returns NULL .. > > SELECT to_regtypemod('foo'); > to_regtypemod > --- > > (1 row) > > > .. but a "partially" valid one returns an error might be confusing > > postgres=# SELECT to_regtypemod('timestamp(-4)'); > ERROR: syntax error at or near "-" > LINE 1: SELECT to_regtypemod('timestamp(-4)'); > ^ > CONTEXT: invalid type name "timestamp(-4)" > > postgres=# SELECT to_regtypemod('text(-4)'); > ERROR: type modifier is not allowed for type "text" Yeah, there was quite a bit of discussion of this issue back in September[1]. > This behaviour is mentioned in the documentation, so I'd say it is ok. This is my attempt to make it clearer that it can return an error, but I don’t love the wording TBH. > I would personally prefer either NULL or an error in both cases, but I > can totally live with the current design. SAME. Best, David [1] https://www.postgresql.org/message-id/flat/09f9cad6-5096-43cc-b6a7-685703e47...@justatheory.com
Re: Patch: Add parse_type Function
On Feb 21, 2024, at 19:13, David E. Wheeler wrote: > Thanks. Anyone else? Or is it ready for committer? What’s the protocol for marking a patch ready for committer? Thanks, David
Re: Patch: Add parse_type Function
On Feb 21, 2024, at 17:19, Erik Wienhold wrote: > Thanks David! LGTM. Thanks. Anyone else? Or is it ready for committer? Best, David
Re: Patch: Add parse_type Function
On Feb 21, 2024, at 16:43, Erik Wienhold wrote: > The docs still state that to_regtypemod() has a named parameter, which > is not the case per pg_proc.dat. Bah, I cleaned it up before but somehow put it back. Thanks for the catch. Fixed. Best, David v9-0001-Add-to_regtypemod-SQL-function.patch Description: Binary data
Re: to_regtype() Raises Error
On Feb 21, 2024, at 11:54 AM, David Wheeler wrote: > Merged this change into the [to_regtypemod > patch](https://commitfest.postgresql.org/47/4807/), which has exactly the > same issue. > > The new status of this patch is: Needs review Bah, withdrawn. D
Re: Patch: Add parse_type Function
On Feb 21, 2024, at 11:18, Erik Wienhold wrote: > Thanks. But it's an applefile again, not a patch :P WTF. I still have that feature disabled. Oh, I think I deleted the file after dragged it into Mail but before sending, because it’s empty everywhere I look. 臘♂️ Let’s try that again. Best, David v8-0001-Add-to_regtypemod-SQL-function.patch Description: Binary data
Re: Patch: Add parse_type Function
On Feb 20, 2024, at 21:09, Erik Wienhold wrote: > The references are printed as "???" right now. Can be fixed with > xreflabel="format_type" and xreflabel="to_regtype" on those > elements. Thanks. > The docs show parameter name "type" but pg_proc.data does not define > proargnames. So the to_regtypemod() cannot be called using named > notation: > > test=> select to_regtypemod(type => 'int'::text); > ERROR: function to_regtypemod(type => text) does not exist Yes, this format is identical to that of to_regtype(): david=# select to_regtype(type => 'int'::text); ERROR: function to_regtype(type => text) does not exist > +Parses a string of text, extracts a potential type name from it, and >> +translates its typmod, the modifier for the type, if any. Failure to > > s/typmod, the modifier of the type/type modifier/ > > Because format_type() already uses "type modifier" and I think it helps > searchability to use the same wording. Yes, definitely better wording, thanks. V8 attached. Best, David v8-0001-Add-to_regtypemod-SQL-function.patch Description: application/applefile
Re: to_regtype() Raises Error
On Feb 5, 2024, at 09:01, David E. Wheeler wrote: > Ah, thank you. Updated patch attached. I’ve moved this patch into the to_regtype patch thread[1], since it exhibits the same behavior. Best, David [1] https://www.postgresql.org/message-id/60ef4e11-bc1c-4034-b37b-695662d28...@justatheory.com
Re: Patch: Add parse_type Function
On Feb 20, 2024, at 01:30, jian he wrote: > the second hint `-- grammar error expected` seems to contradict with > the results? Quite right, thank you, that’s actually a trapped error. I’ve tweaked the comments and their order in v7, attached. This goes back to the discussion of the error raising of to_regtype[1], so I’ve incorporated the patch from that thread into this patch, and set up the docs for to_regtypemod() with similar information. The wording is still a little opaque for my taste, though, written more for someone who knows a bit about the internals, but it’s a start. I’ve also fixed the wayward parameter in the function signature in the docs, and added a note about why I’ve also patched genbki.pl. Best, David [1] https://www.postgresql.org/message-id/flat/57E1FDDC-5A38-452D-82D7-A44DA2E13862%40justatheory.com#1ae0b11634bc33c7ad3cd728e43d504e v7-0001-Add-to_regtypemod-SQL-function.patch Description: Binary data
Re: Patch: Add parse_type Function
On Feb 19, 2024, at 21:58, Erik Wienhold wrote: > See the patch I wrote for my benchmarks. But it's pretty easy anyway to > cut down parse_type() ;) LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached. > But you don't actually need reformat_type() in pgTAP. You can just get > the type OID and modifier of the want_type and have_type and compare > those. Then use format_type() for the error message. Looks a bit > cleaner to me than doing the string comparison. Fair. > On second thought, I guess comparing the reformatted type names is > necessary in order to have a uniform API on older Postgres releases > where pgTAP has to provide its own to_regtypmod() based on typmodin > functions. Maybe. Worth playing with. >> For the latter, it could easily be an example in the docs. > > Can be mentioned right under format_type(). Well I included it in the to_regtypemod docs here, but could so either. Best, David v6-0001-Add-to_regtypemod-SQL-function.patch Description: Binary data
Re: Patch: Add parse_type Function
On Feb 19, 2024, at 15:47, Tom Lane wrote: >> 1. Add a to_regtypmod() for those who just want the typemod. > > Seems like there's a good case for doing that. I’ll work on that. > I'm less thrilled about that, mainly because I can't think of > a good name for it ("format_type_string" is certainly not that). > Is the use-case for this functionality really strong enough that > we need to provide it as a single function rather than something > assembled out of spare parts? Not essential for pgTAP, no, as we can just use the parts. It was the typmod bit that was missing. On Feb 19, 2024, at 17:13, Tom Lane wrote: > After some time ruminating, a couple of possibilities occurred to > me: > reformat_type(text) > canonical_type_name(text) I was just thinking about hitting the thesaurus entry for “canonical”, but I quite like reformat_type. It’s super clear and draws the parallel to format_type() more clearly. Will likely steal the name for pgTAP if we don’t add it to core. :-) > I'm still unconvinced about that, though. I guess the questions are: * What are the other use cases for it? * How obvious is it how to do it? For the latter, it could easily be an example in the docs. Best, David
Re: Patch: Add parse_type Function
On Feb 18, 2024, at 15:55, Erik Wienhold wrote: >> The overhead of parse_type_and_format can be related to higher planning >> time. PL/pgSQL can assign composite without usage FROM clause. > > Thanks, didn't know that this makes a difference. In that case both > variants are on par. Presumably this is a side-effect of the use of a RECORD here, which requires a FROM clause in pure SQL, yes? The only way I can think of to avoid that is to: 1. Add a to_regtypmod() for those who just want the typemod. 2. Add a format_type_string() function that returns a string, the equivalent of this function but in C: CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$ SELECT format_type(to_regtype($1), to_regtypmod($1)) $$; We could also keep the record-returning function for users who want both the regtype and the regytypemod at once, but with the other two I consider it less essential. Thoughts? Best, David
Re: Patch: Add parse_type Function
On Feb 12, 2024, at 15:55, David E. Wheeler wrote: > Happy to move it wherever makes the most sense. Update with a bunch of the suggested changes. Some questions still open from the previous post, though. Best, David v5-0001-Add-parse_type-SQL-function.patch Description: Binary data
Re: Patch: Add parse_type Function
On Feb 12, 2024, at 12:53 PM, Tom Lane wrote: > "David E. Wheeler" writes: >> [ v4-0001-Add-parse_type-SQL-function.patch ] > > It strikes me that this is basically to_regtype() with the additional > option to return the typmod. That leads to some questions: Huh. I saw it more on a par with format_type(), at least in the output I expect. > * Should we choose a name related to to_regtype? I don't have any > immediate suggestions, but since you didn't seem entirely set on > parse_type, maybe it's worth thinking along those lines. OTOH, > to the extent that people would use this with format_type, maybe > parse_type is fine. Yeah, and the `to_*` functions return a single OID, not two fields. > * Perhaps the type OID output argument should be declared regtype > not plain OID? It wouldn't make any difference for passing it to > format_type, but in other use-cases perhaps regtype would be more > readable. It's not a deal-breaker either way of course, since > you can cast oid to regtype or vice versa. Sure. I used `oid` because that’s exactly what the argument to format_type() is called, so thought the parity there more appropriate. Maybe its argument should be renamed to regtype? > * Maybe the code should be in adt/regproc.c not format_type.c. Happy to move it wherever makes the most sense. > * Experience with to_regtype suggests strongly that people would > prefer "return NULL" to failure for an unrecognized type name. Could do, but as with to_regtype() there’s an issue with error handling when it descends into the SQL parser. > Skimming the patch, I notice that the manual addition to > builtins.h should be unnecessary: the pg_proc.dat entry > should be enough to create an extern in fmgrprotos.h. Confirmed, will remove in next patch. > Also, I'm pretty sure that reformat_dat_file.pl will > think your pg_proc.dat entry is overly verbose. See > https://www.postgresql.org/docs/devel/system-catalog-initial-data.html There are a bunch of things it reformats: ❯ git diff --name-only src/include/catalog/pg_aggregate.dat src/include/catalog/pg_database.dat src/include/catalog/pg_proc.dat src/include/catalog/pg_type.dat src/include/utils/builtins.h And even in pg_proc.dat there are 13 blocks it reformats. I can submit with just my block formatted if you’d prefer. > BTW, another way that this problem could be approached is to use > to_regtype() as-is, with a separate function to obtain the typmod: > > select format_type(to_regtype('timestamp(4)'), to_regtypmod('timestamp(4)')); Oh interesting. > This is intellectually ugly, since it implies parsing the same > typename string twice. But on the other hand it avoids the notational > pain and runtime overhead involved in using a record-returning > function. So I think it might be roughly a wash for performance. > Question to think about is which way is easier to use. I don't > have an opinion particularly; just throwing the idea out there. Honestly I haven’t cared for the fact that parse_type() returns a record; it makes it a bit clunky. But yeah, so does this to pass the same value to two function calls. Perhaps it makes sense to add to_regtypmod() as you suggest, but then also something like: CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$ SELECT format_type(to_regtype($1), to_regtypmod($1)) $$; Best, David
Re: Patch: Add parse_type Function
On Feb 10, 2024, at 20:52, Erik Wienhold wrote: > > Let me comment on some issues since I wrote the very first version of > parse_type() on which David's patch is based. Thanks Erik. >> On 2024-02-01 01:00 +0100, jian he wrote: >> if you are wondering around other code deal with NULL, ErrorSaveContext. >> NULL would be more correct? >> `(void) parseTypeString(type, , , NULL);` Fixed. >> also parseTypeString already explained the third argument, our >> comments can be simplified as: >> /* >> * Parse type-name argument to obtain type OID and encoded typmod. We don't >> * need to handle parseTypeString failure, just let the error be >> * raised. >> */ Thanks, adopted that language. >> cosmetic issue. Looking around other error handling code, the format >> should be something like: >> ` >> if (get_call_result_type(fcinfo, NULL, ) != TYPEFUNC_COMPOSITE) >>ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >>errmsg("function returning record called in" >> "context that cannot accept type record"))); >> ` > > +1 I poked around and found some other examples, yes. I went with a single long line for errmsg following the example in adminpack.c[1] >> `PG_FUNCTION_INFO_V1(parse_type);` >> is unnecessary? >> based on the error information: https://cirrus-ci.com/task/5899457502380032 >> not sure why it only fails on windows. > > Yeah, it's not necessary for internal functions per [1]. It's a > leftover from when this function was part of the pgtap extension. Removed. >> +#define PARSE_TYPE_STRING_COLS 2 /* Returns two columns. */ >> +#undef PARSE_TYPE_STRING_COLS >> Are these necessary? >> given that the comments already say return the OID and type modifier. > > Not sure what the convention here is but I found this in a couple of > places, maybe even a tutorial on writing C functions. See > `git grep '_COLS\s\+[1-9]'` for those instances. It's in line with my > habit of avoiding magic constants. Left in place for now. > >> +( typid oid, >> + typmod int4 ) >> here `int4` should be `integer`? > > +1 Fixed. > Yes, the sentence is rather handwavy. What is meant here is that this > function also resolves types whose typmod is determined by the SQL > parser or some step after that. There are types whose typmod is > whatever value is found inside the parenthesis, e.g. bit(13) has typmod > 13. Our first idea before coming up with that function was to do some > string manipulation and extract the typmod from inside the parenthesis. > But we soon found out that other typmods don't translate one to one, > e.g. varchar(13) has typmod 17. The interval type is also special > because the typmod is some bitmask encoding of e.g. 'second(0)'. Hence > the mention of the SQL grammar. I adopted some of your language here --- and fixed the spelling errors :-) >> can be simplified: >> + >> +Parses a string representing an SQL data type, optionally >> schema-qualified. >> +Returns a record with two fields, typid and >> +typmod, representing the OID and >> modifier for the >> +type. These correspond to the parameters to pass to the >> +format_type >> function. >> + >> (I don't have a strong opinion though) > > Sounds better. The CREATE TABLE reference is superfluous. Done. Best, David [1] https://github.com/postgres/postgres/blob/09eb633e1baa3b7cd7929f3cc77f9c46f63c20b1/contrib/adminpack/adminpack.c#L508-L511 v4-0001-Add-parse_type-SQL-function.patch Description: Binary data
Re: Patch: Add parse_type Function
On Feb 5, 2024, at 10:47 AM, Jim Jones wrote: > The patch applies cleanly and it no longer crashes with NULL parameters. > Docs render fine and regression tests pass. I have nothing more to add. > Let's now wait for Pavel's review. Much obliged! D
Re: Patch: Add parse_type Function
On Feb 5, 2024, at 09:49, Jim Jones wrote: > Yes, if the function was strict (which in the current design is not the > case) there is no need to check for nulls. Right, done, and cleaned up the redundant comments. Best, David v3-0001-Add-parse_type-SQL-function.patch Description: Binary data
Re: Patch: Add parse_type Function
On Feb 5, 2024, at 08:02, Jim Jones wrote: > There are a few issues though: Excellent, thank you very much! Updated patch attached. Best, David v2-0001-Add-parse_type-SQL-function.patch Description: Binary data
Re: to_regtype() Raises Error
On Feb 4, 2024, at 19:11, Erik Wienhold wrote: > Here "extracted names" should be "extracted name" (singular). > Otherwise, the text looks good. Ah, thank you. Updated patch attached. Best, David v2-0001-Update-to_regtype-docs-regarding-error-condition.patch Description: Binary data
Re: to_regtype() Raises Error
On Feb 2, 2024, at 15:33, David E. Wheeler wrote: > Anyway, I’m happy to submit a documentation patch along the lines you > suggested. How’s this? --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25460,11 +25460,12 @@ SELECT collation for ('foo' COLLATE "de_DE"); regtype -Translates a textual type name to its OID. A similar result is +Parses a string of text, extracts a potential type name from it, and +translates that name into an OID. A similar result is obtained by casting the string to type regtype (see -); however, this function will return -NULL rather than throwing an error if the name is -not found. +). Failure to extract a valid potential +type name results in an error; however, if the extracted names is not +known to the system, this function will return NULL. Does similar wording need to apply to other `to_reg*` functions? Best, David v1-0001-Update-to_regtype-docs-regarding-error-condition.patch Description: Binary data
Re: Patch: Add parse_type Function
On Feb 4, 2024, at 13:52, Pavel Stehule wrote: > not yet, but I'll do it Nice, thank you. I put it into the Commitfest. https://commitfest.postgresql.org/47/4807/ Best, David
Re: Patch: Add parse_type Function
On Feb 4, 2024, at 13:02, Pavel Stehule wrote: > I thinks so proposed functionality can be useful Great, thank you! Is that a review? :-) D
Patch: Add parse_type Function
Hackers, Attached is a patch to add a new function, `parse_type()`. It parses a type string and returns a record with the typid and typmod for the type, or raises an error if the type is invalid. It’s effectively a thin layer over the parser’s parseTypeString() function. The purpose of this function is to allow uses to resolve any type name, including aliases, to its formal name as rendered by, for example, pg_dump. An example: david=# WITH s(s) AS ( SELECT * FROM unnest(ARRAY[ 'timestamp(4)', 'interval(0)', 'interval second(0)', 'timestamptz', 'timestamptz(6)', 'varchar', 'varchar(128)' ]) ), p(type, typid, typmod) AS ( SELECT s, ((parse_type(s)).*) FROM s ) SELECT type, format_type(typid, typmod) FROM p; type| format_type + timestamp(4) | timestamp(4) without time zone interval(0)| interval(0) interval second(0) | interval second(0) timestamptz| timestamp with time zone timestamptz(6) | timestamp(6) with time zone varchar| character varying varchar(128) | character varying(128) (7 rows) The use case leading to this patch was pgTAP column data type assertions. pgTAP traditionally required full type names for testing data types, but recently added support for aliases. This broke for some types, because it was difficult to extract the typmod correctly, and even when we did, it failed for types such as `interval second(0)`, where `second (0)` is the typmod, and there was no sensible way to access the bit mask to generate the proper typmod to pass to `format_type()`. Erik Wienhold wrote this function to solve the problem, and I volunteered to submit a patch. Potential issues and questions for this patch: * Is `parse_type()` the right name to use? I can see arguments for `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`. Whatever the consensus is works for me. * The function returns a record, which is a little fussy. I could see adding a `format_type_string()` function that just contains `SELECT format_type(p.typmod, p.typid) FROM parse_type($1) p` and returns the formatted type. But I think it might also be useful to have access to the typmod and typid directly via this method, since they’re already exposed as `format_type()`’s arguments. * Is format_type.c the right home for the function? I put it there because I closely associate it with format_type(), but happy to move it to a more appropriate location. * I tried to link to `format_type` from the docs entry for `parse_type`, and added an ID for the former, but I’m not sure it resolves. Best, David v1-0001-Add-parse_type-SQL-function.patch Description: Binary data
Re: to_regtype() Raises Error
On Feb 2, 2024, at 3:23 PM, David G. Johnston wrote: > Seems like most just want to leave well enough alone and deal with the rare > question for oddball input on the mailing list. If you are interested enough > to come back after 4 months I'd suggest you write up and submit a patch. I'm > happy to review it and see if that is enough to get a committer to respond. LOL, “interested enough” is less the right term than “triaging email backlog and following up on a surprisingly controversial question.” I also just like to see decisions made and issues closed one way or another. Anyway, I’m happy to submit a documentation patch along the lines you suggested. D
Re: to_regtype() Raises Error
Hey there, coming back to this. I poked at the logs in the master branch and saw no mention of to_regtype; did I miss it? On Sep 17, 2023, at 10:58 PM, David G. Johnston wrote: > Parses a string of text, extracts a potential type name from it, and > translates that name into an OID. Failure to extract a valid potential type > name results in an error while a failure to determine that the extracted name > is known to the system results in a null output. > > I take specific exception to describing your example as a “textual type name”. More docs seem like a reasonable compromise. Perhaps it’d be useful to also describe when an error is likely and when it’s not. Best, David
Re: Bug: The "directory" control parameter does not work
On Jan 26, 2024, at 5:05 PM, David E. Wheeler wrote: > I made a simpler test case here: > > https://github.com/theory/test-extension-directory > > Not the difference between actual and expected output. Bah! Need to also set `MODULEDIR = extension/click` and then it works. D
Re: Bug: The "directory" control parameter does not work
On Jan 26, 2024, at 4:40 PM, David E. Wheeler wrote: > But it doesn’t seem to work. I tried this experiment: I made a simpler test case here: https://github.com/theory/test-extension-directory Not the difference between actual and expected output. Best, David
Bug: The "directory" control parameter does not work
Hackers, I wanted to try to customize the subdirectory for an extension. The docs[1] say: > directory (string) > > The directory containing the extension's SQL script file(s). Unless an > absolute path is given, the name is relative to the installation's SHAREDIR > directory. The default behavior is equivalent to specifying directory = > 'extension'. [1]: https://www.postgresql.org/docs/16/extend-extensions.html But it doesn’t seem to work. I tried this experiment: ❯ pg_config --version PostgreSQL 16.1 ❯ git clone g...@github.com:theory/kv-pair.git ❯ cd kv-pair ❯ echo directory = 'blah' >> pair.control ❯ cat pair.control # pair extension comment = 'A key/value pair data type' default_version = '0.1.2' module_pathname = '$libdir/pair' relocatable = true directory = blah ❯ make install cp sql/pair.sql sql/pair--0.1.2.sql /opt/homebrew/bin/gmkdir -p '/Users/david/.pgenv/pgsql-16.1/share/extension' /opt/homebrew/bin/gmkdir -p '/Users/david/.pgenv/pgsql-16.1/share/extension' /opt/homebrew/bin/gmkdir -p '/Users/david/.pgenv/pgsql-16.1/share/doc/extension' /opt/homebrew/bin/ginstall -c -m 644 .//pair.control '/Users/david/.pgenv/pgsql-16.1/share/extension/' /opt/homebrew/bin/ginstall -c -m 644 .//sql/pair--unpackaged--0.1.2.sql '/Users/david/.pgenv/pgsql-16.1/share/extension/' /opt/homebrew/bin/ginstall -c -m 644 .//doc/pair.md '/Users/david/.pgenv/pgsql-16.1/share/doc/extension/‘ ❯ find ~/.pgenv/pgsql-16.1/ -name '*pair*.*' /Users/david/.pgenv/pgsql-16.1//include/server/lib/pairingheap.h /Users/david/.pgenv/pgsql-16.1//share/extension/pair--unpackaged--0.1.2.sql /Users/david/.pgenv/pgsql-16.1//share/extension/pair.control /Users/david/.pgenv/pgsql-16.1//share/doc/extension/pair.md I see two problems: 1. pair--0.1.2.sql is not installed. It does install without `directory` set. 2. pair--unpackaged--0.1.2.sql is installed, but not in a `blah` directory. Shouldn’t it be? Am I missing something? Best, David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 25, 2024, at 11:03, Tom Lane wrote: > I changed the preceding para to say "... check expressions are > required in ...", which I thought was sufficient to cover that. > Also, the tabular description of the operator tells you not to do it. Yeah, that’s good. I was perhaps leaning into being over-explicit after it took me a while to even figure out that there was a difference, let alone where matters. >> What do you think of also dropping the article from all the references to >> “the strict mode” or “the lax mode”, to make them “strict mode” and “lax >> mode”, respectively? > > Certainly most of 'em don't need it. I'll make it so. Nice, thanks! David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 24, 2024, at 16:32, Tom Lane wrote: > "David E. Wheeler" writes: > >> In any event, something to do with @@, perhaps to have some compatibility >> with `jsonb @> jsonb`? I don’t know why @@ was important to have. > > Yeah, that's certainly under-explained. But it seems like I'm not > getting traction for the idea of changing the behavior, so let's > go back to just documenting it. Curious about those discussions. On the one hand I find the distinction between the two behaviors to be odd, and to produce unexpected results when they’re not used in the proper context. It’s reminds me of the Perl idea of context, where functions behave differently in scalar and list context, and if you expect list behavior on scalar context you’re gonna get a surprise. This is a bit of a challenge for those new to the language, as they’re not necessarily aware of the context. > I spent some time going over your > text and also cleaning up nearby shaky English, and ended with v8 > attached. I'd be content to commit this if it looks good to you. This looks very nice, thank you. A couple of comments. > + > + Predicate check expressions are required in the > + @@ operator (and the > + jsonb_path_match function), and should not be > used > + with the @? operator (or the > + jsonb_path_exists function). > + > + > + I had this bit here: Conversely, non-predicate jsonpath expressions should not be used with the @@ operator (or the jsonb_path_match function). I think it’s important to let people know what the difference is in the behavior of the two forms, in every spot it’s likely to come up. SQL-standard JSON Path expressions should never be used in contexts (functions, operators) only designed to work with predicate check expressions, and the docs should say so IMO. > > -The lax mode facilitates matching of a JSON document structure and path > -expression if the JSON data does not conform to the expected schema. > +The lax mode facilitates matching of a JSON document and path > +expression when the JSON data does not conform to the expected schema. What do you think of also dropping the article from all the references to “the strict mode” or “the lax mode”, to make them “strict mode” and “lax mode”, respectively? Thanks for the review! Best, David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 21, 2024, at 14:58, David E. Wheeler wrote: > I make this interpretation based on this bit of the docs: Sorry, that’s from my branch. Here it is in master: A path expression can be a Boolean predicate, although the SQL/JSON standard allows predicates only in filters. This is necessary for implementation of the @@ operator. For example, the following jsonpath expression is valid in PostgreSQL: $.track.segments[*].HR 70 In any event, something to do with @@, perhaps to have some compatibility with `jsonb @> jsonb`? I don’t know why @@ was important to have. David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 21, 2024, at 14:52, David E. Wheeler wrote: > This is the only way the different behaviors make sense to me. @? expects a > set, not a boolean, sees there is an item in the set, so returns true: I make this interpretation based on this bit of the docs: PostgreSQL's implementation of the SQL/JSON path language has the following deviations from the SQL/JSON standard. Boolean Predicate Check Expressions As an extension to the SQL standard, a PostgreSQL path expression can be a Boolean predicate, whereas the SQL standard allows predicates only in filters. Where SQL standard path expressions return the relevant contents of the queried JSON value, predicate check expressions return the three-valued result of the predicate: true, false, or unknown. Compare this filter jsonpath expression: = select jsonb_path_query(:'json', '$.track.segments ?(@[*].HR 130)'); jsonb_path_query - {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"} To a predicate expression, which returns true = select jsonb_path_query(:'json', '$.track.segments[*].HR 130'); jsonb_path_query -- true Best, David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 21, 2024, at 14:43, Tom Lane wrote: > I don't entirely buy this argument --- if that is the interpretation, > of what use are predicate check expressions? It seems to me that we > have to consider them as being a shorthand notation for filter > expressions, or else they simply do not make sense as jsonpath. I believe it becomes pretty apparent when using jsonb_path_query(). The filter expression returns a set (using the previous \gset example): david=# select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ > 10)'); jsonb_path_query -- 73 135 (2 rows) The predicate check returns a boolean: david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 10'); jsonb_path_query -- true (1 row) This is the only way the different behaviors make sense to me. @? expects a set, not a boolean, sees there is an item in the set, so returns true: david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 1000'); jsonb_path_query -- false (1 row) david=# select :'json'::jsonb @? '$.track.segments[*].HR > 1000'; ?column? -- t (1 row) Best, David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 20, 2024, at 12:34, Tom Lane wrote: > It will take a predicate, but seems to always return true: > > regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] < 5' ; > ?column? > -- > t > (1 row) > > regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 5' ; > ?column? > -- > t > (1 row) Just for the sake of clarity, this return value is “correct,” because @? and other functions and operators that expect SQL standard statements evaluate the SET returned by the JSONPath statement, but predicate check expressions don’t return a set, but a always a single scalar value (true, false, or null). From the POV of the code expecting SQL standard JSONPath results, that’s a set of one. @? sees that the set is not empty so returns true. Best, David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 20, 2024, at 11:45, Tom Lane wrote: > You sure about that? It would surprise me if we could effectively use > a not-equal condition with an index. If it is only == that works, > then the preceding statement seems sufficient. I’m not! I just assumed it in the same way creating an SQL = operator automatically respects NOT syntax (or so I recall). In fiddling a bit, I can’t get it to use an index: CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL); \copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json | jq -c ".[]" | sed "s|||g"'; create index on movies using gin (movie); analyze movies; david=# explain analyze select id from movies where movie @? '$ ?(@.genre[*] != "Teen")'; QUERY PLAN - Seq Scan on movies (cost=0.00..3741.41 rows=4 width=4) (actual time=19.222..19.223 rows=0 loops=1) Filter: (movie @? '$?(@."genre"[*] != "Teen")'::jsonpath) Rows Removed by Filter: 36273 Planning Time: 1.242 ms Execution Time: 19.247 ms (5 rows) But that might be because the planner knows that the query is going to fetch most records, anyway. If I set most records to a single value: david=# update movies set movie = jsonb_set(movie, '{year}', '2020'::jsonb) where id < 3600; UPDATE 3599 david=# analyze movies; ANALYZE david=# explain analyze select id from movies where movie @? '$ ?(@.year != 2020)'; QUERY PLAN Seq Scan on movies (cost=0.00..3884.41 rows=32609 width=4) (actual time=0.065..43.730 rows=32399 loops=1) Filter: (movie @? '$?(@."year" != 2020)'::jsonpath) Rows Removed by Filter: 3874 Planning Time: 1.759 ms Execution Time: 45.368 ms (5 rows) Looks like it still doesn’t use the index with !=. Pity. Best, David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 20, 2024, at 12:34, Tom Lane wrote: > Surely we're not helping anybody by leaving that behavior in place. > Making it do something useful, throwing an error, or returning NULL > all seem superior to this. I observe that @@ returns NULL for the > path type it doesn't like, so maybe that's what to do here. I agree it would be far better for the behavior to be consistent, but frankly would like to see them raise an error. Ideally the hit would suggest the proper alternative operator or function to use, and maybe link to the docs that describe the difference between SQL-standard JSONPath and "predicate check expressions”, and how they have separate operators and functions. I think of them as practically different data types (and wish they were, TBH). It makes sense that passing a JSON containment expression[1] would raise an error; so should passing the wrong flavor of JSONPath. > BTW, jsonb_path_query_array and jsonb_path_query_first seem to > take both types of path, like jsonb_path_query, so ISTM they need > docs changes too. Happy to update the patch, either to add those docs or, if we change the behavior to return a NULL or raise an error, then with that information, instead. Best, David [1]: https://www.postgresql.org/docs/current/datatype-json.html#JSON-CONTAINMENT
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Jan 19, 2024, at 21:46, Erik Wienhold wrote: > Interesting... copy-pasting the entire \set command works for me with > psql 16.1 in gnome-terminal and tmux. Typing it out manually gives me > the "unterminated quoted string" error. Maybe has to do with my stty > settings. Yes, same on macOS Terminal.app and 16.1 compiled with readline. I didn’t realize that \set didn’t support newlines, because it works fine when you paste something with newlines. Curious. >> I experimented with >> >> SELECT ' >> ... multiline json value ... >> ' AS json >> \gexec >> >> but that didn't seem to work either. Anybody have a better idea? > > Fine with me (the \gset variant). Much cleaner TBH. david=# select '{ "track": { "segments": [ { "location": [ 47.763, 13.4034 ], "start time": "2018-10-14 10:05:14", "HR": 73 }, { "location": [ 47.706, 13.2635 ], "start time": "2018-10-14 10:39:21", "HR": 135 } ] } }'::jsonb as json; json {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]}} (1 row) david=# \gset david=# select :'json'::jsonb; jsonb {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]}} (1 row) So great! While you’re in there, Tom, would it make sense to fold in something like [this patch][1] I posted last month to clarify which JSONPath comparison operators can take advantage of a index? --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; For these operators, a GIN index extracts clauses of the form accessors_chain -= constant out of +== constant out of the jsonpath pattern, and does the index search based on the keys and values mentioned in these clauses. The accessors chain may include .key, @@ -522,6 +522,9 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; The jsonb_ops operator class also supports .* and .** accessors, but the jsonb_path_ops operator class does not. +Only the == and != SQL/JSON Path Operators +can use the index. Best, David [1]: https://www.postgresql.org/message-id/0ece6b9c-cdde-4b65-be5a-49d737204...@justatheory.com
Re: GIN-Indexable JSON Patterns
On Dec 17, 2023, at 13:10, David E. Wheeler wrote: > Quick follow-up to my slew of questions back in [September][1]. I wanted to > update [my patch][2] to note that only JSON Path equality operators are > supported by indexes, as [previously discussed][3]. Should I just add it to the patch and let the reviews fall where they may? :-) Best, David [1]: https://www.postgresql.org/message-id/15dd78a5-b5c4-4332-acfe-55723259c...@justatheory.com [2]: https://commitfest.postgresql.org/45/4624/ [3]: https://www.postgresql.org/message-id/973d6495-cf28-4d06-7d46-758bd2615...@xs4all.nl
Re: JSON Path and GIN Questions
On Dec 17, 2023, at 16:08, Tom Lane wrote: > I'd waited because the discussion was still active, and then it > kind of slipped off the radar. I'll take another look and push > some form of what I suggested. Right on. > That doesn't really address the > jsonpath oddities you were on about, though. No, I attempted to address those in [a patch][1]. [1]: https://commitfest.postgresql.org/45/4624/ Best, David
GIN-Indexable JSON Patterns
Hey Hackers, Quick follow-up to my slew of questions back in [September][1]. I wanted to update [my patch][2] to note that only JSON Path equality operators are supported by indexes, as [previously discussed][3]. I thought perhaps adding a note to this bit of the docs would be useful: > For these operators, a GIN index extracts clauses of the form accessors_chain > = constant out of the jsonpath pattern, and does the index search based on > the keys and values mentioned in these clauses. The accessors chain may > include .key, [*], and [index] accessors. The jsonb_ops operator class also > supports .* and .** accessors, but the jsonb_path_ops operator class does not. But perhaps that’s what `accessors_chain = constant` is supposed to mean? I’m not super clear on it, though, since the operator is `==` and not `=` (and I would presume that `!=` would use the index, as well. Is that correct? If so, how would you feel about something like this? --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; For these operators, a GIN index extracts clauses of the form accessors_chain -= constant out of +== constant out of the jsonpath pattern, and does the index search based on the keys and values mentioned in these clauses. The accessors chain may include .key, @@ -522,6 +522,9 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; The jsonb_ops operator class also supports .* and .** accessors, but the jsonb_path_ops operator class does not. +Only the == and != SQL/JSON Path Operators +can use the index. Best, David [1]: https://www.postgresql.org/message-id/15dd78a5-b5c4-4332-acfe-55723259c...@justatheory.com [2]: https://commitfest.postgresql.org/45/4624/ [3]: https://www.postgresql.org/message-id/973d6495-cf28-4d06-7d46-758bd2615...@xs4all.nl [4]: https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
Re: JSON Path and GIN Questions
On Sep 17, 2023, at 18:09, David E. Wheeler wrote: > I think this is useful, but also that it’s worth calling out explicitly that > functions do not count as indexable operators. True by definition, of course, > but I at least had assumed that since an operator is, in a sense, syntax > sugar for a function call, they are in some sense the same thing. > > A header might be useful, something like “What Counts as an indexable > expression”. Hey Tom, are you still thinking about adding this bit to the docs? I took a quick look at master and didn’t see it there. Thanks, David
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 23, 2023, at 20:20, Erik Wienhold wrote: > I thought that you may have missed that one because I saw this change > that removes the article: > >> -In the strict mode, the specified path must exactly match the structure >> of >> +In strict mode, the specified path must exactly match the structure of Oh, didn’t realize. Fixed. > LGTM. Would you create a commitfest entry? I'll set the status to RfC. Done. https://commitfest.postgresql.org/45/4624/ Best, David v7-0001-Improve-boolean-predicate-JSON-Path-docs.patch Description: Binary data
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 22, 2023, at 20:36, Erik Wienhold wrote: > That's an AppleSingle file according to [1][2]. It only contains the > resource fork and file name but no data fork. Ah, I had “Send large attachments with Mail Drop” enabled. To me 20K is not big but whatever. Let’s see if turning it off fixes the issue. > Any reason for calling it "predicate check expressions" (e.g. the link > text) and sometimes "predicate path expressions" (e.g. the linked > section title)? I think it should be named consistently to avoid > confusion and also to simplify searching. I think "predicate path expressions” is more descriptive, but "predicate check expressions” is what was in the docs before, so let’s stick with that. > Linking the same section twice in the same paragraph seems excessive. Fair. Will link the second one. >> += select jsonb_path_query(:'json', >> '$.track.segments'); >> +select jsonb_path_query(:'json', '$.track.segments'); > > Please remove the second SELECT. Done. >> += select jsonb_path_query(:'json', 'strict >> $.track.segments[0].location'); >> + jsonb_path_query >> +--- >> + [47.763, 13.4034] > > Strict mode is unnecessary to get that result and I'd omit it because > the different modes are not introduced yet at this point. Yep, pasto. > Strict mode is unnecessary here as well. Fixed. >> + using the lax mode. To avoid surprising results, we recommend using >> + the .** accessor only in the strict mode. The > > Please change to "in strict mode" (without "the"). Hrm, I prefer it without the article, too, but it is consistently used that way elsewhere, like here: https://github.com/postgres/postgres/blob/5b36e8f/doc/src/sgml/func.sgml#L17401 I’d be happy to change them all, but was keeping it consistent for now. Updated patch attached, thank you! David v6-0001-Improve-boolean-predicate-JSON-Path-docs.patch Description: Binary data
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 19, 2023, at 10:49 PM, Erik Wienhold wrote: > Just wanted to take a look at v5. But it's an applefile again :P I don’t get it. It was the other times too! Are you able to save it with a .patch suffix? D
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 19, 2023, at 01:22, jian he wrote: > "Do not use with non-predicate", double negative is not easy to > comprehend. Maybe we can simplify it. > > 16933: value. Use only SQL-standard JSON path expressions, not not > there are two "not". > > 15842: SQL-standard JSON path expressions, not not > there are two "not”. Thank you, jian. Updated patch attached and also on GitHub. https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs Best, David v5-0001-Improve-boolean-predicate-JSON-Path-docs.patch Description: application/applefile
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 16, 2023, at 18:07, Erik Wienhold wrote: >> Okay, added, let’s just put all our cards on the table. :-) > > I'll have a look but the attached v3 is not a patch but some applefile. Weird, should be no different from previous attachments. I believe Apple Mail always uses application/octet-stream for attachments it doesn’t recognize, which includes .patch and .diff files, sadly. > One of the added is invalid by the looks of it. Maybe > is missing because it says "got (para para )" at the end. Oh, I thought it would report issues from the files they were found in. You’re right, I forgot a title. Fixed in v4. David v4-0001-Improve-boolean-predicate-JSON-Path-docs.patch Description: Binary data
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 15, 2023, at 23:03, Erik Wienhold wrote: > Your call but I'm not against including it in this patch because it > already touches the modes section. Okay, added, let’s just put all our cards on the table. :-) >> Agreed. Would be good if we could teach these functions and operators >> to reject path expressions they don’t support. > > Right, you mentioned that idea in [1] (separate types). Not sure what > the best strategy here is but it's likely to break existing queries. > Maybe deprecating unsupported path expressions in the next major release > and changing that to an error in the major release after that. Well if the functions have a JsonPathItem struct, they can check its type attribute and reject those with a root type that’s a predicate in @? and reject it if it’s not a predicate in @@. Example of checking type here: https://github.com/postgres/postgres/blob/54b208f90963cb8b48b9794a5392b2fae4b40a98/src/backend/utils/adt/jsonpath_exec.c#L622 >>> This can be checked with `make -C doc/src/sgml check`. >> >> Thanks. That produces a bunch of warnings for postgres.sgml and >> legal.sgml (and a failure to load the docbook DTD), but func.sgml is >> clean now. > > Hmm... I get no warnings on 1f89b73c4e. Did you install all tools as > described in [2]? The DTD needs to be installed as well. Thanks, got it down to one: postgres.sgml:112: element sect4: validity error : Element sect4 content does not follow the DTD, expecting (sect4info? , (title , subtitle? , titleabbrev?) , (toc | lot | index | glossary | bibliography)* , (((calloutlist | glosslist | bibliolist | itemizedlist | orderedlist | segmentedlist | simplelist | variablelist | caution | important | note | tip | warning | literallayout | programlisting | programlistingco | screen | screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis | classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | methodsynopsis | formalpara | para | simpara | address | blockquote | graphic | graphicco | mediaobject | mediaobjectco | informalequation | informalexample | informalfigure | informaltable | equation | example | figure | table | msgset | procedure | sidebar | qandaset | task | anchor | bridgehead | remark | highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+ , (refentry* | sect5* | simplesect*)) | refentry+ | sect5+ | simplesect+) , (toc | lot | index | glossary | bibliography)*), got (para para ) David v3-0001-Improve-boolean-predicate-JSON-Path-docs.patch Description: application/applefile
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 14, 2023, at 19:51, Erik Wienhold wrote: > Thanks for putting this together. See my review at the end. Appreciate the speedy review! > Nice. This really does help to make some sense of it. I checked all > queries and they do work out except for two queries where the path > expression string is not properly quoted (but the intended output is > still correct). 臘♂️ >> Follow-ups I’d like to make: >> >> 1. Expand the modes section to show how the types of results can vary >> depending on the mode, thanks to the flattening. Examples: >> >> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); >> jsonb_path_query >> -- >> 3 >> 4 >> 5 >> (3 rows) >> >> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > >> 2)'); >> jsonb_path_query >> -- >> [1, 2, 3, 4, 5] >> >> 2. Improve the descriptions and examples for @?/jsonb_path_exists() >> and @@/jsonb_path_match(). > > +1 I planned to submit these changes in a separate patch, based on Tom Lane’s suggestion[1]. Would it be preferred to add them to this patch? > Perhaps make it explicit that the reader must run this in psql in order > to use \set and :'json' in the ensuing samples? Some of the existing > examples already use psql output but they do not rely on any psql > features. Good call, done. > This should use , , and if it shows > a psql session, e.g.: > > > select jsonb_path_query(:'json', '$.track.segments'); > > > jsonb_path_query > --- > [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 > 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": > "2018-10-14 10:39:21"}] > > I pokwds around, and it appears the computeroutput bit is used for function output. So I followed the precedent in queries.sgml[2] and omitted the computeroutput tags but added prompt, e.g., = select jsonb_path_query(:'json', 'strict $.**.HR'); jsonb_path_query -- 73 135 > Also the cast to jsonb is not necessary and only adds clutter IMO. Right, removed them all in function calls. >> + >> + Predicate-only path expressions are necessary for implementation of >> the >> + @@ operator (and the >> + jsonb_path_match function), and should not be >> used >> + with the @? operator (or >> + jsonb_path_exists function). >> + >> + >> + >> + Conversely, non-predicate jsonpath expressions should >> not be >> + used with the @@ operator (or the >> + jsonb_path_match function). >> + >> + > > Both paras should be wrapped in a single so that they stand out > from the rest of the text. Maybe even , but is already > used on this page for things that I'd consider warnings. Agreed. Would be good if we could teach these functions and operators to reject path expressions they don’t support. >> + >> +Regular Expression Interpretation >> + >> + There are minor differences in the interpretation of regular >> + expression patterns used in like_regex filters, as >> + described in . >> + >> + > > should be closed here, > otherwise the docs won't build. This can be checked with > `make -C doc/src/sgml check`. Thanks. That produces a bunch of warnings for postgres.sgml and legal.sgml (and a failure to load the docbook DTD), but func.sgml is clean now. > `git diff --check` shows a couple of lines with trailing whitespace > (mostly psql output). I must’ve cleaned those after I sent the patch, good now. Updated patch attached, this time created by `git format-patch -v2`. Best, David [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us [2] https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-JOIN v2-0001-Improve-boolean-predicate-JSON-Path-docs.patch Description: Binary data
Re: Patch: Improve Boolean Predicate JSON Path Docs
On Oct 14, 2023, at 16:40, David E. Wheeler wrote: > Following up from a suggestion from Tom Lane[1] to improve the documentation > of boolean predicate JSON path expressions, please find enclosed a draft > patch to do so. And now I see I can’t spell “Deviations”. Will fix along with any other requested revisions. GitHub diff here if you’re into that sort of thing: https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs Best, David
Patch: Improve Boolean Predicate JSON Path Docs
Hackers, Following up from a suggestion from Tom Lane[1] to improve the documentation of boolean predicate JSON path expressions, please find enclosed a draft patch to do so. It does three things: 1. Converts all of the example path queries to use jsonb_path_query() and show the results, to make it clearer what the behaviors are. 2. Replaces the list of deviations from the standards with a new subsection, with each deviation in its own sub-subsection. The regex section is unchanged, but I’ve greatly expanded the boolean expression JSON path section with examples comparing standard filter expressions and nonstandard boolean predicates. I’ve also added an exhortation not use boolean expressions with @? or standard path expressions with @@. 3. While converting the modes section to use jsonb_path_query() and show the results, I also added an example of strict mode returning an error. Follow-ups I’d like to make: 1. Expand the modes section to show how the types of results can vary depending on the mode, thanks to the flattening. Examples: david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); jsonb_path_query -- 3 4 5 (3 rows) david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)'); jsonb_path_query -- [1, 2, 3, 4, 5] 2. Improve the descriptions and examples for @?/jsonb_path_exists() and @@/jsonb_path_match(). Best, David [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us jsonpath-pred-docs.patch Description: Binary data
Re: JSON Path and GIN Questions
Thanks for the reply, Erik. Have appreciated collaborating with you on a few different things lately! > On Oct 13, 2023, at 22:50, Erik Wienhold wrote: >> Hi, finally getting back to this, still fiddling to figure out the >> differences. From the thread you reference [1], is the point that @@ >> and jsonb_path_match() can only be properly used with a JSON Path >> expression that’s a predicate check? > > I think so. That's also supported by the existing docs which only > mention "JSON path predicate" for @@ and jsonb_path_match(). Okay, good. >> If so, as far as I can tell, only exists() around the entire path >> query, or the deviation from the SQL standard that allows an >> expression to be a predicate? > > Looks like that. But note that exists() is also a filter expression. > So wrapping the entire jsonpath in exists() is also a deviation from the > SQL standard which only allows predicates in filter expressions, i.e. > ' ? ()'. Yeah. I’m starting to get the sense that the Postgres extension of the standard to allow predicates without filters is almost a different thing, like there are two Pg SQL/JSON Path languages: 1. SQL Standard path language for selecting values and includes predicates. Returns the selected value(s). Supported by `@?` and jsonb_path_exists(). 2. The Postgres predicate path language which returns a boolean, akin to a WHERE expression. Supported by `@@` and jsonb_path_match() Both are supported by jsonb_path_query(), but if you use a standard path you get the values and if you use a predicate path you get a boolean. This feels a big overloaded to me, TBH; I find myself wanting them to be separate types since the behaviors vary quite a bit! >> This suggest to me that the "Only the first item of the result is >> taken into account” bit from the docs may not be quite right. > > Yes, this was also the issue in the referenced thread[1]. I think my > suggesstion in [2] explains it (as far as I understand it). Yeah, lax vs. strict mode stuff definitely creates some added complexity. I see now I missed the rest of that thread; seeing the entire thread on one page[1] really helps. I’d like to take a stab at the doc improvements Tom suggests[2]. >> jsonb_path_match(), it turns out, only wants a single result. But >> furthermore perhaps the use of a filter predicate rather than a >> predicate expression for the entire path query is an error? > > Yes, I think @@ and jsonb_path_match() should not be used with filter > expressions because the jsonpath returns whatever the path expression > yields (which may be an actual boolean value in the jsonb). The filter > expression only filters (as the name suggests) what the path expression > yields. Agreed. It only gets worse with a filter expression that selects a single value: david=# select jsonb_path_match('{"a":[false,true]}', '$.a ?(@[*] == false)'); jsonb_path_match -- f Presumably it returns false because the value selected is JSON `false`: david=# select jsonb_path_query('{"a":[false,true]}', '$.a ?(@[*] == false)'); jsonb_path_query -- false Which seems misleading, frankly. Would it be possible to update jsonb_path_match and @@ to raise an error when the path expression is not a predicate? >> Curiously, @@ seems okay with it: >> >> david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)'; >> ?column? >> -- >> t >> >> Not a predicate query, and somehow returns true even though the first >> item of the result is false? Is that how it should be? > > Your example does a text search equivalent to: > > select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? > (@[*] == true)') > > You forgot the cast to jsonb. Oh good grief 臘♂️ > jsonb @@ jsonpath actually returns null: > > test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)'; > ?column? > -- > > (1 row) Yes, much better, though see the result above that returns a single `false` and confuses things. > This matches the note right after the docs for @@: Yeah, that makes sense. But here’s a bit about lax mode[3] that confuses me: > The lax mode facilitates matching of a JSON document structure and path > expression if the JSON data does not conform to the expected schema. If an > operand does not match the requirements of a particular operation, it can be > automatically wrapped as an SQL/JSON array or unwrapped by converting its > elements into an SQL/JSON sequence before performing this operation. Besides, > comparison operators automatically unwrap their operands in the lax mode, so > you can compare SQL/JSON arrays out-of-the-box. This automatic flattening in lax mode seems odd, because it means you get different results in strict and lax mode where there are no errors. In lax mode, you get a set: david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); jsonb_path_query -- 3 4 5 (3 rows) But
Re: JSON Path and GIN Questions
On Sep 12, 2023, at 21:00, Erik Wienhold wrote: >> I posted this question on Stack Overflow >> (https://stackoverflow.com/q/77046554/79202), >> and from the suggestion I got there, it seems that @@ expects a boolean to be >> returned by the path query, while @? wraps it in an implicit exists(). Is >> that >> right? > > That's also my understanding. We had a discussion about the docs on @@, @?, > and > jsonb_path_query on -general a while back [1]. Maybe it's useful also. Hi, finally getting back to this, still fiddling to figure out the differences. From the thread you reference [1], is the point that @@ and jsonb_path_match() can only be properly used with a JSON Path expression that’s a predicate check? If so, as far as I can tell, only exists() around the entire path query, or the deviation from the SQL standard that allows an expression to be a predicate? This suggest to me that the "Only the first item of the result is taken into account” bit from the docs may not be quite right. Consider this example: david=# select jsonb_path_query('{"a":[false,true,false]}', '$.a ?(@[*] == false)'); jsonb_path_query -- false false (2 rows) david=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ?(@[*] == false)'); ERROR: single boolean result is expected jsonb_path_match(), it turns out, only wants a single result. But furthermore perhaps the use of a filter predicate rather than a predicate expression for the entire path query is an error? Curiously, @@ seems okay with it: david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)'; ?column? -- t Not a predicate query, and somehow returns true even though the first item of the result is false? Is that how it should be? Best, David [1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com
Re: to_regtype() Raises Error
On Sep 17, 2023, at 19:28, Tom Lane wrote: >> No, that is precisely the point. The result should be null instead of >> an error. > > Yeah, ideally so, but the cost/benefit of making it happen seems > pretty unattractive for now. See the soft-errors thread at [1], > particularly [2] (but searching in that thread for references to > regclassin, regtypein, and to_reg* will find additional detail). For my purposes I’m wrapping it in an exception-catching PL/pgSQL function, but it might be worth noting the condition in which it *will* raise an error on the docs. Best, David signature.asc Description: Message signed with OpenPGP
to_regtype() Raises Error
The docs for `to_regtype()` say, “this function will return NULL rather than throwing an error if the name is not found.” And it’s true most of the time: david=# select to_regtype('foo'), to_regtype('clam'); to_regtype | to_regtype + [null] | [null] But not others: david=# select to_regtype('inteval second'); ERROR: syntax error at or near "second" LINE 1: select to_regtype('inteval second'); ^ CONTEXT: invalid type name "inteval second” I presume this has something to do with not catching errors from the parser? david=# select to_regtype('clam bake'); ERROR: syntax error at or near "bake" LINE 1: select to_regtype('clam bake'); ^ CONTEXT: invalid type name "clam bake" Best, David signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 17, 2023, at 12:20, Tom Lane wrote: > After thinking about it for awhile, I think we need some more > discursive explanation of what's allowed, perhaps along the lines > of the attached. (I still can't shake the feeling that this is > duplicative; but I can't find anything comparable until you get > into the weeds in Section V.) > > I put the new text at the end of section 11.1, but perhaps it > belongs a little further up in that section; it seems more > important than some of the preceding paras. I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable operators. True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar for a function call, they are in some sense the same thing. A header might be useful, something like “What Counts as an indexable expression”. Best, David signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 16, 2023, at 18:13, Erik Wienhold wrote: > Looks like the effect of lax mode which may unwrap arrays when necessary [1]. > The array unwrapping looks like the result of jsonb_array_elements(). > > It kinda works in strict mode: > > SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] > < 14)'); > >jsonb_path_query > --- > [47.763, 13.4034] > [47.706, 13.2635] > (2 rows) > > But it does not remove elements from the matching arrays. Which I don't even > expect here because the path specifies the location array as the object to be > returned. The filter expression then only decides whether to return the > location array or not. Nowhere in the docs does it say that the filter > expression itself removes any elements from a matched array. Yes, this is what I expected. It means “select the location array if any of its contents is less that 14.” I don’t understand why it’s different in lax mode, though, as `@[*]` is not a structural error; it confirms to the schema, as the docs say. The flattening in this case seems weird. Ah, here’s why:, from the docs: "Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box.” There follow some discussion of the need to specify `[*]` on segments in strict mode, but since that’s exactly what my example does (and the same for the locations array inside the filter), it doesn’t seem right to me that it would be unwrapped here. > Here's a query that filter's out individual array elements. It's quite a > mouthful (especially to preserve the order of array elements): Wow fun, and yeah, it makes sense to take things apart in SQL for this sort of thing! Best, David signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 16, 2023, at 16:50, Erik Wienhold wrote: > "For these operators, a GIN index extracts clauses of the form > **accessors_chain = constant** out of the jsonpath pattern, and does the > index search based on the keys and values mentioned in these clauses." > > I don't know if this is a general limitation of GIN indexes or just how these > operators are implemented right now. > > [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING The detail that jumps out at me is this one on jsonb_path_ops: “Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it” Because jsonb_path_ops indexes hashes, I would assume it would only support path equality. But it’s not clear to me from these docs that jsonb_ops also indexes hashes. Does it? Best, D signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 12, 2023, at 21:00, Erik Wienhold wrote: >> If so, I’d like to submit a patch to the docs talking about this, and >> suggesting the use of jsonb_path_query() to test paths to see if they return >> a boolean or not. > > +1 I’ve started work on this; there’s so much to learn! Here’s a new example that surprised me a bit. Using the GPS tracker example from the docs [1] loaded into a `:json` psql variable, this output of this query makes perfect sense to me: david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)'); jsonb_path_query -- 13.4034 13.2635 Because `[*]` selects all the values. This, however, I did not expect: david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)'); jsonb_path_query -- 13.4034 13.2635 (2 rows) I had expected it to return two single-value arrays, instead: [13.4034] [13.2635] It appears that the filter expression is doing some sub-selection, too. Is that expected? Best, David [1]: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 15, 2023, at 23:59, Erik Rijkers wrote: > movie @? '$ ?($.year >= 2023)' > > I believe it is indeed not possible to have such a unequality-search use the > GIN index. It is another weakness of JSON that can be unexpected to those > not in the fullness of Knowledge of the manual. Yes, this too would be good > to explain in the doc where JSON indexes are explained. Is that a limitation of GIN indexes in general? Or could there be opclass improvements in the future that would enable such comparisons? Thanks, David signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 15, 2023, at 20:36, Tom Lane wrote: > I think that that indicates that you're putting the info in the > wrong place. Perhaps the right answer is to insert something > more explicit in section 11.2, which is the first place where > we really spend any effort discussing what can be indexed. Fair enough. How ’bout this? --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -120,7 +120,7 @@ CREATE INDEX test1_id_index ON test1 (id); B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension bloom. Each index type uses a different - algorithm that is best suited to different types of queries. + algorithm that is best suited to different types of queries and operators. By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations. @@ -132,6 +132,14 @@ CREATE INDEX name ON table + + +Only operators on indexed columns are considered as index qualifications. +Functions never qualify for index usage, aside from +indexes on expressions. + + + B-Tree signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 12, 2023, at 21:00, Erik Wienhold wrote: > That's also my understanding. We had a discussion about the docs on @@, @?, > and > jsonb_path_query on -general a while back [1]. Maybe it's useful also. Okay, I’ll take a pass at expanding the docs on this. I think a little mini-tutorial on these two operators would be useful. Meanwhile, I’d like to re-up this question about the index qualification of non-equality JSON Path operators. On Sep 12, 2023, at 20:16, David E. Wheeler wrote: > Issue 3: Index Use for Comparison > - > > From the docs > (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I > had assumed any JSON Path query would be able to use the GIN index. However > while the use of the == JSON Path operator is able to take advantage of the > GIN index, apparently the >= operator cannot: > > david=# explain analyze select id from movies where movie @? '$ ?($.year >= > 2023)'; > QUERY PLAN > > - > Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual > time=34.815..36.259 rows=192 loops=1) > Filter: (movie @? '$?($."year" >= 2023)'::jsonpath) > Rows Removed by Filter: 36081 > Planning Time: 1.864 ms > Execution Time: 36.338 ms > (5 rows) > > Is this expected? Originally I tried with json_path_ops, which I can > understand not working, since it stores hashes of paths, which would allow > only exact matches. But a plain old GIN index doesn’t appear to work, either. > Should it? Is there perhaps some other op class that would allow it to work? > Or would I have to create a separate BTREE index on `movie -> 'year'`? Thanks, David signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 14, 2023, at 00:41, Tom Lane wrote: > As far as json in particular is concerned, 8.14.4 jsonb Indexing [4] > is pretty clear about what is or is not supported. How do you feel about this note, then? diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index b6c2ddbf55..7dda727f0d 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -413,6 +413,13 @@ SELECT doc-'site_name' FROM websites Two GIN operator classes are provided, offering different performance and flexibility trade-offs. + + +As with all indexes, only operators on indexed columns are considered as +index qualifications. In other words, only jsonb operators can +take advantage of GIN indexes; jsonb functions cannot. + + The default GIN operator class for jsonb supports queries with the key-exists operators ?, ?| Best, David signature.asc Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 13, 2023, at 01:11, Erik Rijkers wrote: > "All use of json*() functions preclude index usage." > > That sentence is missing from the documentation. Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why would the corresponding functions behave the same? D signature.asc Description: Message signed with OpenPGP
JSON Path and GIN Questions
Greetings Hackers, Been a while! I’m working on some experiments with JSONB columns and GIN indexes, and have operated on the assumption that JSON Path operations would take advantage of GIN indexes, with json_path_ops as a nice optimization. But I’ve run into what appear to be some inconsistencies and oddities I’m hoping to figure out with your help. For the examples in this email, I’m using this simple table: CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL); \copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json | jq -c ".[]" | sed "s|||g"'; create index on movies using gin (movie); analyze movies; That gives me a simple table with around 3600 rows. Not a lot of data, but hopefully enough to demonstrate the issues. Issue 1: @@ vs @? - I have been confused as to the difference between @@ vs @?: Why do these return different results? david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")'; id (0 rows) david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")'; id 10 (1 row) I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), and from the suggestion I got there, it seems that @@ expects a boolean to be returned by the path query, while @? wraps it in an implicit exists(). Is that right? If so, I’d like to submit a patch to the docs talking about this, and suggesting the use of jsonb_path_query() to test paths to see if they return a boolean or not. Issue 2: @? Index Use - From Oleg’s (happy belated birthday!) notes (https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators): > Operators @? and @@ are interchangeable: > > js @? '$.a' <=> js @@ 'exists($.a)’ > js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’ For the purposes of the above example, this appears to hold true: if I wrap the path query in exists(), @@ returns a result: david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life Rescue"))'; id 10 (1 row) Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN index while @? does not. Or, no, fiddling with it again just now, I think I have still been confusing these operators! @@ was using the index with an an explicit exists(), but @? was not…because I was still using an explicit exists. In other words: * @@ 'exists($ ?($.year == 1944))' Uses the index * @? '$ ?(@.year == 1944)' Uses the index * @? 'exists($ ?($.year == 1944))' Does not use the index That last one presumably doesn’t work, because there is an implicit exists() around the exists(), making it `exists(exists($ ?($.year == 1944)))`, which returns true for every row (true and false both exists)! 臘♂️. Anyway, if I have this right, I’d like to flesh out the docs a bit. Issue 3: Index Use for Comparison - From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path query would be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage of the GIN index, apparently the >= operator cannot: david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)'; QUERY PLAN - Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1) Filter: (movie @? '$?($."year" >= 2023)'::jsonpath) Rows Removed by Filter: 36081 Planning Time: 1.864 ms Execution Time: 36.338 ms (5 rows) Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of paths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is there perhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie -> 'year'`? Thanks your your patience with my questions! Best, David signature.asc Description: Message signed with OpenPGP
Re: JSONPath Child Operator?
On Jan 30, 2023, at 08:17, Filipp Krylov wrote: >> My question: Are there plans to support square bracket syntax for JSON >> object field name strings like this? Or to update to follow the standard as >> it’s finalized? > > This syntax is a part of "jsonpath syntax extensions" patchset: > https://www.postgresql.org/message-id/e0fe4f7b-da0b-471c-b3da-d8adaf314357%40postgrespro.ru Nice, thanks. I learned since sending this email that SQL/JSON Path is not at all the same as plain JSON Path, so now I’m less concerned bout it. I like the new object subscript syntax, though, I’ve been thinking about this myself. D
JSONPath Child Operator?
Greetings! Long time no see, I know. How are you, Hackers? I notice from the docs in the Postgres JSONPath type, brackets are described as: > • Square brackets ([]) are used for array access. https://www.postgresql.org/docs/current/datatype-json.html#DATATYPE-JSONPATH Notably they are not used for object field path specifications, in contrast to the original JSON Path design, which says: > JSONPath expressions can use the dot–notation > > $.store.book[0].title > > or the bracket–notation > > $['store']['book'][0]['title'] https://goessner.net/articles/JsonPath/index.html#e2 Similarly, the current IETF RFC Draft says: > JSONPath expressions use the _bracket notation_, for example: > >$['store']['book'][0]['title'] > >or the more compact _dot notation_, for example: > >$.store.book[0].title https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/ My question: Are there plans to support square bracket syntax for JSON object field name strings like this? Or to update to follow the standard as it’s finalized? Thanks, David
Re: When is int32 not an int32?
On Sep 26, 2021, at 19:25, Tom Lane wrote: > More to the point, you should be checking whether strtol reports overflow. > Having now seen your code, I'll opine that the failing platforms have > 32-bit long. Thanks for the pointer, Tom. I believe this fixes that particular issue. https://github.com/theory/pg-semver/commit/4d79dcc Best, David signature.asc Description: Message signed with OpenPGP
Re: When is int32 not an int32?
On Sep 26, 2021, at 18:31, Tom Lane wrote: > I'd bet more along the lines of "your overflow check is less portable than > you thought”. Oh well now that you mention it and I look past things, I see we’re using INT_MAX, but should probably use INT32_MAX. https://github.com/theory/pg-semver/blob/87cc30cbe80aa3992a4af6f19a35a944a86c/src/semver.c#L145-L149 And also that the destination value we’re storing it in is an int parts[], not int32 parts[]. Which we do so we can parse numbers up to int size. But to Fetter’s point, we’re not properly handling something greater than int (usually int64, presumably). Not sure what changes are required to improve memory safety over and above using INT32_MAX instead of INT_MAX. Thanks, Daavid signature.asc Description: Message signed with OpenPGP
When is int32 not an int32?
Hell Hackers, long time no email! I got a bug report for the semver extension: https://github.com/theory/pg-semver/issues/58 It claims that a test unexpected passes. That is, Test #31 is expected to fail, because it intentionally tests a version in which its parts overflow the int32[3] they’re stored in, with the expectation that one day we can refactor the type to handle larger version parts. I can’t imagine there would be any circumstance under which int32 would somehow be larger than a signed 32-bit integer, but perhaps there is? Scroll to the bottom of these pages to see the unexpected passes on i386 and armhf: https://ci.debian.net/data/autopkgtest/unstable/i386/p/postgresql-semver/15208658/log.gz https://ci.debian.net/data/autopkgtest/unstable/armhf/p/postgresql-semver/15208657/log.gz Here’s the Postgres build output for those two platforms, as well, though nothing jumps out at me: https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=i386=13.4-3=1630408269=0 https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=armhf=13.4-3=1630412028=0 Thanks, David signature.asc Description: Message signed with OpenPGP
Re: [HACKERS] Look-behind regular expressions
On Jul 7, 2020, at 21:32, Andreas Karlsson wrote: > On 7/7/20 6:51 PM, Thom Brown wrote: > >> 10 years later, and I've noticed that both look-behind and negative >> look-behind have been implemented. >> Thanks to whomever did this. > > I think that it is Tom Lane that you should thank for this. > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=12c9a04008870c283931d6b3b648ee21bbc2cfda In 2015! Thank you, Tom. I’ll have to find a use for it one of these days. D signature.asc Description: Message signed with OpenPGP