Re: Does 'instead of delete' trigger support modification of OLD
> On Nov 8, 2019, at 0:26, Bruce Momjian wrote: > > First, notice "only", which was missing from the later sentence: > > For INSERT and UPDATE operations [only], the trigger may modify the NEW row before returning it. > > which I have now added with my applied patch to all supported releases. Hi Bruce, I happened to browse recent documentation-related commits and I didn't see this patch in REL_12_STABLE. Judging by the commit message, it should be applied there too.
Re: JSONPATH documentation
On 9/25/19 12:08 AM, Peter Eisentraut wrote: On 2019-09-23 00:03, Tom Lane wrote: While we're whining about this, I find it very off-putting that the jsonpath stuff was inserted in the JSON functions section ahead of the actual JSON functions. I think it should have gone after them, because it feels like a barely-related interjection as it stands. Maybe there's even a case that it should be its own sect1, after the "functions-json" section. I'd just switch the sect2's around. As more SQL/JSON functionality gets added, I believe a separate sect1 is likely to be more justified. However, for v12 I'd vote for moving sect2 down. The patch is attached; it also fixes the ambiguous sentence that has raised questions in this thread. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 432dbad868..6cb5d6 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -11646,14 +11646,14 @@ table2-mapping - SQL/JSON path expressions - (see ). + PostgreSQL-specific functions and operators for JSON + data types (see ). - PostgreSQL-specific functions and operators for JSON - data types (see ). + SQL/JSON path expressions + (see ). @@ -11665,1748 +11665,1748 @@ table2-mapping see . - - SQL/JSON Path Expressions + + JSON Functions and Operators -SQL/JSON -path expressions +JSON +functions and operators - SQL/JSON path expressions specify the items to be retrieved - from the JSON data, similar to XPath expressions used - for SQL access to XML. In PostgreSQL, - path expressions are implemented as the jsonpath - data type and can use any elements described in - . +shows the operators that + are available for use with JSON data types (see ). - JSON query functions and operators - pass the provided path expression to the path engine - for evaluation. If the expression matches the queried JSON data, - the corresponding SQL/JSON item is returned.
- Path expressions are written in the SQL/JSON path language - and can also include arithmetic expressions and functions. - Query functions treat the provided expression as a - text string, so it must be enclosed in single quotes. - + + json and jsonb Operators + + + +Operator +Right Operand Type +Return type +Description +Example +Example Result + + + + +-> +int +json or jsonb +Get JSON array element (indexed from zero, negative +integers count from the end) +'[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json->2 +{"c":"baz"} + + +-> +text +json or jsonb +Get JSON object field by key +'{"a": {"b":"foo"}}'::json->'a' +{"b":"foo"} + + +->> +int +text +Get JSON array element as text +'[1,2,3]'::json->>2 +3 + + +->> +text +text +Get JSON object field as text +'{"a":1,"b":2}'::json->>'b' +2 + + +#> +text[] +json or jsonb +Get JSON object at the specified path +'{"a": {"b":{"c": "foo"}}}'::json#>'{a,b}' +{"c": "foo"} + + +#>> +text[] +text +Get JSON object at the specified path as text +'{"a":[1,2,3],"b":[4,5,6]}'::json#>>'{a,2}' +3 + + + + + + +There are parallel variants of these operators for both the +json and jsonb types. +The field/element/path extraction operators +return the same type as their left-hand input (either json +or jsonb), except for those specified as +returning text, which coerce the value to text. +The field/element/path extraction operators return NULL, rather than +failing, if the JSON input does not have the right structure to match +the request; for example if no such element exists. The +field/element/path extraction operators that accept integer JSON +array subscripts all support negative subscripting from the end of +arrays. + + - A path expression consists of a sequence of elements allowed - by the jsonpath data type. - The path expression is evaluated from left to right, but - you can use parentheses to change the order of operations.
- If the evaluation is successful, a sequence of SQL/JSON items - (SQL/JSON sequence) is produced, - and the evaluation result is returned to the JSON query function that completes the specified computation.
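The operator table quoted in the patch above distinguishes -> / #> (which return json or jsonb) from ->> / #>> (which coerce the result to text), and notes that missing fields yield NULL rather than an error. A rough Python sketch of those semantics, using a parsed Python dict/list to stand in for a JSON value — the helper names arrow and arrow_text are invented here, not anything from PostgreSQL:

```python
import json

def arrow(doc, key):
    """Mimic the json -> operator: return the element as a JSON value
    (a Python object here), or None when the field/element is absent."""
    try:
        if isinstance(key, int) and isinstance(doc, list):
            return doc[key]   # negative indices count from the end, as in PostgreSQL
        if isinstance(doc, dict):
            return doc[key]
    except (IndexError, KeyError):
        return None
    return None               # wrong structure (e.g. int key on an object)

def arrow_text(doc, key):
    """Mimic ->> : like ->, but coerce the extracted value to text."""
    val = arrow(doc, key)
    if val is None:
        return None
    return val if isinstance(val, str) else json.dumps(val)

doc = json.loads('[{"a":"foo"},{"b":"bar"},{"c":"baz"}]')
print(arrow(doc, 2))                                  # like '...'::json -> 2
print(arrow_text(json.loads('{"a":1,"b":2}'), "b"))   # like '...'::json ->> 'b'
```

The key behavioral points from the table show up directly: ->> of a number yields the text "2", and a missing key returns None instead of raising.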
Re: s/rewinded/rewound/?
On 8/7/19 12:00 PM, Michael Paquier wrote: On Wed, Aug 07, 2019 at 10:53:45AM +0200, Magnus Hagander wrote: To me this sounds like a classic non-English-native-speaker-mistake. But it seems at least the one in the docs come from Bruce, who definitely is... So perhaps it's intentional to refer to "what pg_rewind does", and not necessarily to the regular word for it? I am not sure :) "rewound" sounds much more natural. -- Michael +1 for rewound from a non-English-native-speaker. The use of "rewound" in the same file also supports Michael's view. If we decide to fix this, we should probably revise and back-patch the whole paragraph where it appears as it seems to mix up scanning target cluster WALs and applying source cluster WALs. A small patch is attached for your consideration (originally proposed on pgsql-docs [1]). [1] https://www.postgresql.org/message-id/ad6ac5bb-6689-ddb0-dc60-c5fc197d728e%40postgrespro.ru -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml index 52a1caa..a7e1705 100644 --- a/doc/src/sgml/ref/pg_rewind.sgml +++ b/doc/src/sgml/ref/pg_rewind.sgml @@ -66,14 +66,12 @@ PostgreSQL documentation can be found either on the target timeline, the source timeline, or their common ancestor. In the typical failover scenario where the target cluster was shut down soon after the divergence, this is not a problem, but if the - target cluster ran for a long time after the divergence, the old WAL + target cluster ran for a long time after the divergence, its old WAL files might no longer be present. In that case, they can be manually - copied from the WAL archive to the pg_wal directory, or - fetched on startup by configuring or - . The use of - pg_rewind is not limited to failover, e.g. a standby - server can be promoted, run some write transactions, and then rewinded - to become a standby again. 
+ copied from the WAL archive to the pg_wal directory. + The use of pg_rewind is not limited to failover, + e.g. a standby server can be promoted, run some write transactions, and then + get rewound to become a standby again.
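The revised paragraph says that old WAL files missing from the target can be copied manually from the WAL archive to the pg_wal directory. A minimal Python sketch of that manual step — the function name and the segment file names are hypothetical, and this is only an illustration of the copy, not anything pg_rewind does itself:

```python
import shutil
import tempfile
from pathlib import Path

def copy_missing_wal(archive_dir, pg_wal_dir, needed_segments):
    """Copy WAL segment files that are needed but no longer present in
    the target cluster's pg_wal directory. Purely illustrative."""
    archive, pg_wal = Path(archive_dir), Path(pg_wal_dir)
    copied = []
    for seg in needed_segments:
        src, dst = archive / seg, pg_wal / seg
        if not dst.exists() and src.exists():
            shutil.copy2(src, dst)   # copy contents and metadata, like cp -p
            copied.append(seg)
    return copied

# Demonstration with throwaway directories and a fake segment name:
archive = Path(tempfile.mkdtemp())
pg_wal = Path(tempfile.mkdtemp())
(archive / "000000010000000000000002").write_bytes(b"fake WAL contents")
copied = copy_missing_wal(archive, pg_wal, ["000000010000000000000002"])
print(copied)
```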
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On 7/1/19 5:20 PM, Alexey Kondratov wrote: Hi Thomas, On 01.07.2019 15:02, Thomas Munro wrote: Hi Alexey, This no longer applies. Since the Commitfest is starting now, could you please rebase it? Thank you for the reminder. Rebased version of the patch is attached. I've also modified my logging code in order to obey the new unified logging system for command-line programs committed by Peter (cc8d415117). Regards Hi Alexey, I would like to suggest a couple of changes to docs and comments, please see the attachment. The "...or fetched on startup" part also seems wrong here, but it's not a part of your patch, so I'm going to ask about it on pgsql-docs separately. It might also be useful to reword the following error messages: - "using restored from archive version of file \"%s\"" - "could not open restored from archive file \"%s\"" We could probably say something like "could not open file \"%s\" restored from WAL archive" instead. On a more general note, I wonder if everyone is happy with the --use-postgresql-conf option name, or we should continue searching for a narrower term. Unfortunately, I don't have any better suggestions right now, but I believe it should be clear that its purpose is to fetch missing WAL files for the target. What do you think? -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml index 52a1caa..7e76fcc 100644 --- a/doc/src/sgml/ref/pg_rewind.sgml +++ b/doc/src/sgml/ref/pg_rewind.sgml @@ -66,9 +66,12 @@ PostgreSQL documentation can be found either on the target timeline, the source timeline, or their common ancestor. In the typical failover scenario where the target cluster was shut down soon after the divergence, this is not a problem, but if the - target cluster ran for a long time after the divergence, the old WAL - files might no longer be present.
In that case, they can be manually - copied from the WAL archive to the pg_wal directory, or + target cluster ran for a long time after the divergence, its old WAL + files might no longer be present. In this case, you can manually copy them + from the WAL archive to the pg_wal directory, or run + pg_rewind with the -r or + -R option to automatically retrieve them from the WAL + archive, or fetched on startup by configuring or . The use of pg_rewind is not limited to failover, e.g. a standby @@ -203,6 +206,39 @@ PostgreSQL documentation + -r + --use-postgresql-conf + + +Use the restore_command defined in +postgresql.conf to retrieve WAL files from +the WAL archive if these files are no longer available in the +pg_wal directory of the target cluster. + + +This option cannot be used together with --restore-command. + + + + + + -R restore_command + --restore-command=restore_command + + +Specifies the restore_command to use for retrieving +WAL files from the WAL archive if these files are no longer available +in the pg_wal directory of the target cluster. + + +If restore_command is already set in +postgresql.conf, you can provide the +--use-postgresql-conf option instead. + + + + + --debug @@ -288,7 +324,10 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b history forked off from the target cluster. For each WAL record, record each data block that was touched. This yields a list of all the data blocks that were changed in the target cluster, after the - source cluster forked off. + source cluster forked off. If some of the WAL files are no longer + available, try re-running pg_rewind with + the -r or -R option to search + for the missing files in the WAL archive. 
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c index 287af60..5a7f759 100644 --- a/src/bin/pg_rewind/parsexlog.c +++ b/src/bin/pg_rewind/parsexlog.c @@ -12,6 +12,7 @@ #include "postgres_fe.h" #include +#include #include "pg_rewind.h" #include "filemap.h" @@ -44,6 +45,7 @@ static char xlogfpath[MAXPGPATH]; typedef struct XLogPageReadPrivate { const char *datadir; + const char *restoreCommand; int tliIndex; } XLogPageReadPrivate; @@ -52,6 +54,9 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader, int reqLen, XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *pageTLI); +static int RestoreArchivedWAL(const char *path, const char *xlogfname, + off_t expectedSize, const char *restoreCommand); + /* * Read WAL from the datadir/pg_wal, starting from 'start
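The -r/-R options discussed above rely on restore_command's placeholder convention: %f is replaced with the name of the WAL file to retrieve, %p with the destination path, and %% with a literal %. A small Python sketch of that expansion — the function name is invented, and the single-pass scan mirrors how the placeholders are documented to behave, not the actual C implementation:

```python
import shlex

def expand_restore_command(template, wal_filename, dest_path):
    """Expand %f, %p and %% in a restore_command template and return an
    argument vector ready for execution. Illustrative only."""
    def expand(token):
        out, i = [], 0
        while i < len(token):
            if token[i] == "%" and i + 1 < len(token):
                nxt = token[i + 1]
                if nxt == "f":
                    out.append(wal_filename); i += 2; continue
                if nxt == "p":
                    out.append(dest_path); i += 2; continue
                if nxt == "%":
                    out.append("%"); i += 2; continue
            out.append(token[i]); i += 1
        return "".join(out)
    return [expand(tok) for tok in shlex.split(template)]

cmd = expand_restore_command("cp /mnt/server/archivedir/%f %p",
                             "000000010000000000000002",
                             "pg_wal/000000010000000000000002")
print(cmd)
```

A single pass over each token matters: sequential str.replace calls would mishandle sequences like "%%p".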
Re: SQL/JSON path issues/questions
On 7/3/19 11:59 PM, Alexander Korotkov wrote: Hi! On Wed, Jul 3, 2019 at 5:27 PM Liudmila Mantrova wrote: I have rechecked the standard and I agree that we should use "filter expression" whenever possible. "A filter expression must be enclosed in parentheses..." looks like an oversight, so I fixed it. As for what's actually enclosed, I believe we can still use the word "condition" here as it's easy to understand and is already used in our docs, e.g. in description of the WHERE clause that serves a similar purpose. The new version of the patch fixes the terminology, tweaks the examples, and provides some grammar and style fixes in the jsonpath-related chapters. It looks good to me. But this sentence looks a bit too complicated. "It can be followed by one or more accessor operators to define the JSON element on a lower nesting level by which to filter the result." Could we phrase this as following? "In order to filter the result by values lying on lower nesting level, @ operator can be followed by one or more accessor operators." -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Thank you! I think we can make this sentence even shorter, the fix is attached: "To refer to a JSON element stored at a lower nesting level, add one or more accessor operators after @." -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 3a8581d..6d2aefb 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -11538,7 +11538,8 @@ table2-mapping from the JSON data, similar to XPath expressions used for SQL access to XML. In PostgreSQL, path expressions are implemented as the jsonpath - data type, described in . + data type and can use any elements described in + . 
JSON query functions and operators @@ -11585,7 +11586,7 @@ table2-mapping }, { "location": [ 47.706, 13.2635 ], "start time": "2018-10-14 10:39:21", -"HR": 130 +"HR": 135 } ] } } @@ -11637,23 +11638,33 @@ table2-mapping When defining the path, you can also use one or more - filter expressions, which work similar to - the WHERE clause in SQL. Each filter expression - can provide one or more filtering conditions that are applied - to the result of the path evaluation. Each filter expression must - be enclosed in parentheses and preceded by a question mark. - Filter expressions are evaluated from left to right and can be nested. - The @ variable denotes the current path evaluation - result to be filtered, and can be followed by one or more accessor - operators to define the JSON element by which to filter the result. - Functions and operators that can be used in the filtering condition - are listed in . - SQL/JSON defines three-valued logic, so the result of the filter - expression may be true, false, + filter expressions that work similar to the + WHERE clause in SQL. A filter expression begins with + a question mark and provides a condition in parentheses: + + +? (condition) + + + + + Filter expressions must be specified right after the path evaluation step + to which they are applied. The result of this step is filtered to include + only those items that satisfy the provided condition. SQL/JSON defines + three-valued logic, so the condition can be true, false, or unknown. The unknown value - plays the same role as SQL NULL. Further path + plays the same role as SQL NULL and can be tested + for with the is unknown predicate. Further path evaluation steps use only those items for which filter expressions - return true. + return true. + + + + Functions and operators that can be used in filter expressions are listed + in . The path + evaluation result to be filtered is denoted by the @ + variable. 
To refer to a JSON element stored at a lower nesting level, + add one or more accessor operators after @. @@ -11667,8 +11678,8 @@ table2-mapping To get the start time of segments with such values instead, you have to filter out irrelevant segments before returning the start time, so the - filter is applied to the previous step and the path in the filtering - condition is different: + filter expression is applied to the previous step, and the path used + in the condition is different: '$.track.segments[*] ? (@.HR > 130)."start time"' @@ -11693,9 +11704,9 @@ table2-mapping - You can also nest filters within each other: + You can also nest filter expressions within each other: -'$.track ? (@.segments[*] ? (@.HR > 130)).segments.size()' +'$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()' This expression r
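The filter example discussed in this thread, '$.track.segments[*] ? (@.HR > 130)."start time"', can be evaluated by hand to see what each step contributes. This is an illustrative hand-evaluation in Python over the GPS sample from the docs, not the jsonpath engine:

```python
import json

gps = json.loads("""
{ "track": { "segments": [
    { "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14", "HR": 73 },
    { "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21", "HR": 135 } ] } }
""")

# '$.track.segments[*] ? (@.HR > 130)."start time"' step by step:
# unnest the segments array, keep only the items for which the condition
# on @ is true, then apply the trailing accessor to each surviving item.
matches = [seg["start time"]
           for seg in gps["track"]["segments"]   # $.track.segments[*]
           if seg.get("HR", 0) > 130]            # ? (@.HR > 130)
print(matches)  # ['2018-10-14 10:39:21']
```

This also shows why the filter must precede the trailing accessor: filtering after ."start time" would have nothing left to test @.HR against.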
Re: SQL/JSON path issues/questions
On 6/28/19 6:47 AM, Alexander Korotkov wrote: On Tue, Jun 25, 2019 at 6:38 PM Liudmila Mantrova wrote: Thank you for the catch! Please see the modified version of patch 0004 attached. I tried to review and revise the part related to filters, but I failed because I don't understand the notions used in the documentation. What is the difference between filter expression and filter condition? I can guess that filter expression contains question mark, parentheses and filter condition inside. But this sentence is in contradiction with my guess: "A filter expression must be enclosed in parentheses and preceded by a question mark". So, filter expression is inside the parentheses. Then what is filter condition? The same? Each filter expression can provide one or more filters that are applied to the result of the path evaluation. So additionally to filter condition and filter expression we introduce the notion of just filter. What is it? Could we make it without introduction of new notion? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Hi, I have rechecked the standard and I agree that we should use "filter expression" whenever possible. "A filter expression must be enclosed in parentheses..." looks like an oversight, so I fixed it. As for what's actually enclosed, I believe we can still use the word "condition" here as it's easy to understand and is already used in our docs, e.g. in description of the WHERE clause that serves a similar purpose. The new version of the patch fixes the terminology, tweaks the examples, and provides some grammar and style fixes in the jsonpath-related chapters. 
-- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 3a8581d..b0de624 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -11538,7 +11538,8 @@ table2-mapping from the JSON data, similar to XPath expressions used for SQL access to XML. In PostgreSQL, path expressions are implemented as the jsonpath - data type, described in . + data type and can use any elements described in + . JSON query functions and operators @@ -11585,7 +11586,7 @@ table2-mapping }, { "location": [ 47.706, 13.2635 ], "start time": "2018-10-14 10:39:21", -"HR": 130 +"HR": 135 } ] } } @@ -11637,23 +11638,33 @@ table2-mapping When defining the path, you can also use one or more - filter expressions, which work similar to - the WHERE clause in SQL. Each filter expression - can provide one or more filtering conditions that are applied - to the result of the path evaluation. Each filter expression must - be enclosed in parentheses and preceded by a question mark. - Filter expressions are evaluated from left to right and can be nested. - The @ variable denotes the current path evaluation - result to be filtered, and can be followed by one or more accessor - operators to define the JSON element by which to filter the result. - Functions and operators that can be used in the filtering condition - are listed in . - SQL/JSON defines three-valued logic, so the result of the filter - expression may be true, false, + filter expressions that work similar to the + WHERE clause in SQL. A filter expression begins with + a question mark and provides a condition in parentheses: + + +? (condition) + + + + + Filter expressions must be specified right after the path evaluation step + to which they are applied. The result of this step is filtered to include + only those items that satisfy the provided condition. 
SQL/JSON defines + three-valued logic, so the condition can be true, false, or unknown. The unknown value - plays the same role as SQL NULL. Further path + plays the same role as SQL NULL and can be tested + for with the is unknown predicate. Further path evaluation steps use only those items for which filter expressions - return true. + return true. + + + + Functions and operators that can be used in filter expressions are listed + in . The path + evaluation result to be filtered is denoted by the @ + variable. It can be followed by one or more accessor operators to define + the JSON element on a lower nesting level by which to filter the result. @@ -11667,8 +11678,8 @@ table2-mapping To get the start time of segments with such values instead, you have to filter out irrelevant segments before returning the start time, so the - filter is applied to the previous step and the path in the filtering - condition is different: + filter expression is applied to the previous step, and the path used + in the condition is different:
Re: SQL/JSON path issues/questions
On 6/21/19 8:04 PM, Alexander Korotkov wrote: On Wed, Jun 19, 2019 at 10:14 PM Alexander Korotkov wrote: While I have no objections to the proposed fixes, I think we can further improve patch 0003 and the text it refers to. In attempt to clarify jsonpath docs and address the concern that ? is hard to trace in the current text, I'd also like to propose patch 0004. Please see both of them attached. Thank you for your editing. I'm going to commit them as well. But I'm going to commit your changes separately from 0003 I've posted before. Because 0003 fixes factual error, while you're proposing set of grammar/style fixes. I made some review of these patches. My notes are following: -See also for the aggregate -function json_agg which aggregates record -values as JSON, and the aggregate function -json_object_agg which aggregates pairs of values -into a JSON object, and their jsonb equivalents, +See also for details on +json_agg function that aggregates record +values as JSON, json_object_agg function +that aggregates pairs of values into a JSON object, and their jsonb equivalents, jsonb_agg and jsonb_object_agg. This part is not directly related to jsonpath, and it has been there for a long time. I'd like some native english speaker to review this change before committing this. -Expression inside subscript may consititue an integer, -numeric expression or any other jsonpath expression -returning single numeric value. The last keyword -can be used in the expression denoting the last subscript in an array. -That's helpful for handling arrays of unknown length. +The specified index can be an integer, +as well as a numeric or jsonpath expression that +returns a single integer value. Zero index corresponds to the first +array element. To access the last element in an array, you can use +the last keyword, which is useful for handling +arrays of unknown length. I think this part requires more work. 
Let's see what cases we have, with examples: 1) Integer: '$.ar[1]' 2) Numeric: '$.ar[1.5]' (converted to integer) 3) Some numeric expression: '$.ar[last - 1]' 4) Arbitrary jsonpath expression: '$.ar[$.ar2.size() + $.num - 2]' In principle, it is not necessary to divide 3 and 4, or divide 1 and 2. Or we may not describe the cases at all, but just say it's a jsonpath expression returning a numeric value, which is converted to an integer. Also, note that we do not necessarily *access* the last array element with the "last" keyword. The "last" keyword denotes the index of the last element in the expression, but a completely different element might actually be accessed. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Hi Alexander, Thank you for the catch! Please see the modified version of patch 0004 attached. As for your comment on patch 0003, since I'm not a native speaker, I can only refer to a recent discussion in the pgsql-docs mailing list that seems to support my view on a similar issue: https://www.postgresql.org/message-id/9484.1558050957%40sss.pgh.pa.us -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 39ba18d..fa5afc1 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -11638,10 +11638,17 @@ table2-mapping When defining the path, you can also use one or more filter expressions, which work similar to - the WHERE clause in SQL. Each filter expression - can provide one or more filtering conditions that are applied - to the result of the path evaluation. Each filter expression must - be enclosed in parentheses and preceded by a question mark. + the WHERE clause in SQL. A filter expression must + be enclosed in parentheses and preceded by a question mark: + + +? (@ filter ...) + + + + + Each filter expression can provide one or more filters + that are applied to the result of the path evaluation.
Filter expressions are evaluated from left to right and can be nested. The @ variable denotes the current path evaluation result to be filtered, and can be followed by one or more accessor diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index daebb4f..0d8e2c6 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -815,21 +815,18 @@ SELECT jdoc-'guid', jdoc-'name' FROM api WHERE jdoc @ '{"tags": ["qu .**{level} -.**{lower_level to -upper_level} - - -.**{lower_level to -last} +.**{start_level to +end_level} -Same as .**, but with filter over nesting -level of JSON hierarchy. Levels are specified as integers. -Zero
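The subscript cases enumerated above (integer, numeric, expressions with last) can be made concrete with a toy evaluator. This is only a sketch: binding last to the index of the final element and requiring a single integer result, as the discussion describes — the restricted eval stands in for the jsonpath expression engine and is not how PostgreSQL implements it:

```python
def subscript(arr, expr):
    """Evaluate an array subscript the way jsonpath describes it:
    'last' is bound to the index of the final element, and the
    resulting value is converted to an integer. Illustrative only."""
    idx = eval(expr, {"__builtins__": {}}, {"last": len(arr) - 1})
    return arr[int(idx)]

ar = [10, 20, 30, 40]
print(subscript(ar, "1"))         # 20 -- subscripts are zero-based
print(subscript(ar, "last"))      # 40 -- last element
print(subscript(ar, "last - 1"))  # 30 -- 'last' need not access the last element
```

This also illustrates Korotkov's point: last merely denotes an index value, and last - 1 accesses a different element entirely.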
Re: SQL/JSON path issues/questions
On 6/17/19 11:36 AM, Alexander Korotkov wrote: I'm going to push attached 3 patches if no objections. Regarding 0003-Separate-two-distinctive-json-errors.patch, I think it requires more thought. RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_SINGLETON_JSON_ITEM_REQUIRED), errmsg("left operand of jsonpath operator %s is not a single numeric value", -jspOperationName(jsp->type); +jspOperationName(jsp->type)), + (llen != 1 ? + errdetail("It was an array with %d elements.", llen): + errdetail("The only element was not a numeric."); When we have more than 1 value, it's not exactly an array. Jsonpath can extract values from various parts of a json document, which never constitute an array. Should we say something like "There are %d values"? Also, probably we should display the type of the single element if it's not numeric. jsonb_path_match() also throws ERRCODE_SINGLETON_JSON_ITEM_REQUIRED, should we add similar errdetail() there? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Hi Alexander, While I have no objections to the proposed fixes, I think we can further improve patch 0003 and the text it refers to. In an attempt to clarify jsonpath docs and address the concern that ? is hard to trace in the current text, I'd also like to propose patch 0004. Please see both of them attached. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e918133..39ba18d 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12281,7 +12281,7 @@ table2-mapping @? jsonpath -Does JSON path returns any item for the specified JSON value? +Does JSON path return any item for the specified JSON value? '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)' @@ -12309,8 +12309,8 @@ table2-mapping The @?
and @@ operators suppress -errors including: lacking object field or array element, unexpected JSON -item type and numeric errors. +the following errors: lacking object field or array element, unexpected +JSON item type, and numeric errors. This behavior might be helpful while searching over JSON document collections of varying structure. @@ -13166,26 +13166,25 @@ table2-mapping jsonb_path_query, jsonb_path_query_array and jsonb_path_query_first functions have optional vars and silent -argument. +arguments. -When vars argument is specified, it constitutes an object -contained variables to be substituted into jsonpath -expression. +If the vars argument is specified, it provides an +object containing named variables to be substituted into a +jsonpath expression. -When silent argument is specified and has -true value, the same errors are suppressed as it is in -the @? and @@ operators. +If the silent argument is specified and has the +true value, these functions suppress the same errors +as the @? and @@ operators. -See also for the aggregate -function json_agg which aggregates record -values as JSON, and the aggregate function -json_object_agg which aggregates pairs of values -into a JSON object, and their jsonb equivalents, +See also for details on +json_agg function that aggregates record +values as JSON, json_object_agg function +that aggregates pairs of values into a JSON object, and their jsonb equivalents, jsonb_agg and jsonb_object_agg. diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index b8246ba..daebb4f 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -733,10 +733,12 @@ SELECT jdoc-'guid', jdoc-'name' FROM api WHERE jdoc @ '{"tags": ["qu $varname - A named variable. Its value must be set in the - PASSING clause of an SQL/JSON query function. - - for details. + +A named variable. Its value can be set by the parameter +vars of several JSON processing functions. +See and +its notes for details. 
+ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 39ba18d..fa5afc1 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -11638,10 +11638,17 @@ table2-mapping When defining the path, you can also use one or more filter expressions, which work similar to - the WHERE clause in SQL. Each filter expression - can provide one or more filtering conditions that are applied - to the result of the p
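The patch above describes how @? and @@ (and the silent argument of the jsonb_path_query family) suppress structural errors — a lacking object field or array element, or an unexpected JSON item type — instead of failing. A hedged Python sketch of that suppression pattern over a plain dict (path_exists is an invented helper, not a real API):

```python
import json

def path_exists(doc, keys, silent=True):
    """Sketch of @?-style error suppression: structural errors
    (missing field, wrong item type) yield None instead of raising."""
    try:
        cur = doc
        for k in keys:
            cur = cur[k]      # raises on a missing field or wrong item type
        return True
    except (KeyError, IndexError, TypeError):
        if silent:
            return None       # suppressed, like @? / silent => true
        raise

doc = json.loads('{"a": [1, 2, 3, 4, 5]}')
print(path_exists(doc, ["a", 2]))   # True
print(path_exists(doc, ["b", 0]))   # None -- missing field, suppressed
```

This is the behavior the docs call helpful when searching over JSON document collections of varying structure: a structurally incompatible document just doesn't match.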
Re: SQL/JSON: documentation
On 11/29/18 7:34 PM, Dmitry Dolgov wrote: Hi, Any progress on that? It would be nice to have a new version of the documentation, and I would even advocate to put it into the json path patch [1] (especially, since there were already requests for that, and I personally don't see any reason to keep them separately). For now I'll move the item to the next CF. [1]:https://www.postgresql.org/message-id/flat/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com Hi Dmitry, Unfortunately, I couldn't find much time for this activity, but as far as I understand, thread [1] only requires jsonpath documentation right now. So I extracted the relevant parts from this patch, reworked path expression description, and moved it to func.sgml as Peter suggested (attached). Nikita is going to add this patch to the jsonpath thread together with the updated code once it's ready. Next, I'm going to address Peter's feedback on the rest of this documentation patch (which probably also needs to be split for threads [2] and [3]). [2] https://www.postgresql.org/message-id/flat/cd0bb935-0158-78a7-08b5-904886dea...@postgrespro.ru [3] https://www.postgresql.org/message-id/flat/132f26c4-dfc6-f8fd-4764-2cbf455a3...@postgrespro.ru -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/biblio.sgml b/doc/src/sgml/biblio.sgml index 4953024..f06305d 100644 --- a/doc/src/sgml/biblio.sgml +++ b/doc/src/sgml/biblio.sgml @@ -136,6 +136,17 @@ 1988 + +SQL Technical Report +Part 6: SQL support for JavaScript Object + Notation (JSON) +First Edition. + +http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip;>. + +2017. 
+ + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 112d962..20ef7df 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -11285,26 +11285,661 @@ table2-mapping - JSON Functions and Operators + JSON Functions, Operators, and Expressions - + + The functions, operators, and expressions described in this section + operate on JSON data: + + + + + + SQL/JSON path expressions + (see ). + + + + + PostgreSQL-specific functions and operators for JSON + data types (see ). + + + + + +To learn more about the SQL/JSON standard, see +. For details on JSON types +supported in PostgreSQL, +see . + + + + SQL/JSON Path Expressions + + + SQL/JSON path expressions specify the items to be retrieved + from the JSON data, similar to XPath expressions used + for SQL access to XML. In PostgreSQL, + path expressions are implemented as the jsonpath + data type, described in . + + + JSON query functions and operators + pass the provided path expression to the path engine + for evaluation. If the expression matches the JSON data to be queried, + the corresponding SQL/JSON item is returned. + Path expressions are written in the SQL/JSON path language + and can also include arithmetic expressions and functions. + Query functions treat the provided expression as a + text string, so it must be enclosed in single quotes. + + + + A path expression consists of a sequence of elements allowed + by the jsonpath data type. + The path expression is evaluated from left to right, but + you can use parentheses to change the order of operations. + If the evaluation is successful, an SQL/JSON sequence is produced, + and the evaluation result is returned to the JSON query function + that completes the specified computation. + + + + To refer to the JSON data to be queried (the + context item), use the $ sign + in the path expression. 
It can be followed by one or more + accessor operators, + which go down the JSON structure level by level to retrieve the + content of context item. Each operator that follows deals with the + result of the previous evaluation step. + + + + For example, suppose you have some JSON data from a GPS tracker that you + would like to parse, such as: + +{ "track" : + { +"segments" : [ + { "location": [ 47.763, 13.4034 ], +"start time": "2018-10-14 10:05:14", +"HR": 73 + }, + { "location": [ 47.706, 13.2635 ], +"start time": "2018-10-14 10:39:21", +"HR": 130 + } ] + } +} + + + + + To retrieve the available track segments, you need to use the + .key accessor + operator for all the preceding JSON objects: + +'$.track.segments' + + + + + If the item to retrieve is an element of an array, you have + to unnest this array using the [*] operator. For example, + the following path will return location coordinates for all + the available track segments: + +'$.track.segment
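The accessor chain described above can be tried directly on a server with the jsonpath patch applied (PostgreSQL 12+). The sketch below is illustrative only — the table name `gps` is hypothetical, and `jsonb_path_query()` is the query function this path syntax feeds into:

```sql
-- Sketch: assumes PostgreSQL 12+ with jsonpath support; names are illustrative.
CREATE TABLE gps (data jsonb);
INSERT INTO gps VALUES
  ('{"track": {"segments": [
      {"location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14", "HR": 73},
      {"location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21", "HR": 130}
    ]}}');

-- The $ context item, .key accessors, and [*] array unnesting from the text:
SELECT jsonb_path_query(data, '$.track.segments[*].location') FROM gps;
SELECT jsonb_path_query(data, '$.track.segments[*].HR') FROM gps;
```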
Re: SQL/JSON: documentation
On 09/28/2018 08:29 PM, Peter Eisentraut wrote: On 28/06/2018 01:36, Nikita Glukhov wrote: Attached patch with draft of SQL/JSON documentation written by Liudmila Mantrova, Oleg Bartunov and me. Also it can be found in our sqljson repository on sqljson_doc branch: https://github.com/postgrespro/sqljson/tree/sqljson_doc We continue to work on it. Some structural comments: - I don't think this should be moved to a separate file. Yes, func.sgml is pretty big, but if we're going to split it up, we should do it in a systematic way, not just one section. - The refentries are not a bad idea, but again, if we just used them for this one section, the navigation will behave weirdly. So I'd do it without them, just using normal subsections. - Stick to one-space indentation in XML. Hi Peter, Thanks for your comments! I'm OK with keeping all reference information in func.sgml and will rework it as you suggest. While refentries are dear to my heart, let's use subsections for now for the sake of consistency. We'll continue working with Nikita and Oleg to improve the content before we resend an updated patch; I believe we might still need a separate source file if we end up having a separate chapter with usage examples and implementation details. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 09/22/2018 10:00 PM, Alexander Korotkov wrote: On Thu, Sep 20, 2018 at 3:52 PM Alexander Korotkov wrote: On Thu, Sep 20, 2018 at 6:09 AM amul sul wrote: Agreed, thanks for working on this. Pushed, thanks. Please, find attached patch improving documentation about letters/digits in to_date()/to_timestamp() template string. I think it needs review from native English speaker. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Hi Alexander, I'm not a native speaker, but let me try to help. A new doc version is attached. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 9a7f683..1532bcc 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -6286,13 +6286,46 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); exceeds the number of separators in the template. - If FX is specified, separator in template string - matches to exactly one character in input string. Notice we don't insist - input string character to be the same as template string separator. + If FX is specified, a separator in the template + string matches exactly one character in the input string. The + input string character does not need to be the same as the template + string separator. For example, to_timestamp('2000/JUN', 'FX MON') works, but to_timestamp('2000/JUN', 'FXMON') - returns an error because a space second template string space consumed - letter J from the input string. + returns an error because the second space in the template string + consumes letter J of the input string. + + + + + +Template strings of to_timestamp and +to_date functions can also contain arbitrary +letters/digits between patterns. Such letters/digits can match +any characters in the input string. For example, +to_timestamp('2000yJUN', 'xMON') works. 
+ + +Letters/digits consume an input string character only if the number +of extra characters at the beginning of the input string or between +the identified date/time values is less than or equal to the number +of the corresponding characters in the template string. This ensures +that the template string does not consume any characters of date/time +values when used without the FX option, even if +a letter/digit separator in the input string appears after a space. +For example, to_timestamp('2000y JUN', 'xMON') +works, but to_timestamp('2000 yJUN', 'xMON') +returns an error. + + +Note that if the template string contains an arbitrary letter, +the pattern that precedes this letter becomes greedy and tries +to match as many characters as possible. For example, +to_timestamp('2000906901', 'xMMxDD') +fails because the pattern matches +the whole input string instead of the first four characters. +Patterns separated by digits are non-greedy, so +to_timestamp('2000906901', '0MM0DD') works fine.
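The examples in the patch text above can be collected into one illustrative session. This is a sketch assuming a build with this patch applied; exact output timestamps depend on the server time zone, so only the works/fails outcomes stated in the patch are annotated:

```sql
-- Arbitrary letters/digits in the template consume input characters:
SELECT to_timestamp('2000yJUN', 'xMON');      -- works: 'x' skips 'y'
SELECT to_timestamp('2000y JUN', 'xMON');     -- works: extra chars fit the template
SELECT to_timestamp('2000 yJUN', 'xMON');     -- error per the rule in the patch

-- Greedy vs. non-greedy behavior around letters and digits:
SELECT to_timestamp('2000906901', 'xMMxDD');  -- fails: greedy YYYY-style match
SELECT to_timestamp('2000906901', '0MM0DD');  -- works: digit separators are non-greedy
```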
Re: [HACKERS] Bug in to_timestamp().
On 08/14/2018 06:38 PM, Alexander Korotkov wrote: On Thu, Aug 2, 2018 at 9:06 PM Alexander Korotkov wrote: On Thu, Aug 2, 2018 at 6:17 PM Alexander Korotkov wrote: After some experiments I found that when you mix spaces and separators between two fields, then Oracle takes into account only length of last group of spaces/separators. # SELECT to_timestamp('2018- -01 02', ' --- --MM-DD') FROM dual2018-01-01 00:00:00 -10:00 (length of last spaces/separators group is 2) # SELECT to_timestamp('2018- -01 02', ' --- --MM-DD') FROM dual 2018-01-01 00:00:00 -10:00 (length of last spaces/separators group is 3) # SELECT to_timestamp('2018- -01 02', ' -- ---MM-DD') FROM dual 02.01.2018 00:00:00 (length of last spaces/separators group is 2) Ooops... I'm sorry, but I've posted wrong results here. Correct version is here. # SELECT to_timestamp('2018- -01 02', ' --- --MM-DD') FROM dual ORA-01843: not a valid month (length of last spaces/separators group is 2) # SELECT to_timestamp('2018- -01 02', ' -- ---MM-DD') FROM dual 02.01.2018 00:00:00 (length of last spaces/separators group is 3) So length of last group of spaces/separators in the pattern should be greater or equal to length of spaces/separators in the input string. Other previous groups are ignored in Oracle. And that seems ridiculous for me. BTW, I've also revised documentation and regression tests. Patch is attached. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Hi, Please consider some further documentation improvements in the attached patch. 
-- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index edc9be9..a8bbafc 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -6262,11 +6262,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); to_timestamp and to_date - skip multiple blank spaces in the input string unless the - FX option is used. For example, - to_timestamp('2000JUN', ' MON') works, but + skip multiple blank spaces at the beginning of the input string and + around date and time values unless the FX option is used. For example, + to_timestamp('2000JUN', ' MON') and + to_timestamp('2000 - JUN', '-MON') work, but to_timestamp('2000JUN', 'FX MON') returns an error - because to_timestamp expects one space only. + because to_timestamp expects a single space only. FX must be specified as the first item in the template. @@ -6274,6 +6275,43 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); + A separator (a space or a non-letter/non-digit character) in the template string of + to_timestamp and to_date + matches any single separator in the input string or is skipped, + unless the FX option is used. + For example, to_timestamp('2000JUN', '///MON') and + to_timestamp('2000/JUN', ' MON') work, but + to_timestamp('2000//JUN', '/MON') + returns an error because the number of separators in the input string + exceeds the number of separators in the template. + + + If FX is specified, separators in the + input and template strings must match exactly. For example, + to_timestamp('2000/JUN', 'FX MON') + returns an error because a space is expected in the input string. + + + + + + TZH template pattern can match a signed number. + Without the FX option, it may lead to ambiguity in + interpretation of the minus sign, which can also be interpreted as a separator. + This ambiguity is resolved as follows. 
If the number of separators before + TZH in the template string is less than the number of + separators before the minus sign in the input string, the minus sign + is interpreted as part of TZH. + Otherwise, the minus sign is considered to be a separator between values. + For example, to_timestamp('2000 -10', ' TZH') matches + -10 to TZH, but + to_timestamp('2000 -10', 'TZH') + matches 10 to TZH. + + + + + Ordinary text is allowed in to_char templates and will be output literally. You can put a substring in double quotes to force it to be interpreted as literal text @@ -6287,6 +6325,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); string; for example "XX" skips two input characters (whether or not they are XX). + + + Prior to PostgreSQL 12, it was possible to + skip arbitrary text in the input string using non-letter or non-digit + characters. For example, + to_timestamp('2000y6m
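The TZH minus-sign rule is easier to see in a runnable form. Note that the archived patch text appears to have lost the year field pattern from the templates; the sketch below assumes a `YYYY` pattern preceded `TZH`, matching the shape of the committed documentation, so treat the templates as a reconstruction:

```sql
-- One separator before TZH in the template, one before '-' in the input:
-- the minus sign is interpreted as part of TZH.
SELECT to_timestamp('2000 -10', 'YYYY TZH');   -- TZH matches -10

-- An extra separator in the template consumes the minus sign instead:
SELECT to_timestamp('2000 -10', 'YYYY  TZH');  -- TZH matches 10
```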
Re: Fix for documentation of Covering Indexes
On 04/18/2018 12:52 PM, Heikki Linnakangas wrote: On 11/04/18 04:20, Michael Paquier wrote: Hi all, The documentation of covering indexes is incorrect for CREATE and ALTER TABLE: - ALTER TABLE's page is missing the call. - Exclusion constraints can use INCLUDE clauses. In order to simplify the documentation, please let me suggest the attached which moves the definition of the INCLUDE clause into the section index_parameters, which is compatible with what I read from the parser. Committed, thanks! - Heikki Following this change, I believe we need to modify UNIQUE and PRIMARY KEY descriptions in CREATE TABLE as they still mention INCLUDE but not the other index_parameters. The attached patch fixes this inconsistency, as well as adds a separate paragraph for INCLUDE in CREATE TABLE to clarify its purpose and avoid repetition in constraint descriptions. It also reorders the paragraphs in CREATE INDEX to follow the syntax. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 3c1223b..c67f187 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -145,52 +145,6 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] - INCLUDE - - -The optional INCLUDE clause specifies a -list of columns which will be included in the index -as non-key columns. A non-key column cannot -be used in an index scan search qualification, and it is disregarded -for purposes of any uniqueness or exclusion constraint enforced by -the index. However, an index-only scan can return the contents of -non-key columns without having to visit the index's table, since -they are available directly from the index entry. Thus, addition of -non-key columns allows index-only scans to be used for queries that -otherwise could not use them. 
- - - -It's wise to be conservative about adding non-key columns to an -index, especially wide columns. If an index tuple exceeds the -maximum size allowed for the index type, data insertion will fail. -In any case, non-key columns duplicate data from the index's table -and bloat the size of the index, thus potentially slowing searches. - - - -Columns listed in the INCLUDE clause don't need -appropriate operator classes; the clause can include -columns whose data types don't have operator classes defined for -a given access method. - - - -Expressions are not supported as included columns since they cannot be -used in index-only scans. - - - -Currently, only the B-tree index access method supports this feature. -In B-tree indexes, the values of columns listed in the -INCLUDE clause are included in leaf tuples which -correspond to heap tuples, but are not included in upper-level -index entries used for tree navigation. - - - - - name @@ -317,6 +271,52 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] + INCLUDE + + +The optional INCLUDE clause specifies a +list of columns which will be included in the index +as non-key columns. A non-key column cannot +be used in an index scan search qualification, and it is disregarded +for purposes of any uniqueness or exclusion constraint enforced by +the index. However, an index-only scan can return the contents of +non-key columns without having to visit the index's table, since +they are available directly from the index entry. Thus, addition of +non-key columns allows index-only scans to be used for queries that +otherwise could not use them. + + + +It's wise to be conservative about adding non-key columns to an +index, especially wide columns. If an index tuple exceeds the +maximum size allowed for the index type, data insertion will fail. +In any case, non-key columns duplicate data from the index's table +and bloat the size of the index, thus potentially slowing searches. 
+ + + +Columns listed in the INCLUDE clause don't need +appropriate operator classes; the clause can include +columns whose data types don't have operator classes defined for +a given access method. + + + +Expressions are not supported as included columns since they cannot be +used in index-only scans. + + + +Currently, only the B-tree index access method supports this feature. +In B-tree indexes, the values of columns listed in the +INCLUDE clause are included in leaf
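For readers following the relocated paragraphs, a minimal sketch of the feature being documented (table and index names are illustrative, not from the patch):

```sql
-- 'total' rides along in the index as a non-key column: it is excluded
-- from the uniqueness check but available to index-only scans.
CREATE TABLE orders (id int, customer_id int, total numeric);
CREATE UNIQUE INDEX orders_id_total_idx ON orders (id) INCLUDE (total);

-- Can be answered from the index alone once the visibility map permits:
SELECT id, total FROM orders WHERE id = 42;
```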
Re: documentation is now XML
On 04/27/2018 06:36 PM, Tom Lane wrote: Bruce Momjian writes: On Fri, Apr 27, 2018 at 11:00:36AM -0400, Peter Eisentraut wrote: That proposal seemed to indicate not only converting the source code to XML but also the build system to XSL. The latter is out of the question, I think. Why is that? I thought people building current Postgres would already have the XML toolchain built to build older release documentation. Yeah, in principle, anyone who's able to build the v10 branch's docs should be ready for such a change. We had discussed polling the pgsql-packagers list to see whether anyone would have a problem with changing the docs build toolset for the pre-v10 branches, but I don't think we actually asked them. regards, tom lane To complete the picture of possible issues with older branches in XML, we posted a question in the packager lists some time ago and didn't receive any objections. Just to keep a record of all related questions in one place, here's the link: https://www.postgresql.org/message-id/flat/06efc906-16f8-0cde-5bee-e3d5abfc00ba%40postgrespro.ru#06efc906-16f8-0cde-5bee-e3d5abfc0...@postgrespro.ru We totally understand the reluctance to volunteer personal time for manual testing of legacy branches - sad as it is that we may miss some benefits of the previous efforts. E.g., all styles and transforms have been stabilized for version 11, so they could be reused for older branches. As for testing the content conversion, our scripts can handle it in a fully automated way by comparing the original and the final outputs via the .txt format, so all possible differences will be caught. It's unfortunate that we'll have to deal with different formats in the supported branches for several more years, but we at Postgres Professional are ready to accept whatever decision you make on this matter for now. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: doc fixes: vacuum_cleanup_index_scale_factor
On 05/08/2018 02:05 PM, Justin Pryzby wrote: 3rd iteration ; thanks for bearing with me. On Tue, May 08, 2018 at 12:35:00PM +0300, Alexander Korotkov wrote: Hi, Justin! Thank you for revising documentation patch. On Mon, May 7, 2018 at 7:55 PM, Justin Pryzby <pry...@telsasoft.com> wrote: +In order to detect stale index statistics, the number of total heap +tuples during previous statistics collection is stored in the index +meta-page. Consider removing: "during previous statistics collection" Or: "during THE previous statistics collection" + Once the number of inserted tuples since previous since THE previous +statistics collection is more than +vacuum_cleanup_index_scale_factor fraction of Since the multiplier can be greater than 1, should we say "multiple" instead of fraction? +during VACUUM cycle. Thus, skipping of the B-tree +index scan during cleanup stage is only possible when second and +subsequent VACUUM cycles detecting no dead tuples. Change "detecting" to "detect". Or maybe just "find" Justin Hi Justin, Thank you for helping with the docs. Attached is another doc patch that should address most of the issues you've brought up. I've also reshuffled the text a bit to make it more like an option description. Hope you'll find it useful. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index ffea744..c4afd14 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1893,15 +1893,34 @@ include_dir 'conf.d' -When no tuples were deleted from the heap, B-tree indexes might still -be scanned during VACUUM cleanup stage by two -reasons. The first reason is that B-tree index contains deleted pages -which can be recycled during cleanup. The second reason is that B-tree -index statistics is stalled. 
The criterion of stalled index statistics -is number of inserted tuples since previous statistics collection -is greater than vacuum_cleanup_index_scale_factor -fraction of total number of heap tuples. +Specifies the fraction of the total number of heap tuples counted in +the previous statistics collection that can be inserted without +incurring an index scan at the VACUUM cleanup stage. +This setting currently applies to B-tree indexes only. + + +If no tuples were deleted from the heap, B-tree indexes are still +scanned at the VACUUM cleanup stage when at least one +of the following conditions is met: the index statistics are stale, or +the index contains deleted pages that can be recycled during cleanup. +Index statistics are considered to be stale if the number of newly +inserted tuples exceeds the vacuum_cleanup_index_scale_factor +fraction of the total number of heap tuples detected by the previous +statistics collection. The total number of heap tuples is stored in +the index meta-page. Note that the meta-page does not include this data +until VACUUM finds no dead tuples, so B-tree index +scan at the cleanup stage can only be skipped if the second and +subsequent VACUUM cycles detect no dead tuples. + + + +The value can range from 0 to 100. +When vacuum_cleanup_index_scale_factor is set to +0, index scans are never skipped during +VACUUM cleanup. The default value is 0.1. 
+ + diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c index 3bcc56e..22b4a75 100644 --- a/src/backend/access/nbtree/nbtpage.c +++ b/src/backend/access/nbtree/nbtpage.c @@ -189,7 +189,7 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact, if (metad->btm_version < BTREE_VERSION) _bt_upgrademetapage(metapg); - /* update cleanup-related infromation */ + /* update cleanup-related information */ metad->btm_oldest_btpo_xact = oldestBtpoXact; metad->btm_last_cleanup_num_heap_tuples = numHeapTuples; MarkBufferDirty(metabuf); diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index d894ba0..27a3032 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -818,10 +818,11 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info) float8 cleanup_scale_factor; /* - * If table receives large enough amount of insertions and no cleanup - * was performed, then index might appear to have stalled statistics. - * In order to evade that, we perform cleanup when table receives - * vacuum_cleanup_index_scale_factor fractions of insertions. + * If table receives enough insertions and no cleanup
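For context, the setting documented above can be applied globally or, assuming the v11 per-index reloption form, on an individual B-tree index (names below are illustrative):

```sql
-- As a configuration parameter:
SET vacuum_cleanup_index_scale_factor = 0.5;

-- As a B-tree storage parameter (sketch; assumes the v11 reloption):
CREATE TABLE t (x int);
CREATE INDEX t_x_idx ON t (x) WITH (vacuum_cleanup_index_scale_factor = 0.5);
```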
Re: documentation is now XML
Hi Peter, Thank you for taking the time to comment on this long-discussed topic. I was not around when all the great work was done to implement XSLT transforms in branch 10 and convert branch 11 to XML, but judging by this thread (https://www.postgresql.org/message-id/flat/4ffd72d6-8ab6-37c6-d7d5-dfed8967c0fc%40gmail.com), much of the testing was automated and could be re-applied to older branches. Taking into account that both XSLT transforms and testing scripts can be reused for the most part, do you think that the benefits of having consistent sources (easy back-porting of doc patches and one-time translation to all languages) in all the supported branches could potentially outweigh the inevitable conversion overhead? Are there any specific features that make branch 10 considerably different from its predecessors that we are missing? On 04/30/2018 10:20 PM, Peter Eisentraut wrote: On 4/27/18 11:03, Bruce Momjian wrote: On Fri, Apr 27, 2018 at 11:00:36AM -0400, Peter Eisentraut wrote: On 4/23/18 05:54, Liudmila Mantrova wrote: Reading this thread, I got an impression that everyone would benefit from converting back branches to XML, but the main concern is lack of resources to complete this task. Are there any other issues that affect this decision? Looks like Aleksander Lakhin's offer to prepare patches was missed somehow as the discussion sidetracked to other issues That proposal seemed to indicate not only converting the source code to XML but also the build system to XSL. The latter is out of the question, I think. Why is that? Because there would be a thousand lines of tooling changes to be backported and thousands of pages of documentation to be checked manually that it doesn't create a mess (times branches times platforms). -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: documentation is now XML
Hi everyone, Reading this thread, I got the impression that everyone would benefit from converting back branches to XML, but the main concern is the lack of resources to complete this task. Are there any other issues that affect this decision? Looks like Aleksander Lakhin's offer to prepare patches was missed somehow as the discussion sidetracked to other issues: I can prepare such patches (scripts to generate them). In fact we (Postgres Pro) perform such conversion (SGML->XML) on the fly when building docs starting from 9.6. So it's not a problem to convert *.sgml and replace Makefile and *.xsl. But I would prefer to perform the conversion when we finish the move on 11devel (renaming sgml to xml, maybe optimizing xsl's...). Do you think it's worth considering after May releases? I could also help with manual testing of the prepared patches if required. Supporting documentation and translation in several branches will be much easier if the sources are consistent. -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
Hi everyone, When translating doc updates, Alexander Lakhin noticed that trigram examples were not quite accurate. A small patch fixing this issue is attached. On 03/21/2018 03:35 PM, Teodor Sigaev wrote: Thank you, pushed David Steele wrote: On 3/6/18 7:04 AM, Teodor Sigaev wrote: I agree with Teodor (upthread, not quoted here) that the documentation could use some editing. I started to do it myself, but quickly realized I have no knowledge of the content. I'm afraid I would destroy the meaning while updating the grammar. Anyone understand the subject matter well enough to review the documentation? Liudmila tried to improve docs in Alexander's patchset. https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752c...@postgrespro.ru This looks good to me with a few minor exceptions: + word_similarity(text, text) requires further + explanation. Consider the following example: Maybe too verbose? I think "word_similarity(text, text) requires further explanation." can be removed entirely. + string. However, this function does not add paddings to the "add padding" BTW, adding Liudmila's message to commitfest task (https://commitfest.postgresql.org/17/1403/) doesn't work Doesn't work for me either. Alexander, can you post the final patches to the thread so they show up in the CF app? Thanks, -- Liudmila Mantrova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml index 8f39529..be43cdf 100644 --- a/doc/src/sgml/pgtrgm.sgml +++ b/doc/src/sgml/pgtrgm.sgml @@ -152,9 +152,9 @@ In the first string, the set of trigrams is - {" w"," wo","ord","wor","rd "}. + {" w"," wo","wor","ord","rd "}. In the second string, the ordered set of trigrams is - {" t"," tw",two,"wo "," w"," wo","wor","ord","rds", ds "}. + {" t"," tw","two","wo "," w"," wo","wor","ord","rds","ds "}. 
The most similar extent of an ordered set of trigrams in the second string is {" w"," wo","wor","ord"}, and the similarity is 0.8. @@ -172,7 +172,7 @@ At the same time, strict_word_similarity(text, text) has to select an extent that matches word boundaries. In the example above, strict_word_similarity(text, text) would select the - extent {" w"," wo","wor","ord","rds", ds "}, which + extent {" w"," wo","wor","ord","rds","ds "}, which corresponds to the whole word 'words'.
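The corrected trigram sets can be checked directly with pg_trgm's own functions; this sketch assumes the module is installed, and only the 0.8 result stated in the thread is annotated:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- show_trgm() prints the extracted trigram set for a string:
SELECT show_trgm('word');
SELECT show_trgm('two words');

-- The example from the patch: the most similar extent yields 0.8.
SELECT word_similarity('word', 'two words');         -- 0.8

-- strict_word_similarity() must respect word boundaries, so the extent
-- covers the whole word 'words' and the result is lower.
SELECT strict_word_similarity('word', 'two words');
```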
Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
Hello everyone, I would like to contribute to documentation review of the patches discussed in thread https://www.postgresql.org/message-id/flat/cy4pr17mb13207ed8310f847cf117eed0d8...@cy4pr17mb1320.namprd17.prod.outlook.com (https://commitfest.postgresql.org/16/1403/). Unfortunately, I was not subscribed to pgsql-hackers before, so I cannot respond to this thread directly. Please find attached new revisions of the original patches. I hope you'll find the changes useful! -- Liudmila Mantrova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c index f7e96ac..306d60b 100644 --- a/contrib/pg_trgm/trgm_op.c +++ b/contrib/pg_trgm/trgm_op.c @@ -456,7 +456,7 @@ iterate_word_similarity(int *trg2indexes, lastpos[trgindex] = i; } - /* Adjust lower bound if this trigram is present in required substring */ + /* Adjust upper bound if this trigram is present in required substring */ if (found[trgindex]) { int prev_lower, @@ -473,7 +473,7 @@ iterate_word_similarity(int *trg2indexes, smlr_cur = CALCSML(count, ulen1, ulen2); - /* Also try to adjust upper bound for greater similarity */ + /* Also try to adjust lower bound for greater similarity */ tmp_count = count; tmp_ulen2 = ulen2; prev_lower = lower; diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml index 338ef30..005961c 100644 --- a/doc/src/sgml/pgtrgm.sgml +++ b/doc/src/sgml/pgtrgm.sgml @@ -99,12 +99,10 @@ real - Returns a number that indicates how similar the first string - to the most similar word of the second string. The function searches in - the second string a most similar word not a most similar substring. The - range of the result is zero (indicating that the two strings are - completely dissimilar) to one (indicating that the first string is - identical to one of the words of the second string). 
+ Returns a number that indicates the greatest similarity between + the set of trigrams in the first string and any continuous extent + of an ordered set of trigrams in the second string. For details, see + the explanation below. @@ -131,6 +129,35 @@ + + word_similarity(text, text) requires further + explanation. Consider the following example: + + +# SELECT word_similarity('word', 'two words'); + word_similarity +- + 0.8 +(1 row) + + + In the first string, the set of trigrams is + {" w"," wo","ord","wor","rd "}. + In the second string, the ordered set of trigrams is + {" t"," tw",two,"wo "," w"," wo","wor","ord","rds", ds "}. + The most similar extent of an ordered set of trigrams in the second string + is {" w"," wo","wor","ord"}, and the similarity is + 0.8. + + + + This function returns a value that can be approximately understood as the + greatest similarity between the first string and any substring of the second + string. However, this function does not add paddings to the boundaries of + the extent. Thus, a whole word match gets a higher score than a match with + a part of the word. + + pg_trgm Operators @@ -156,10 +183,11 @@ text % text boolean - Returns true if its first argument has the similar word in - the second argument and they have a similarity that is greater than the - current word similarity threshold set by - pg_trgm.word_similarity_threshold parameter. + Returns true if the similarity between the trigram + set in the first argument and a continuous extent of an ordered trigram + set in the second argument is greater than the current word similarity + threshold set by pg_trgm.word_similarity_threshold + parameter. @@ -302,10 +330,11 @@ SELECT t, word_similarity('word', t) AS sml WHERE 'word' % t ORDER BY sml DESC, t; - This will return all values in the text column that have a word - which sufficiently similar to word, sorted from best - match to worst. The index will be used to make this a fast operation - even over very large data sets. 
+ This will return all values in the text column for which there is a + continuous extent in the corresponding ordered trigram set that is + sufficiently similar to the trigram set of word, + sorted from best match to worst. The index will be used to make this + a fast operation even over very large data sets. diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile index 212a890..dfecc2a 100644 --- a/contrib/pg_trgm/Makefile +++ b/contrib/pg_trgm/Makefile @@ -4,11 +4,12 @@ MODULE_big = pg_trgm OBJS = trgm_op.o trgm_gist.o trgm_g