Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-20 Thread Thom Brown
On Mon, 20 May 2024 at 00:24, David Rowley  wrote:

> On Mon, 20 May 2024 at 09:35, Jonathan S. Katz 
> wrote:
> > Thanks for all the feedback to date. Please see the next revision.
> > Again, please provide feedback no later than 2024-05-22 18:00 UTC.
>
> Thanks for the updates.
>
> > [`COPY`](https://www.postgresql.org/docs/17/sql-copy.html) is used to
> efficiently bulk load data into PostgreSQL, and with PostgreSQL 17 shows a
> 2x performance improvement when loading large rows.
>
> The 2x thing mentioned by Jelte is for COPY TO rather than COPY FROM.
> So I think "exporting" or "sending large rows to the client" rather
> than "loading".
>
> There's also a stray "with" in that sentence.
>

Are you referring to the "with" in "and with PostgreSQL 17"? If so, it
looks valid to me.
-- 
Thom


Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-15 Thread Thom Brown
On Thu, May 16, 2024, 06:37 zaidagilist  wrote:

> Hello,
>
> I am trying to open the 17 docs but it looks removed. Getting
> following message "Page not found"
>
> https://www.postgresql.org/docs/17/
>
>
> Regards,
> Zaid Shabbir
>

That link isn't set up yet, but will be (or should be) when the
announcement goes out.

Regards

Thom



Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-15 Thread Thom Brown
On Thu, May 16, 2024, 02:45 Jonathan S. Katz  wrote:

> Hi,
>
> Attached is a copy of the PostgreSQL 17 Beta 1 release announcement
> draft. This contains a user-facing summary of some of the features that
> will be available in the Beta, as well as a call to test. I've made an
> effort to group them logically around the different workflows they affect.
>
> A few notes:
>
> * The section with the features is not 80-char delimited. I will do that
> before the final copy
>
> * There is an explicit callout that we've added in the SQL/JSON features
> that were previously reverted in PG15. I want to ensure we're
> transparent about that, but also use it as a hook to get people testing.
>
> When reviewing:
>
> * Please check for correctness of feature descriptions, keeping in mind
> this is targeted for a general audience
>
> * Please indicate if you believe there's a notable omission, or if we
> should omit any of these callouts
>
> * Please indicate if a description is confusing - I'm happy to rewrite
> to ensure it's clearer.
>
> Please provide feedback no later than Wed 2024-05-22 18:00 UTC. As the
> beta release takes some extra effort, I want to ensure all changes are
> in with time to spare before release day.
>

"Now as of PostgreSQL 17, you can now use parallel index builds for [BRIN](
https://www.postgresql.org/docs/17/brin.html) indexes."

The 2nd "now" is redundant.


"Finally, PostgreSQL 17 adds more explicitly SIMD instructions, including
AVX-512 support for the [`bit_count](
https://www.postgresql.org/docs/17/functions-bitstring.html) function."

Would "SIMD-explicit instructions" be better? Also, I know you may not be
using markdown for the final version, but the bit_count backtick isn't
matched by a closing backtick.


"[`COPY`](https://www.postgresql.org/docs/17/sql-copy.html), used to
efficiently bulk load data into PostgreSQL"

The "used to" makes me stumble into reading it as meaning "it previously
could efficiently bulk load data".

Perhaps just add a "which is" before "used"?


"PostgreSQL 17 includes a built-in collation provider that provides similar
semantics to the `C` collation provided by libc."

"provider", "provides", and "provided" feels too repetitive.

How about, "PostgreSQL 17 includes a built-in collation provider with
semantics similar to the `C` collation offered by libc."?


Regards

Thom


Re: remaining sql/json patches

2024-05-15 Thread Thom Brown
On Mon, 8 Apr 2024 at 10:09, Amit Langote  wrote:

> On Mon, Apr 8, 2024 at 2:02 PM jian he 
> wrote:
> > On Mon, Apr 8, 2024 at 11:21 AM jian he 
> wrote:
> > >
> > > On Mon, Apr 8, 2024 at 12:34 AM jian he 
> wrote:
> > > >
> > > > On Sun, Apr 7, 2024 at 9:36 PM Amit Langote 
> wrote:
> > > > > 0002 needs an expanded commit message but I've run out of energy
> today.
> > > > >
> > >
> > > other than that, it looks good to me.
> >
> > one more tiny issue.
> > +EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1;
> > +ERROR:  relation "jsonb_table_view1" does not exist
> > +LINE 1: EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1...
> > +   ^
> > maybe you want
> > EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view7;
> > at the end of the sqljson_jsontable.sql.
> > I guess it will be fine, but the format json explain output is quite big.
> >
> > you also need to `drop table s cascade;` at the end of the test?
>
> Pushed after fixing this and other issues.  Thanks a lot for your
> careful reviews.
>
> I've marked the CF entry for this as committed now:
> https://commitfest.postgresql.org/47/4377/
>
> Let's work on the remaining PLAN clause with a new entry in the next
> CF, possibly in a new email thread.
>

I've just taken a look at the doc changes, and I think we need to either
remove the leading "select" keyword, or uppercase it in the examples.

For example (on
https://www.postgresql.org/docs/devel/functions-json.html#SQLJSON-QUERY-FUNCTIONS
):

json_exists ( context_item, path_expression [ PASSING { value AS varname }
[, ...]] [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ])

Returns true if the SQL/JSON path_expression applied to the context_item
using the PASSING values yields any items.
The ON ERROR clause specifies the behavior if an error occurs; the default
is to return the boolean FALSE value. Note that if the path_expression is
strict and ON ERROR behavior is ERROR, an error is generated if it yields
no items.
Examples:
select json_exists(jsonb '{"key1": [1,2,3]}', 'strict $.key1[*] ? (@ > 2)')
→ t
select json_exists(jsonb '{"a": [1,2,3]}', 'lax $.a[5]' ERROR ON ERROR) → f
select json_exists(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' ERROR ON ERROR) →
ERROR:  jsonpath array subscript is out of bounds

Examples are more difficult to read when keywords appear to be at the same
level as predicates.  Plus other examples within tables on the same page
don't start with "select", and further down, block examples uppercase
keywords.  Either way, I don't like it as it is.
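
For instance, with the keywords uppercased, the first example would
read:

SELECT json_exists(jsonb '{"key1": [1,2,3]}', 'strict $.key1[*] ? (@ > 2)')
→ t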

Separate from this, I think these tables don't scan well (see json_query
for an example of what I'm referring to).  There is no clear separation of
the syntax definition, the description, and the example. This is more a
matter for the website mailing list, but I'm expressing it here to check
whether others agree.

Thom


Re: Document NULL

2024-05-11 Thread Thom Brown
On Sat, May 11, 2024, 16:34 David G. Johnston 
wrote:

> On Fri, May 3, 2024 at 9:00 AM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Fri, May 3, 2024 at 8:44 AM Tom Lane  wrote:
>>
>>> Having said that, I reiterate my proposal that we make it a new
>>>
>>  under DDL, before 5.2 Default Values which is the first
>>> place in ddl.sgml that assumes you have heard of nulls.
>>
>>
>> I will go with this and remove the "Data Basics" section I wrote, leaving
>> it to be just a discussion about null values.  The tutorial is the only
>> section that really needs unique wording to fit in.  No matter where we
>> decide to place it otherwise the core content will be the same, with maybe
>> a different section preface to tie it in.
>>
>>
> v3 Attached.
>
> Probably at the 90% complete mark.  Minimal index entries, not as thorough
> a look-about of the existing documentation as I'd like.  Probably some
> wording and style choices to tweak.  Figured better to get feedback now
> before I go into polish mode.  In particular, tweaking and re-running the
> examples.
>
> Yes, I am aware of my improper indentation for programlisting and screen.
> I wanted to be able to use the code folding features of my editor.  Those
> can be readily un-indented in the final version.
>
> The changes to func.sgml is basically one change repeated something like
> 20 times with tweaks for true/false.  Plus moving the discussion regarding
> the SQL specification into the new null handling section.
>
> It took me doing this to really understand the difference between row
> constructors and composite typed values, especially since array
> constructors produce array typed values and the constructor is just an
> unimportant implementation option while row constructors introduce
> meaningfully different behaviors when used.
>
> My plan is to have a v4 out next week, without or without a review of this
> draft, but then the subsequent few weeks will probably be a bit quiet.
>

+   The cardinal rule, a given null value is never
+   equal or unequal
+   to any other non-null.

Again, doesn't this imply it tends to be equal to another null by its
omission?
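
For example, there are indeed contexts where two nulls are treated as
the same:

SELECT NULL IS NOT DISTINCT FROM NULL;  -- yields true
SELECT count(*) FROM (VALUES (NULL), (NULL)) AS v(x) GROUP BY x;  -- 2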

Thom



Re: Document NULL

2024-05-01 Thread Thom Brown
On Wed, May 1, 2024, 16:13 David G. Johnston 
wrote:

> Hi,
>
> Over in [1] it was rediscovered that our documentation assumes the reader
> is familiar with NULL.  It seems worthwhile to provide both an introduction
> to the topic and an overview of how this special value gets handled
> throughout the system.
>
> Attached is a very rough draft attempting this, based on my own thoughts
> and those expressed by Tom in [1], which largely align with mine.
>
> I'll flesh this out some more once I get support for the goal, content,
> and placement.  On that point, NULL is a fundamental part of the SQL
> language and so having it be a section in a Chapter titled "SQL Language"
> seems to fit well, even if that falls into our tutorial.  Framing this up
> as tutorial content won't be that hard, though I've skipped on examples and
> such pending feedback.  It really doesn't fit as a top-level chapter under
> part II nor really under any of the other chapters there.  The main issue
> with the tutorial is the forward references to concepts not yet discussed
> but problem points there can be addressed.
>
> I do plan to remove the entity reference and place the content into
> query.sgml directly in the final version.  It is just much easier to write
> an entire new section in its own file.
>
> David J.
>
> [1]
> https://www.postgresql.org/message-id/1859814.1714532025%40sss.pgh.pa.us
>

"The cardinal rule, NULL is never equal or unequal to any non-null value."

This implies that a NULL is generally equal or unequal to another NULL.
While this can be true (e.g. in aggregates), in general it is not. Perhaps
immediately follow it with something along the lines of "In most cases NULL
is also not considered equal or unequal to any other NULL (i.e. NULL = NULL
will return NULL), but there are occasional exceptions, which will be
explained further on."
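
For example:

SELECT NULL = NULL;  -- yields NULL, not true
SELECT DISTINCT x FROM (VALUES (NULL), (NULL)) AS v(x);  -- one row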

Regards

Thom


Automatic tablespace management in pg_basebackup

2024-04-26 Thread Thom Brown
Hi,

Manually specifying tablespace mappings in pg_basebackup, especially in
environments where tablespaces can come and go, or with incremental
backups, can be tedious and error-prone. I propose a solution using
pattern-based mapping to automate this process.

So rather than having to specify:

-T /path/to/original/tablespace/a=/path/to/backup/tablespace/a -T
/path/to/original/tablespace/b=/path/to/backup/tablespace/b

And then coming up with a new location to map to for the subsequent
incremental backups, perhaps we could have a parameter (I’m just going to
choose M for “mapping”), like so:

-M %p/%d_backup_1.1

Where it can interpolate the following values:
%p = path
%d = directory
%l = label (not sure about this one)


Using the -M example above, when pg_basebackup finds:

/path/to/original/tablespace/a
/path/to/original/tablespace/b

It creates:

/path/to/original/tablespace/a_backup_1.1
/path/to/original/tablespace/b_backup_1.1


Or:

-M /path/to/backup/tablespaces/1.1/%d

Creates:

/path/to/backup/tablespaces/1.1/a
/path/to/backup/tablespaces/1.1/b


Or possibly allowing something like %l to insert the backup label.

For example:

-M /path/to/backup/tablespaces/%d_%l -l 1.1

Creates:

/path/to/backup/tablespaces/a_1.1
/path/to/backup/tablespaces/b_1.1


This of course would not work if there were tablespaces as follows:

/path/to/first/tablespace/a
/path/to/second/tablespace/a

Where %d would yield the same result for both tablespaces.  However, this
seems like an unlikely scenario: the tablespace name within the database
must be unique, so the user would have to go out of their way to choose
directory names that aren't.  This could just be a scenario that isn't
supported.

Perhaps even allow it to auto-increment a version number it defines
itself.  Maybe %v implies “make up a version number here, and if one
existed in the manifest previously, increment it”.
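
To make the intended substitution concrete, here's a rough SQL sketch
of the expansion, assuming %p means the tablespace's parent path and %d
its directory name:

SELECT replace(replace('%p/%d_backup_1.1',
                       '%p', regexp_replace(ts_path, '/[^/]*$', '')),
               '%d', regexp_replace(ts_path, '^.*/', '')) AS mapped
FROM (VALUES ('/path/to/original/tablespace/a'),
             ('/path/to/original/tablespace/b')) AS t(ts_path);

→ /path/to/original/tablespace/a_backup_1.1
  /path/to/original/tablespace/b_backup_1.1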


Ultimately, it would turn this:

pg_basebackup
  -D /Users/thombrown/Development/backups/data1.5
  -h /tmp
  -p 5999
  -c fast
  -U thombrown
  -l 1.5
  -T
/Users/thombrown/Development/tablespaces/ts_a=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_a
  -T
/Users/thombrown/Development/tablespaces/ts_b=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_b
  -T
/Users/thombrown/Development/tablespaces/ts_c=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_c
  -T
/Users/thombrown/Development/tablespaces/ts_d=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_d
  -i /Users/thombrown/Development/backups/data1.4/backup_manifest

Into this:

pg_basebackup
  -D /Users/thombrown/Development/backups/1.5/data
  -h /tmp
  -p 5999
  -c fast
  -U thombrown
  -l 1.5
  -M /Users/thombrown/Development/backups/tablespaces/%v/%d
  -i /Users/thombrown/Development/backups/data1.4/backup_manifest

In fact, if I were permitted to get carried away:

-D /Users/thombrown/Development/backups/%v/%d

Then, the only thing that needs changing for each incremental backup is the
manifest location (and optionally the label).


Given that pg_combinebackup has the same option, I imagine something
similar would need to be added there too.  We should already know where the
tablespaces reside, as they are in the final backup specified in the list
of backups, so it seems to just be a matter of specifying how the
tablespaces should be named in the reconstructed backup.

For example:

pg_combinebackup
  -T
/Users/thombrown/Development/backups/tablespaces/1.4/ts_a=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_a
  -T
/Users/thombrown/Development/backups/tablespaces/1.4/ts_b=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_b
  -T
/Users/thombrown/Development/backups/tablespaces/1.4/ts_c=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_c
  -T
/Users/thombrown/Development/backups/tablespaces/1.4/ts_d=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_d
  -o /Users/thombrown/Development/backups/combined
  /Users/thombrown/Development/backups/data{1.0_full,1.1,1.2,1.3,1.4}

Becomes:
pg_combinebackup
  -M /Users/thombrown/Development/backups/tablespaces/%v_combined/%d
  -o /Users/thombrown/Development/backups/%v_combined/%d
  /Users/thombrown/Development/backups/{1.0_full,1.1,1.2,1.3,1.4}/data

You may have inferred that I decided pg_combinebackup increments the
version to the next major version, whereas pg_basebackup in incremental
mode increments the minor version number.

This, of course, becomes messy if the user decides to include the version
number in the backup tablespace directory name, but these sorts of things
need to be figured out before going into production anyway.

I also get the feeling that accepting an unquoted % as a parameter on the
command line could be problematic, such as it having a special meaning I
haven't accounted for here.  In which case, it may require quoting.

Thoughts?

Regards

Thom


Re: trying again to get incremental backup

2024-04-25 Thread Thom Brown
On Wed, 3 Jan 2024 at 15:10, Robert Haas  wrote:

> On Fri, Dec 22, 2023 at 12:00 AM Alexander Lakhin 
> wrote:
> > My quick experiment shows that that TimestampDifferenceMilliseconds call
> > always returns zero, due to its arguments being swapped.
>
> Thanks. Tom already changed the unsigned -> int stuff in a separate
> commit, so I just pushed the fixes to PrepareForIncrementalBackup,
> both the one I had before, and swapping the arguments to
> TimestampDifferenceMilliseconds
>

I would like to query the following:

--tablespace-mapping=olddir=newdir

Relocates the tablespace in directory olddir to newdir during the
backup. olddir is the absolute path of the tablespace as it exists in the
first backup specified on the command line, and newdir is the absolute path
to use for the tablespace in the reconstructed backup.

The first backup specified on the command line will be the regular, full,
non-incremental backup.  But if a tablespace was introduced subsequently,
it would only appear in an incremental backup.  Wouldn't this then mean
that a mapping would need to be provided based on the path to the
tablespace of that incremental backup's copy?

Regards

Thom


Re: Disabling Heap-Only Tuples

2023-07-19 Thread Thom Brown
On Wed, 19 Jul 2023, 13:58 Laurenz Albe,  wrote:

> On Thu, 2023-07-06 at 22:18 +0200, Matthias van de Meent wrote:
> > On Wed, 5 Jul 2023 at 19:55, Thom Brown  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
> > >  wrote:
> > > > So what were you thinking of? A session GUC? A table option?
> > >
> > > Both.
> >
> > Here's a small patch implementing a new table option max_local_update
> > (name very much bikesheddable). Value is -1 (default, disabled) or the
> > size of the table in MiB that you still want to allow to update on the
> > same page. I didn't yet go for a GUC as I think that has too little
> > control on the impact on the system.
> >
> > I decided that max_local_update would be in MB because there is no
> > reloption value that can contain MaxBlockNumber and -1/disabled; and 1
> > MiB seems like enough granularity for essentially all use cases.
> >
> > The added regression tests show how this feature works, that the new
> > feature works, and validate that lock levels are acceptable
> > (ShareUpdateExclusiveLock, same as for updating fillfactor).
>
> I have looked at your patch, and I must say that I like it.  Having
> a size limit is better than my original idea of just "on" or "off".
> Essentially, it is "try to shrink the table if it grows above a limit".
>
> The patch builds fine and passes all regression tests.
>
> Documentation is missing.
>
> I agree that the name "max_local_update" could be improved.
> Perhaps "avoid_hot_above_size_mb".
>

Or "hot_table_size_threshold" or "hot_update_limit"?

Thom


Re: generic plans and "initial" pruning

2023-07-18 Thread Thom Brown
On Tue, 18 Jul 2023, 08:26 Amit Langote,  wrote:

> Hi Thom,
>
> On Tue, Jul 18, 2023 at 1:33 AM Thom Brown  wrote:
> > On Thu, 13 Jul 2023 at 13:59, Amit Langote 
> wrote:
> > > In an absolutely brown-paper-bag moment, I realized that I had not
> > > updated src/backend/executor/README to reflect the changes to the
> > > executor's control flow that this patch makes.   That is, after
> > > scrapping the old design back in January whose details *were*
> > > reflected in the patches before that redesign.
> > >
> > > Anyway, the attached fixes that.
> > >
> > > Tom, do you think you have bandwidth in the near future to give this
> > > another look?  I think I've addressed the comments that you had given
> > > back in April, though as mentioned in the previous message, there may
> > > still be some funny-looking aspects still remaining.  In any case, I
> > > have no intention of pressing ahead with the patch without another
> > > committer having had a chance to sign off on it.
> >
> > I've only just started taking a look at this, and my first test drive
> > yields very impressive results:
> >
> > 8192 partitions (3 runs, 1 rows)
> > Head 391.294989 382.622481 379.252236
> > Patched 13088.145995 13406.135531 13431.828051
>
> Just to be sure, did you use pgbench --Mprepared with plan_cache_mode
> = force_generic_plan in postgresql.conf?
>

I did.

For full disclosure, I also had max_locks_per_transaction set to 1.
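
For reference, that setup amounts to this (the lock value here is
illustrative):

SET plan_cache_mode = 'force_generic_plan';
ALTER SYSTEM SET max_locks_per_transaction = 10000;
-- max_locks_per_transaction only takes effect after a server restart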

>
> > Looking at your changes to README, I would like to suggest rewording
> > the following:
> >
> > +table during planning.  This means that inheritance child tables, which
> are
> > +added to the query's range table during planning, if they are present
> in a
> > +cached plan tree would not have been locked.
> >
> > To:
> >
> > This means that inheritance child tables present in a cached plan
> > tree, which are added to the query's range table during planning,
> > would not have been locked.
> >
> > Also, further down:
> >
> > s/intiatialize/initialize/
> >
> > I'll carry on taking a closer look and see if I can break it.
>
> Thanks for looking.  I've fixed these issues in the attached updated
> patch.  I've also changed the position of a newly added paragraph in
> src/backend/executor/README so that it doesn't break the flow of the
> existing text.
>

Thanks.

Thom



Re: generic plans and "initial" pruning

2023-07-17 Thread Thom Brown
On Thu, 13 Jul 2023 at 13:59, Amit Langote  wrote:
> In an absolutely brown-paper-bag moment, I realized that I had not
> updated src/backend/executor/README to reflect the changes to the
> executor's control flow that this patch makes.   That is, after
> scrapping the old design back in January whose details *were*
> reflected in the patches before that redesign.
>
> Anyway, the attached fixes that.
>
> Tom, do you think you have bandwidth in the near future to give this
> another look?  I think I've addressed the comments that you had given
> back in April, though as mentioned in the previous message, there may
> still be some funny-looking aspects still remaining.  In any case, I
> have no intention of pressing ahead with the patch without another
> committer having had a chance to sign off on it.

I've only just started taking a look at this, and my first test drive
yields very impressive results:

8192 partitions (3 runs, 1 rows)
Head 391.294989 382.622481 379.252236
Patched 13088.145995 13406.135531 13431.828051

Looking at your changes to README, I would like to suggest rewording
the following:

+table during planning.  This means that inheritance child tables, which are
+added to the query's range table during planning, if they are present in a
+cached plan tree would not have been locked.

To:

This means that inheritance child tables present in a cached plan
tree, which are added to the query's range table during planning,
would not have been locked.

Also, further down:

s/intiatialize/initialize/

I'll carry on taking a closer look and see if I can break it.

Thom




Re: Disabling Heap-Only Tuples

2023-07-07 Thread Thom Brown
On Thu, 6 Jul 2023 at 21:18, Matthias van de Meent
 wrote:
>
> On Wed, 5 Jul 2023 at 19:55, Thom Brown  wrote:
> >
> > On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
> >  wrote:
> > > So what were you thinking of? A session GUC? A table option?
> >
> > Both.
>
> Here's a small patch implementing a new table option max_local_update
> (name very much bikesheddable). Value is -1 (default, disabled) or the
> size of the table in MiB that you still want to allow to update on the
> same page. I didn't yet go for a GUC as I think that has too little
> control on the impact on the system.
>
> I decided that max_local_update would be in MB because there is no
> reloption value that can contain MaxBlockNumber and -1/disabled; and 1
> MiB seems like enough granularity for essentially all use cases.
>
> The added regression tests show how this feature works, that the new
> feature works, and validate that lock levels are acceptable
> (ShareUpdateExclusiveLock, same as for updating fillfactor).

Wow, thanks for working on this.

I've given it a test, and it does what I would expect it to do.

I'm aware of the concerns about the potential for the relocation to
land in an undesirable location, so perhaps that needs addressing.
But this is already considerably better than the current need to
update a row until it gets pushed off its current page.  Ideally there
would be tooling built around this where the user wouldn't need to
figure out how much of the table to UPDATE, or deal with VACUUMing
concerns.

But here's my quick test:

CREATE OR REPLACE FUNCTION compact_table(table_name IN TEXT)
RETURNS VOID AS $$
DECLARE
    current_row RECORD;
    old_ctid TID;
    new_ctid TID;
    keys TEXT;
    update_query TEXT;
    row_counter INTEGER := 0;
BEGIN
    -- Build a no-op SET list from the primary key columns.
    SELECT string_agg(a.attname || ' = ' || a.attname, ', ')
    INTO keys
    FROM pg_index i
    JOIN pg_attribute a ON a.attnum = ANY(i.indkey)
    WHERE i.indrelid = table_name::regclass
    AND a.attrelid = table_name::regclass
    AND i.indisprimary;

    IF keys IS NULL THEN
        RAISE EXCEPTION 'Table % does not have a primary key.', table_name;
    END IF;

    -- Walk the table from the last page backwards, rewriting each row.
    FOR current_row IN
        EXECUTE FORMAT('SELECT ctid, * FROM %I ORDER BY ctid DESC', table_name)
    LOOP
        old_ctid := current_row.ctid;

        update_query := FORMAT('UPDATE %I SET %s WHERE ctid = $1
            RETURNING ctid', table_name, keys);
        EXECUTE update_query USING old_ctid INTO new_ctid;

        row_counter := row_counter + 1;

        IF row_counter % 1000 = 0 THEN
            RAISE NOTICE '% rows relocated.', row_counter;
        END IF;

        -- Stop once a row relocates to a later position than it had,
        -- meaning there is no free space earlier in the table.
        IF new_ctid <= old_ctid THEN
            CONTINUE;
        ELSE
            RAISE NOTICE 'All non-contiguous rows relocated.';
            EXIT;
        END IF;
    END LOOP;
END; $$
LANGUAGE plpgsql;


postgres=# CREATE TABLE bigtable (id int, content text);
CREATE TABLE
postgres=# INSERT INTO bigtable SELECT x, 'This is just a way to fill
up space.' FROM generate_series(1,10000000) a(x);
INSERT 0 10000000
postgres=# DELETE FROM bigtable WHERE id % 7 = 0;
DELETE 1428571
postgres=# VACUUM bigtable;
VACUUM
postgres=# ALTER TABLE bigtable SET (max_local_update = 0);
ALTER TABLE
postgres=# ALTER TABLE bigtable ADD PRIMARY KEY (id);
ALTER TABLE
postgres=# \dt+ bigtable
                                    List of relations
 Schema |   Name   | Type  | Owner | Persistence | Access method |  Size  | Description
--------+----------+-------+-------+-------------+---------------+--------+-------------
 public | bigtable | table | thom  | permanent   | heap          | 730 MB |
(1 row)

postgres=# SELECT * FROM pgstattuple('bigtable');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
 765607936 |     8571429 | 557142885 |         72.77 |                0 |              0 |                  0 |  105901628 |        13.83
(1 row)

postgres=# SELECT compact_table('bigtable');
NOTICE:  1000 rows relocated.
NOTICE:  2000 rows relocated.
NOTICE:  3000 rows relocated.
NOTICE:  4000 rows relocated.
...
NOTICE:  1221000 rows relocated.
NOTICE:  1222000 rows relocated.
NOTICE:  1223000 rows relocated.
NOTICE:  1224000 rows relocated.
NOTICE:  All non-contiguous rows relocated.
 compact_table
---

(1 row)

postgres=# VACUUM bigtable;
VACUUM
postgres=# \dt+ bigtable;
                                    List of relations
 Schema |   Name   | Type  | Owner | Persistence | Access method |  Size  | Description
--------+----------+-------+-------+-------------+---------------+--------+-------------
 public | bigtable | table | thom  | permanent   | heap          | 626 MB |
(1 row)

postgres=# SELECT * FROM pgstattuple('bigtable')

Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 16:50, Jelte Fennema  wrote:
>
> On Wed, 5 Jul 2023 at 16:01, Euler Taveira  wrote:
> > One of the PgBouncer's missions is to be a transparent proxy.
> >
> > Sometimes you cannot reach out the database directly due to a security 
> > policy.
>
> Indeed the transparent proxy use case is where replication through
> pgbouncer makes sense. There's quite some reasons to set up PgBouncer
> like such a proxy apart from security policies. Some others that come
> to mind are:
> - load balancer layer of pgbouncers
> - transparent failovers
> - transparent database moves
>
> And in all of those cases its nice for a user to use a single
> connection string/hostname. Instead of having to think: Oh yeah, for
> backups, I need to use this other one.

Okay, understood.  In that case, please remember to write changes to
the pg_basebackup docs page explaining how the dbname value is ignored
under normal usage.

Thom




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
 wrote:
>
> On Wed, 5 Jul 2023 at 14:39, Thom Brown  wrote:
> >
> > On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent
> >  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 13:03, Thom Brown  wrote:
> > > >
> > > > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> > > >  wrote:
> > > > >
> > > > > On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > > > > > Heap-Only Tuple (HOT) updates are a significant performance
> > > > > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > > > > comes with a caveat: it means that if we have lots of available 
> > > > > > space
> > > > > > earlier on in the relation, it can only be used for new tuples or in
> > > > > > cases where there's insufficient space on a page for an UPDATE to 
> > > > > > use
> > > > > > HOT.
> > > > > >
> > > > > > This mechanism limits our options for condensing tables, forcing us 
> > > > > > to
> > > > > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > > > > tools like pg_repack. These either require exclusive locks (which 
> > > > > > will
> > > > > > be a deal-breaker on large tables on a production system), or 
> > > > > > there's
> > > > > > risks involved. Of course we can always flood pages with new 
> > > > > > versions
> > > > > > of a row until it's forced onto an early page, but that shouldn't be
> > > > > > necessary.
> > > > > >
> > > > > > Considering these trade-offs, I'd like to propose an option to allow
> > > > > > superusers to disable HOT on tables. The intent is to trade some
> > > > > > performance benefits for the ability to reduce the size of a table
> > > > > > without the typical locking associated with it.
> > > > >
> > > > > Interesting use case, but I think that disabling HOT would be missing
> > > > > the forest for the trees. I think that a feature that disables
> > > > > block-local updates for pages > some offset would be a better solution
> > > > > to your issue: Normal updates also prefer the new tuple to be stored
> > > > > in the same pages as the old tuple if at all possible, so disabling
> > > > > HOT wouldn't solve the issue of tuples residing in the tail of your
> > > > > table - at least not while there is still empty space in those pages.
> > > >
> > > > Hmm... I see your point.  It's when an UPDATE isn't going to land on
> > > > the same page that it relocates to the earlier available page.  So I
> > > > guess I'm after whatever mechanism would allow that to happen reliably
> > > > and predictably.
> > > >
> > > > So $subject should really be "Allow forcing UPDATEs off the same page".
> > >
> > > You'd probably want to do that only for a certain range of the table -
> > > for a table with 1GB of data and 3GB of bloat there is no good reason
> > > to force page-crossing updates in the first 1GB of the table - all
> > > tuples of the table will eventually reside there, so why would you
> > > take a performance penalty and move the tuples from inside that range
> > > to inside that same range?
> >
> > I'm thinking more of a case of:
> >
> > 
> >
> > UPDATE bigtable
> > SET primary key = primary key
> > WHERE ctid IN (
> > SELECT ctid
> > FROM bigtable
> > ORDER BY ctid DESC
> > LIMIT 10);
>
> So what were you thinking of? A session GUC? A table option?

Both.

> The benefit of a table option is that it is retained across sessions
> and thus allows tables that get enough updates to eventually get to a
> cleaner state. The main downside of such a table option is that it
> requires a temporary table-level lock to update the parameter.

Yes, but the maintenance window to make such a change would be extremely brief.

> The benefit of a session GUC is that you can set it without impacting
> other sessions, but the downside is that you need to do the
> maintenance in that session, and risk that cascading updates to other
> tables (e.g. through AFTER UPDATE triggers) are also impacted by this
> non-local update GUC.
>
> > > Something else to note: Indexes would suffer some (large?) amount of
> > > bloat in this process, as you would be updating a lot of tuples
> > > without the HOT optimization, thus increasing the work to be done by
> > > VACUUM.
> > > This may result in more bloat in indexes than what you get back from
> > > shrinking the table.
> >
> > This could be the case, but I guess indexes are expendable to an
> > extent, unlike tables.
>
> I don't think that's accurate - index rebuilds are quite expensive.
> But, that's besides the point of this thread.
>
> Somewhat related: did you consider using pg_repack instead of this
> potential feature?

pg_repack isn't exactly innocuous, and can potentially leave the
database in an irrecoverable state.  Plus, if disk space is an issue, it
doesn't help.

Thom




Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Thom Brown
On Mon, 3 Jul 2023 at 13:23, Jelte Fennema  wrote:
>
> Normally it doesn't really matter which dbname is used in the connection
> string that pg_basebackup and other physical replication CLI tools use.
> The reason being, that physical replication does not work at the
> database level, but instead at the server level. So you will always get
> the data for all databases.
>
> However, when there's a proxy, such as PgBouncer, in between the client
> and the server, then it might very well matter. Because this proxy might
> want to route the connection to a different server depending on the
> dbname parameter in the startup packet.
>
> This patch changes the creation of the connection string key value
> pairs, so that the following command will actually include
> dbname=postgres in the startup packet to the server:
>
> pg_basebackup --dbname 'dbname=postgres port=6432' -D dump

I guess my immediate question is, should backups be taken through
PgBouncer?  It seems beyond PgBouncer's remit.

Thom




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 13:12, Matthias van de Meent
 wrote:
>
> On Wed, 5 Jul 2023 at 13:03, Thom Brown  wrote:
> >
> > On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
> >  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > > > Heap-Only Tuple (HOT) updates are a significant performance
> > > > enhancement, as they prevent unnecessary page writes. However, HOT
> > > > comes with a caveat: it means that if we have lots of available space
> > > > earlier on in the relation, it can only be used for new tuples or in
> > > > cases where there's insufficient space on a page for an UPDATE to use
> > > > HOT.
> > > >
> > > > This mechanism limits our options for condensing tables, forcing us to
> > > > resort to methods like running VACUUM FULL/CLUSTER or using external
> > > > tools like pg_repack. These either require exclusive locks (which will
> > > > be a deal-breaker on large tables on a production system), or there's
> > > > risks involved. Of course we can always flood pages with new versions
> > > > of a row until it's forced onto an early page, but that shouldn't be
> > > > necessary.
> > > >
> > > > Considering these trade-offs, I'd like to propose an option to allow
> > > > superusers to disable HOT on tables. The intent is to trade some
> > > > performance benefits for the ability to reduce the size of a table
> > > > without the typical locking associated with it.
> > >
> > > Interesting use case, but I think that disabling HOT would be missing
> > > the forest for the trees. I think that a feature that disables
> > > block-local updates for pages > some offset would be a better solution
> > > to your issue: Normal updates also prefer the new tuple to be stored
> > > in the same pages as the old tuple if at all possible, so disabling
> > > HOT wouldn't solve the issue of tuples residing in the tail of your
> > > table - at least not while there is still empty space in those pages.
> >
> > Hmm... I see your point.  It's when an UPDATE isn't going to land on
> > the same page that it relocates to the earlier available page.  So I
> > guess I'm after whatever mechanism would allow that to happen reliably
> > and predictably.
> >
> > So $subject should really be "Allow forcing UPDATEs off the same page".
>
> You'd probably want to do that only for a certain range of the table -
> for a table with 1GB of data and 3GB of bloat there is no good reason
> to force page-crossing updates in the first 1GB of the table - all
> tuples of the table will eventually reside there, so why would you
> take a performance penalty and move the tuples from inside that range
> to inside that same range?

I'm thinking more of a case of:



UPDATE bigtable
SET primary key = primary key
WHERE ctid IN (
SELECT ctid
FROM bigtable
ORDER BY ctid DESC
LIMIT 10);

> Something else to note: Indexes would suffer some (large?) amount of
> bloat in this process, as you would be updating a lot of tuples
> without the HOT optimization, thus increasing the work to be done by
> VACUUM.
> This may result in more bloat in indexes than what you get back from
> shrinking the table.

This could be the case, but I guess indexes are expendable to an
extent, unlike tables.

Thom




Re: Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
On Wed, 5 Jul 2023 at 11:57, Matthias van de Meent
 wrote:
>
> On Wed, 5 Jul 2023 at 12:45, Thom Brown  wrote:
> > Heap-Only Tuple (HOT) updates are a significant performance
> > enhancement, as they prevent unnecessary page writes. However, HOT
> > comes with a caveat: it means that if we have lots of available space
> > earlier on in the relation, it can only be used for new tuples or in
> > cases where there's insufficient space on a page for an UPDATE to use
> > HOT.
> >
> > This mechanism limits our options for condensing tables, forcing us to
> > resort to methods like running VACUUM FULL/CLUSTER or using external
> > tools like pg_repack. These either require exclusive locks (which will
> > be a deal-breaker on large tables on a production system), or there's
> > risks involved. Of course we can always flood pages with new versions
> > of a row until it's forced onto an early page, but that shouldn't be
> > necessary.
> >
> > Considering these trade-offs, I'd like to propose an option to allow
> > superusers to disable HOT on tables. The intent is to trade some
> > performance benefits for the ability to reduce the size of a table
> > without the typical locking associated with it.
>
> Interesting use case, but I think that disabling HOT would be missing
> the forest for the trees. I think that a feature that disables
> block-local updates for pages > some offset would be a better solution
> to your issue: Normal updates also prefer the new tuple to be stored
> in the same pages as the old tuple if at all possible, so disabling
> HOT wouldn't solve the issue of tuples residing in the tail of your
> table - at least not while there is still empty space in those pages.

Hmm... I see your point.  It's when an UPDATE isn't going to land on
the same page that it relocates to the earlier available page.  So I
guess I'm after whatever mechanism would allow that to happen reliably
and predictably.

So $subject should really be "Allow forcing UPDATEs off the same page".

Thom




Disabling Heap-Only Tuples

2023-07-05 Thread Thom Brown
Hi,

Heap-Only Tuple (HOT) updates are a significant performance
enhancement, as they prevent unnecessary page writes. However, HOT
comes with a caveat: it means that if we have lots of available space
earlier on in the relation, it can only be used for new tuples or in
cases where there's insufficient space on a page for an UPDATE to use
HOT.

This mechanism limits our options for condensing tables, forcing us to
resort to methods like running VACUUM FULL/CLUSTER or using external
tools like pg_repack. These either require exclusive locks (which will
be a deal-breaker on large tables on a production system), or there's
risks involved. Of course we can always flood pages with new versions
of a row until it's forced onto an early page, but that shouldn't be
necessary.

Considering these trade-offs, I'd like to propose an option to allow
superusers to disable HOT on tables. The intent is to trade some
performance benefits for the ability to reduce the size of a table
without the typical locking associated with it.

This feature could be used to shrink tables in one of two ways:
temporarily disabling HOT until DML operations have compacted the data
into a smaller area, or performing a mass update on later rows to
relocate them to an earlier location, probably in stages. Of course,
this would need to be used in conjunction with a VACUUM operation.

Admittedly this isn't ideal, and it would be better if we had an
operation that could do this (e.g. VACUUM COMPACT ), or an
option that causes some operations to avoid HOT when it detects an
amount of free space over a threshold, but in lieu of those, I thought
this would at least allow users to help themselves when running into
disk space issues.
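
As a rough sketch of the second approach (the ALTER TABLE option is
hypothetical, and the column name and LIMIT are illustrative; only the
UPDATE pattern works today):

-- hypothetical reloption to disable HOT on this table
ALTER TABLE bigtable SET (heap_only_tuples = off);

-- rewrite the tail of the table so rows migrate into free space
-- earlier on
UPDATE bigtable SET id = id
WHERE ctid IN (SELECT ctid FROM bigtable ORDER BY ctid DESC LIMIT 100000);

VACUUM bigtable;  -- allows the now-empty tail to be truncated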

Thoughts?

Thom




Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?

2023-07-01 Thread Thom Brown
On Thu, 29 Jun 2023, 14:45 Álvaro Herrera,  wrote:

> ALTER TABLE DETACH CONCURRENTLY had to deal with this also, and it did it
> by having a COMPLETE option you can run later in case things got stuck the
> first time around. I suppose we could do something similar, where the
> server automatically does the needful, whatever that is.
>

So there doesn't appear to be provision for deferred activities.

Could the fact that it is an invalid index, with no locks on it and
dependent on the table, perhaps mean it could be removed by a VACUUM?

I just don't like the idea of the user needing to remove broken things.

Thom


Does a cancelled REINDEX CONCURRENTLY need to be messy?

2023-06-29 Thread Thom Brown
Hi,

It's documented that a failed REINDEX can leave behind a transient
index, and I'm not going to speculate on all the conditions that could
lead to this situation.  However, cancelling a REINDEX CONCURRENTLY
will reliably leave behind the index it was building (_ccnew).

Doesn't a cancellation instruct the process that the user has made a
decision regarding the fate of the new version of the index?  Is there
a situation where the incomplete transient index might need to be
inspected following a cancellation?

Because if not, why not get it to tidy up after itself?  If the
process crashed, fair enough, but it just doesn't sit well that
leftover artifacts of an aborted operation aren't tidied up,
especially since subsequent attempts to REINDEX will find these
invalid transient versions and attempt to REINDEX them.  Why should
the user need to know about them and take manual action in the case of
a cancellation?
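
For context, the manual action in question amounts to something like
this (the index name is hypothetical):

SELECT indexrelid::regclass
FROM pg_index
WHERE NOT indisvalid;  -- the transient leftovers end in _ccnew

DROP INDEX CONCURRENTLY mytable_idx_ccnew;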

I get the feeling that this is deliberate, and perhaps an attempt to
mitigate locking issues, or some other explanation, but the rationale
isn't immediately apparent to me if this is the case.

Thanks

Thom




Re: Remove references to pre-11 versions

2023-04-21 Thread Thom Brown
On Wed, 19 Apr 2023 at 14:58, Tom Lane  wrote:
>
> Thom Brown  writes:
> > I've attached a patch that removes some now redundant messaging about
> > unsupported versions.
>
> If we want to make that a policy, I think a lot more could be done
> --- I remember noticing a documentation comment about some 8.x
> version just recently.
>
> However, "out of support" is a lot different from "nobody has any
> code written for that version anymore".  So I'd be inclined to keep
> the first hunk in your patch, the one explaining the regexp_matches-
> in-a-subselect trick.  People will be trying to puzzle out why
> somebody did it like that for years to come.
>
> I agree with simplifying the other two spots.

Fair enough.  I've updated the patch.  However, feel free to ignore if
this marks the thin edge of the wedge.

Thom


old_version_removal_v2.patch
Description: Binary data


Remove references to pre-11 versions

2023-04-19 Thread Thom Brown
Hi,

I've attached a patch that removes some now redundant messaging about
unsupported versions.

Regards

Thom


old_version_removal.patch
Description: Binary data


Re: Various typo fixes

2023-04-11 Thread Thom Brown
On Tue, 11 Apr 2023 at 15:39, Justin Pryzby  wrote:
>
> On Tue, Apr 11, 2023 at 03:36:02PM +0100, Thom Brown wrote:
> > I've attached a patch with a few typo and grammatical fixes.
>
> I think you actually sent the "git-diff" manpage :(

Oh dear, well that's a first.  Thanks for pointing out.

Re-attached.

Thom


various_typos_and_grammar_fixes.patch
Description: Binary data


Various typo fixes

2023-04-11 Thread Thom Brown
Hi,

I've attached a patch with a few typo and grammatical fixes.

Regards

Thom


various_typos_and_grammar_fixes.patch
Description: Binary data


Re: [PoC] Reducing planning time when tables have many partitions

2022-12-06 Thread Thom Brown
On Mon, 5 Dec 2022 at 21:28, David Rowley  wrote:
>
> On Tue, 6 Dec 2022 at 04:45, Thom Brown  wrote:
> > Testing your patches with the same 1024 partitions, each with 64
> > sub-partitions, I get a planning time of 205.020 ms, which is now a
> > 1,377x speedup.  This has essentially reduced the planning time from a
> > catastrophe to a complete non-issue.  Huge win!
>
> Thanks for testing the v10 patches.
>
> I wouldn't have expected such additional gains from v10. I was mostly
> focused on trying to minimise any performance regression for simple
> queries that wouldn't benefit from indexing the EquivalenceMembers.
> Your query sounds like it does not fit into that category.  Perhaps it
> is down to the fact that v9-0002 or v9-0003 reverts a couple of the
> optimisations that is causing v9 to be slower than v10 for your query.
> It's hard to tell without more details of what you're running.

I celebrated prematurely as I neglected to wait for the 6th execution
of the prepared statement, which shows the real result.  With the v10
patches, it takes 5632.040 ms, a speedup of 50x.
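
The 6th execution matters because, by default, the plan cache plans the
first five executions of a prepared statement with custom plans and only
considers switching to the generic plan after that (the table name below
is made up):

PREPARE q (int) AS SELECT * FROM parted WHERE a = $1;
EXECUTE q(1);  -- executions 1-5 get custom plans
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);  -- execution 6 can switch to the cached generic plan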

Testing the v9 patches, the same query takes 3388.173 ms, a speedup of
83x.  And re-testing v8, I'm getting roughly the same times.  These
are all with a cold cache.

So the result isn't as dramatic as I had initially interpreted it to
be, unfortunately.

-- 
Thom




Re: [PoC] Reducing planning time when tables have many partitions

2022-12-05 Thread Thom Brown
On Sun, 4 Dec 2022 at 00:35, David Rowley  wrote:
>
> On Tue, 29 Nov 2022 at 21:59, Yuya Watari  wrote:
> > Thank you for testing the patch with an actual query. This speedup is
> > very impressive. When I used an original query with 1024 partitions,
> > its planning time was about 200ms. Given that each partition is also
> > partitioned in your workload, I think the result of 1415ms is
> > reasonable.
>
> I was looking again at the v9-0001 patch and I think we can do a
> little better when building the Bitmapset of matching EMs.  For
> example, in the v9 patch, the code for get_ecmember_indexes_strict()
> is doing:
>
> + if (!with_children)
> + matching_ems = bms_copy(ec->ec_nonchild_indexes);
> + else
> + matching_ems = bms_copy(ec->ec_member_indexes);
> +
> + i = -1;
> + while ((i = bms_next_member(relids, i)) >= 0)
> + {
> + RelOptInfo *rel = root->simple_rel_array[i];
> +
> + matching_ems = bms_int_members(matching_ems, 
> rel->eclass_member_indexes);
> + }
>
> It seems reasonable that if there are a large number of partitions
> then ec_member_indexes will have a large number of Bitmapwords.  When
> we do bms_int_members() on that, we're going to probably end up with a
> bunch of trailing zero words in the set.  In the v10 patch, I've
> changed this to become:
>
> +inti = bms_next_member(relids, -1);
> +
> +if (i >= 0)
> +{
> +RelOptInfo *rel = root->simple_rel_array[i];
> +
> +/*
> + * bms_intersect to the first relation to try to keep the resulting
> + * Bitmapset as small as possible.  This saves having to make a
> + * complete bms_copy() of one of them.  One may contain significantly
> + * more words than the other.
> + */
> +if (!with_children)
> +matching_ems = bms_intersect(rel->eclass_member_indexes,
> + ec->ec_nonchild_indexes);
> +else
> +matching_ems = bms_intersect(rel->eclass_member_indexes,
> + ec->ec_member_indexes);
> +
> +while ((i = bms_next_member(relids, i)) >= 0)
> +{
> +rel = root->simple_rel_array[i];
> +matching_ems = bms_int_members(matching_ems,
> +   rel->eclass_member_indexes);
> +}
> +}
>
> so, effectively we first bms_intersect to the first member of relids
> before masking out the bits for the remaining ones.  This should mean
> we'll have a Bitmapset with fewer words in many complex planning
> problems. There's no longer the dilemma of having to decide if we
> should start with RelOptInfo's eclass_member_indexes or the
> EquivalenceClass's member indexes.  When using bms_int_member, we
> really want to start with the smallest of those so we get the smallest
> resulting set.  With bms_intersect(), it will always make a copy of
> the smallest set. v10 does that instead of bms_copy()ing the
> EquivalenceClass's member's Bitmapset.
>
> I also wondered how much we're losing to the fact that
> bms_int_members() zeros the trailing words and does not trim the
> Bitmapset down.
>
> The problem there is 2-fold;
> 1) we have to zero the trailing words on the left input. That'll
> pollute the CPU cache a bit as it may have to fetch a bunch of extra
> cache lines, and;
> 2) subsequent bms_int_members() done afterwards may have to mask out
> additional words. If we can make the shortest input really short, then
> subsequent bms_int_members() are going to be very fast.
>
> You might argue there that setting nwords to the shortest length may
> cause us to have to repalloc the Bitmapset if we need to later add
> more members again, but if you look at the repalloc() code, it's
> effectively a no-op when the allocated size >= the requested size, so
> repalloc() should be very fast in this case. So, worst case, there's
> an additional "no-op" repalloc() (which should be very fast) followed
> by maybe a bms_add_members() which has to zero the words instead of
> bms_int_members(). I changed this in the v10-0002 patch. I'm not sure
> if we should do this or not.
>
> I also changed v10-0001 so that we still store the EquivalenceClass's
> members list.  There were a few places where the code just wanted to
> get the first member and having to look at the Bitmapset index and
> fetch the first match from PlannerInfo seemed convoluted.  If the
> query is simple, it seems like it's not going to be very expensive to
> add a few EquivalenceMembers to this list. When planning more complex
> problems, there's probably enough other extra overhead that we're
> unlikely to notice the extra lappend()s.  This also allows v10-0003 to
> work, see below.
>
> In v10-0003, I experimented with the iterator concept that I mentioned
> earlier.  Since v10-0001 is now storing the EquivalenceMember list in
> EquivalenceClass again, it's now quite simple to have the iterator
> decide if it should be scanning the 

Re: [PoC] Reducing planning time when tables have many partitions

2022-11-17 Thread Thom Brown
On Thu, 17 Nov 2022 at 11:20, Thom Brown  wrote:
>
> On Thu, 17 Nov 2022 at 09:31, Alvaro Herrera  wrote:
> >
> > On 2022-Nov-16, Thom Brown wrote:
> >
> > > Once the issue Tom identified has been resolved, I'd like to test
> > > drive newer patches.
> >
> > What issue?  If you mean the one from the thread "Reducing
> > duplicativeness of EquivalenceClass-derived clauses", that patch is
> > already applied (commit a5fc46414deb), and Yuya Watari's v8 series
> > applies fine to current master.
>
> Ah, I see..  I'll test the v8 patches.

No issues with applying.  Created 1024 partitions, each of which is
partitioned into 64 partitions.

I'm getting a generic planning time of 1415ms.  Is that considered
reasonable in this situation?  Bear in mind that the planning time
prior to this patch was 282311ms, so pretty much a 200x speedup.
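
For the record, the layout was created along these lines (a sketch;
names and ranges assumed):

CREATE TABLE p (a int, b int) PARTITION BY RANGE (a);

DO $$
BEGIN
  FOR i IN 0..1023 LOOP
    EXECUTE format('CREATE TABLE p_%s PARTITION OF p
                    FOR VALUES FROM (%s) TO (%s)
                    PARTITION BY RANGE (b)', i, i, i + 1);
    FOR j IN 0..63 LOOP
      EXECUTE format('CREATE TABLE p_%s_%s PARTITION OF p_%s
                      FOR VALUES FROM (%s) TO (%s)', i, j, i, j, j + 1);
    END LOOP;
  END LOOP;
END $$;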

Thom




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-17 Thread Thom Brown
On Thu, 17 Nov 2022 at 09:31, Alvaro Herrera  wrote:
>
> On 2022-Nov-16, Thom Brown wrote:
>
> > Once the issue Tom identified has been resolved, I'd like to test
> > drive newer patches.
>
> What issue?  If you mean the one from the thread "Reducing
> duplicativeness of EquivalenceClass-derived clauses", that patch is
> already applied (commit a5fc46414deb), and Yuya Watari's v8 series
> applies fine to current master.

Ah, I see..  I'll test the v8 patches.

Thanks

Thom




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-16 Thread Thom Brown
On Mon, 7 Nov 2022 at 06:33, Zhang Mingli  wrote:
>
> HI,
>
> Regards,
> Zhang Mingli
> On Nov 7, 2022, 14:26 +0800, Tom Lane , wrote:
>
> Andrey Lepikhov  writes:
>
> I'm still in review of your patch now. At most it seems ok, but are you
> really need both eq_sources and eq_derives lists now?
>
>
> Didn't we just have this conversation? eq_sources needs to be kept
> separate to support the "broken EC" logic. We don't want to be
> regurgitating derived clauses as well as originals in that path.
>
> Aha, we have that conversation in another thread(Reducing duplicativeness of 
> EquivalenceClass-derived clauses
> ) : https://www.postgresql.org/message-id/644164.1666877342%40sss.pgh.pa.us

Once the issue Tom identified has been resolved, I'd like to test
drive newer patches.

Thom




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-14 Thread Thom Brown
On Thu, 3 Nov 2022 at 08:12, Maxim Orlov  wrote:
>
> Hi!
>
>> I attach an additional V48-0009 patch as they are just comments, apply it if 
>> you want to.
>
> Big thank you for your review. I've applied your addition in the recent patch 
> set below.
>
> Besides, mentioned above, next changes are made:
> - rename HeapTupleCopyBaseFromPage to HeapTupleCopyXidsFromPage, since this 
> old name came from the time when the "t_xid_base" was stored in the tuple,
>   and not correspond to recent state of the code;
> - replace ToastTupleHeader* calls with HeapHeader* with the "is_toast" 
> argument. This reduces diff and make the code more readable;
> - put HeapTupleSetZeroXids calls in several places for the sake of redundancy;
> - in heap_tuple_would_freeze add case to reset xmax without reading clog;
> - rename SeqTupleHeaderSetXmax/Xmin to SeqTupleSetXmax/min and refactoring of 
> the function; Now it will set HeapTuple and HeapTupleHeader xmax;
> - add case of int64 values in check_GUC_init;
> - massive refactoring in htup_details.h to use inline functions with type 
> control over macro;
> - reorder code in htup_details.h to reduce overall diff.
>
> As always, reviews and opinions are very welcome!

0008 needs a rebase.  heapam.h and catversion.h are failing.

Regards

Thom




Re: generalized conveyor belt storage

2022-04-20 Thread Thom Brown
On Wed, 20 Apr 2022 at 13:02, Alvaro Herrera  wrote:
>
> What's with the free text in cbstorage.h?  I would guess that this
> wouldn't even compile, and nobody has noticed because the file is not
> included by anything yet ...

I'm not able to compile:

cbfsmpage.c: In function ‘cb_fsmpage_initialize’:
cbfsmpage.c:34:11: warning: unused variable ‘fsm_block_spacing’
[-Wunused-variable]
  unsigned fsm_block_spacing = cb_fsm_block_spacing(pages_per_segment);
   ^
cbfsmpage.c:33:14: warning: unused variable ‘first_fsm_block’
[-Wunused-variable]
  BlockNumber first_fsm_block = cb_first_fsm_block(pages_per_segment);
...
cbxlog.c: In function ‘cb_xlog_allocate_payload_segment’:
cbxlog.c:70:24: error: void value not ignored as it ought to be
  bool  have_fsm_page = XLogRecGetBlockTag(record, 1, NULL, NULL, NULL);
^
cbxlog.c: In function ‘cb_xlog_allocate_index_segment’:
cbxlog.c:123:17: error: void value not ignored as it ought to be
  have_prev_page = XLogRecGetBlockTag(record, 2, NULL, NULL, NULL);
 ^
cbxlog.c:124:16: error: void value not ignored as it ought to be
  have_fsm_page = XLogRecGetBlockTag(record, 3, NULL, NULL, NULL);
^
cbxlog.c: In function ‘cb_xlog_recycle_payload_segment’:
cbxlog.c:311:16: error: void value not ignored as it ought to be
  have_metapage = XLogRecGetBlockTag(record, 0, NULL, NULL, NULL);
^
cbxlog.c:312:18: error: void value not ignored as it ought to be
  have_index_page = XLogRecGetBlockTag(record, 1, NULL, NULL, NULL);
  ^
cbxlog.c:313:16: error: void value not ignored as it ought to be
  have_fsm_page = XLogRecGetBlockTag(record, 2, NULL, NULL, NULL);
^
make[4]: *** [cbxlog.o] Error 1
make[4]: Leaving directory
`/home/thom/Development/postgresql/src/backend/access/conveyor'
make[3]: *** [conveyor-recursive] Error 2
make[3]: Leaving directory
`/home/thom/Development/postgresql/src/backend/access'
make[2]: *** [access-recursive] Error 2
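
For reference, a minimal sketch of how these call sites could be
adapted, assuming the recent WAL-reader API change in which
XLogRecGetBlockTag() returns void and errors out on a missing block
(treat the exact signatures here as my assumption):

#include "access/xlogreader.h"

/* test for an optional block reference via the macro ... */
bool have_fsm_page = XLogRecHasBlockRef(record, 1);

/* ... or via the bool-returning extended variant */
bool have_prev_page = XLogRecGetBlockTagExtended(record, 2, NULL, NULL,
                                                 NULL, NULL);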


-- 
Thom




Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-11 Thread Thom Brown
On Mon, 11 Apr 2022, 15:55 Robert Haas,  wrote:

> On Fri, Apr 8, 2022 at 11:10 AM Robert Haas  wrote:
> > On Fri, Apr 8, 2022 at 10:45 AM Thom Brown  wrote:
> > > Thanks. This doesn't include my self-correction:
> > >
> > > s/kept on standby/kept on the standby/
> >
> > Here is v2, endeavoring to rectify that oversight.
>
> Committed.
>

Much appreciated.

Thom

>


Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-08 Thread Thom Brown
On Fri, 8 Apr 2022, 14:36 Robert Haas,  wrote:

> On Wed, Apr 6, 2022 at 8:15 AM Robert Haas  wrote:
> > On Tue, Apr 5, 2022 at 8:43 PM Thom Brown  wrote:
> > > I share your discomfort with the wording.  How about:
> > >
> > > WAL records must be kept on standby until they are ready to be applied.
> > > Therefore, longer delays will result in a greater accumulation of WAL
> > > files, increasing disk space requirements for the standby's pg_wal
> > > directory.
> >
> > Looks awesome.
>
> Here that is in patch form. I feel that the feature freeze should not
> preclude committing this documentation improvement, but if someone
> feels otherwise, then I will leave this until the tree reopens.
>

Thanks. This doesn't include my self-correction:

s/kept on standby/kept on the standby/

Thom

>


Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-05 Thread Thom Brown
On Wed, 6 Apr 2022 at 01:42, Thom Brown  wrote:
>
> On Tue, 5 Apr 2022 at 16:02, Robert Haas  wrote:
> >
> > On Tue, Apr 5, 2022 at 10:58 AM Magnus Hagander  wrote:
> > >> Makes sense. I will do this soon if nobody objects.
> > >>
> > >> I'm mildly uncomfortable with the phrase "WAL records generated over
> > >> the delay period" because it seems a bit imprecise, but I'm not sure
> > >> what would be better and I think the meaning is clear.
> > >
> > > Maybe "during" instead of "over"? But I'm not sure that's the part you're 
> > > referring to?
> >
> > Yeah, something like that, maybe.
>
> I share your discomfort with the wording.  How about:
>
> WAL records must be kept on standby until they are ready to be applied.
> Therefore, longer delays will result in a greater accumulation of WAL files,
> increasing disk space requirements for the standby's pg_wal directory.

*must be kept on the standby

-- 
Thom




Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-05 Thread Thom Brown
On Tue, 5 Apr 2022 at 16:02, Robert Haas  wrote:
>
> On Tue, Apr 5, 2022 at 10:58 AM Magnus Hagander  wrote:
> >> Makes sense. I will do this soon if nobody objects.
> >>
> >> I'm mildly uncomfortable with the phrase "WAL records generated over
> >> the delay period" because it seems a bit imprecise, but I'm not sure
> >> what would be better and I think the meaning is clear.
> >
> > Maybe "during" instead of "over"? But I'm not sure that's the part you're 
> > referring to?
>
> Yeah, something like that, maybe.

I share your discomfort with the wording.  How about:

WAL records must be kept on standby until they are ready to be applied.
Therefore, longer delays will result in a greater accumulation of WAL files,
increasing disk space requirements for the standby's pg_wal directory.
-- 
Thom




Re: Blank archive_command

2022-01-17 Thread Thom Brown
On Mon, 17 Jan 2022 at 15:25, Bharath Rupireddy
 wrote:
>
> On Mon, Jan 17, 2022 at 8:14 PM Thom Brown  wrote:
> >
> > Hi,
> >
> > Should archive_command being blank when archiving is enabled result in
> > a fatal error?  This doesn't even produce a warning when restarting,
> > just an entry in the log when the server goes to archive a WAL segment
> > and finds that archive_command is empty.
> >
> > Is there a valid scenario where someone would have archiving enabled
> > but no archive command?  Naturally, WAL will build up until the setting
> > is corrected, which will result in a less desirable error, likely at
> > a less convenient time.  To avoid it, someone either has to have
> > checked the logs and noticed this error, or grown curious as to why
> > their WAL collection is nearly running out of shelf space.
>
> Yes, the .ready files under archive_status and WAL files under the pg_wal
> directory grow with archive_mode on but no archive_command. The
> archiver keeps emitting "archive_mode enabled, yet archive_command is
> not set" warnings into the server logs; maybe this is something that
> needs to be monitored for. The expectation is to have a good archiving
> configuration in place which sets both archive_command and
> archive_mode to appropriate values.
>
> The server keeps the WAL files from the point when archive_mode is
> enabled, but not from the point when archive_command is set. The
> archive_mode parameter needs a postmaster restart whereas archive_command
> doesn't; if archive_command too needed a postmaster restart, then we
> would have failed FATALly when archive_command was empty. But making
> archive_command a needs-postmaster-restart class of parameter is not
> the path to go IMO, because postmaster restarts in production
> environments are to be avoided whenever possible.

Okay, that makes sense.  Thanks.  I guess people have to be careful
with their settings.  I was hoping this was one footgun that could be
disarmed.
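
For anyone skimming, the footgun configuration is simply this (a
sketch; the warning text is the one quoted above):

archive_mode = on       # needs a postmaster restart to change
archive_command = ''    # accepted silently; the archiver just logs the
                        # warning and WAL accumulates in pg_wal
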
>
> An extreme scenario I can think of is if archive_command is set to
> empty by service layer code. Of course, this is something postgres
> doesn't need to care about. However, a reasonable thing to do is to
> emit a WARNING, or error out, in its check_archive_command when
> archive_command is set to null while archive_mode is on?
>
> Regards,
> Bharath Rupireddy.



-- 
Thom




Blank archive_command

2022-01-17 Thread Thom Brown
Hi,

Should archive_command being blank when archiving is enabled result in
a fatal error?  This doesn't even produce a warning when restarting,
just an entry in the log when the server goes to archive a WAL segment
and finds that archive_command is empty.

Is there a valid scenario where someone would have archiving enabled
but no archive command?  Naturally, WAL will build up until the setting
is corrected, which will result in a less desirable error, likely at
a less convenient time.  To avoid it, someone either has to have
checked the logs and noticed this error, or grown curious as to why
their WAL collection is nearly running out of shelf space.

Thom




Re: [HACKERS] Look-behind regular expressions

2020-07-07 Thread Thom Brown
On Tue, 29 Jun 2010 at 17:49, David E. Wheeler  wrote:
>
> On Jun 29, 2010, at 7:44 AM, Thom Brown wrote:
>
> >> No.  Or are you volunteering?
> >
> > A n00b like me volunteer for that?  It's more of a suggestion.
>
> N00bs gotta start somewhere…

10 years later, and I've noticed that both look-behind and negative
look-behind have been implemented.

Thanks to whoever did this.

Thom




Multi-byte character case-folding

2020-07-06 Thread Thom Brown
Hi,

At the moment, only single-byte characters in identifiers are
case-folded, and multi-byte characters are not.

For example, abĉDĚF is case-folded to "abĉdĚf".  This can be referred
to as "abĉdĚf" or "ABĉDĚF", but not "abĉděf" or "ABĈDĚF".

downcase_identifier() has the following comment:

/*
 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 * have the infrastructure for.  Instead we use tolower() to provide a
 * locale-aware translation.  However, there are some locales where this
 * is not right either (eg, Turkish may do strange things with 'i' and
 * 'I').  Our current compromise is to use tolower() for characters with
 * the high bit set, as long as they aren't part of a multi-byte
 * character, and use an ASCII-only downcasing for 7-bit characters.
 */

So my question is, do we yet have the infrastructure to make
case-folding consistent across all character widths?

Thanks

Thom




Re: SQL/JSON path issues/questions

2019-07-18 Thread Thom Brown
On Tue, 16 Jul 2019 at 19:44, Alexander Korotkov
 wrote:
>
> On Tue, Jul 16, 2019 at 9:22 PM Thom Brown  wrote:
> > Now I'm looking at the @? and @@ operators, and getting a bit
> > confused.  This following query returns true, but I can't determine
> > why:
> >
> > # SELECT '{"a":[1,2,3,4,5]}'::jsonb @? '$.b == "hello"'::jsonpath;
> >  ?column?
> > ----------
> >  t
> > (1 row)
> >
> > "b" is not a valid item, so there should be no match.  Perhaps it's my
> > misunderstanding of how these operators are supposed to work, but the
> > documentation is quite terse on the behaviour.
>
> So, the result of jsonpath evaluation is single value "false".
>
> # SELECT jsonb_path_query_array('{"a":[1,2,3,4,5]}'::jsonb, '$.b == "hello"');
>  jsonb_path_query_array
> ------------------------
>  [false]
> (1 row)
>
> @@ operator checks that result is "true".  This is why it returns "false".
>
> @? operator checks if result is not empty.  So, it's single "false"
> value, not empty list.  This is why it returns "true".
>
> Perhaps, we need to clarify this in docs providing more explanation.

Understood.  Thanks.
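
Putting the two side by side makes the distinction concrete (with the
expression evaluating to the single value "false", as you describe):

# SELECT '{"a":[1,2,3,4,5]}'::jsonb @@ '$.b == "hello"'::jsonpath;
-- f, because the result is not "true"
# SELECT '{"a":[1,2,3,4,5]}'::jsonb @? '$.b == "hello"'::jsonpath;
-- t, because the result is not empty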

Also, is there a reason why jsonb_path_query doesn't have an operator analog?

Thom




Re: SQL/JSON path issues/questions

2019-07-16 Thread Thom Brown
On Thu, 11 Jul 2019 at 16:23, Alexander Korotkov
 wrote:
>
> On Thu, Jul 11, 2019 at 5:10 PM Thom Brown  wrote:
> > On Wed, 10 Jul 2019 at 05:58, Alexander Korotkov
> >  wrote:
> > >
> > > On Mon, Jul 8, 2019 at 12:30 AM Alexander Korotkov
> > >  wrote:
> > > > On Thu, Jul 4, 2019 at 4:38 PM Liudmila Mantrova
> > > >  wrote:
> > > > > Thank  you!
> > > > >
> > > > > I think we can make this sentence even shorter, the fix is attached:
> > > > >
> > > > > "To refer to a JSON element stored at a lower nesting level, add one 
> > > > > or
> > > > > more accessor operators after @."
> > > >
> > > > Thanks, looks good to me.  Attached revision of patch contains commit
> > > > message.  I'm going to commit this on no objections.
> > >
> > > So, pushed!
> >
> > I've just noticed the >= operator shows up as just > in the "jsonpath
> > Filter Expression Elements" table, and the <= example only shows <.
>
> Thank you for catching this!  Fix just pushed.

Thanks.

Now I'm looking at the @? and @@ operators, and getting a bit
confused.  This following query returns true, but I can't determine
why:

# SELECT '{"a":[1,2,3,4,5]}'::jsonb @? '$.b == "hello"'::jsonpath;
 ?column?
----------
 t
(1 row)

"b" is not a valid item, so there should be no match.  Perhaps it's my
misunderstanding of how these operators are supposed to work, but the
documentation is quite terse on the behaviour.

Thom




Re: SQL/JSON path issues/questions

2019-07-11 Thread Thom Brown
On Wed, 10 Jul 2019 at 05:58, Alexander Korotkov
 wrote:
>
> On Mon, Jul 8, 2019 at 12:30 AM Alexander Korotkov
>  wrote:
> > On Thu, Jul 4, 2019 at 4:38 PM Liudmila Mantrova
> >  wrote:
> > > Thank  you!
> > >
> > > I think we can make this sentence even shorter, the fix is attached:
> > >
> > > "To refer to a JSON element stored at a lower nesting level, add one or
> > > more accessor operators after @."
> >
> > Thanks, looks good to me.  Attached revision of patch contains commit
> > message.  I'm going to commit this on no objections.
>
> So, pushed!

I've just noticed the >= operator shows up as just > in the "jsonpath
Filter Expression Elements" table, and the <= example only shows <.

Thom




Re: SQL/JSON path issues/questions

2019-06-27 Thread Thom Brown
On Wed, 19 Jun 2019 at 20:04, Alexander Korotkov
 wrote:
>
> On Wed, Jun 19, 2019 at 7:07 PM Thom Brown  wrote:
> > On Thu, 13 Jun 2019 at 14:59, Thom Brown  wrote:
> > >
> > > Hi,
> > >
> > > I've been reading through the documentation regarding jsonpath and
> > > jsonb_path_query etc., and I have found it lacking explanation for
> > > some functionality, and I've also had some confusion when using the
> > > feature.
> > >
> > > ? operator
> > > ==
> > > The first mention of '?' is in section 9.15, where it says:
> > >
> > > "Suppose you would like to retrieve all heart rate values higher than
> > > 130. You can achieve this using the following expression:
> > > '$.track.segments[*].HR ? (@ > 130)'"
> > >
> > > So what is the ? operator doing here?  Sure, there's the regular ?
> > > operator, which is given as an example further down the page:
> > >
> > > '{"a":1, "b":2}'::jsonb ? 'b'
> > >
> > > But this doesn't appear to have the same purpose.
> > >
> > >
> > > like_regex
> > > ==
> > > Then there's like_regex, which shows an example that uses the keyword
> > > "flag", but that is the only instance of that keyword being mentioned,
> > > and the flags available to this expression aren't anywhere to be seen.
> > >
> > >
> > > is unknown
> > > ==
> > > "is unknown" suggests a boolean output, but the example shows an
> > > output of "infinity".  While I understand what it does, this appears
> > > inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> > > pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> > > pg_is_in_backup() etc.).
> > >
> > >
> > > $varname
> > > ==
> > > The jsonpath variable, $varname, has an incomplete description: "A
> > > named variable. Its value must be set in the PASSING clause of an
> > > SQL/JSON query function. for details."
> > >
> > >
> > > Binary operation error
> > > ==
> > > I get an error when I run this query:
> > >
> > > postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> > > psql: ERROR:  right operand of jsonpath operator + is not a single 
> > > numeric value
> > >
> > > While I know it's correct to get an error in this scenario as there is
> > > no element beyond 0, the message I get is confusing.  I'd expect this
> > > if it encountered another array in that position, but not for
> > > exceeding the upper bound of the array.
> > >
> > >
> > > Cryptic error
> > > ==
> > > postgres=# SELECT jsonb_path_query('[1, "2",
> > > {},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
> > > psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath 
> > > input
> > > LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
> > >  ^
> > > Again, I expect an error, but the message produced doesn't help me.
> > > I'll remove the ANY_P if I can find it.
> > >
> > >
> > > Can't use nested arrays with jsonpath
> > > ==
> > >
> > > I encounter an error in this scenario:
> > >
> > > postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == 
> > > [1,2])');
> > > psql: ERROR:  syntax error, unexpected '[' at or near "[" of jsonpath 
> > > input
> > > LINE 1: select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == ...
> > >
> > > So these filter operators only work with scalars?
> > >
> > >
> >
> > Another observation about the documentation is that the examples given
> > in 9.15. JSON Functions, Operators, and Expressions aren't all
> > functional.  Some example JSON is provided, followed by example
> > jsonpath queries which could be used against it.  These will produce
> > results for the reader wishing to test them out until this example:
> >
> > '$.track.segments[*].HR ? (@ > 130)'
> >
> > This is because there is no HR value greater than 130.  May I propose
> > setting this and all similar examples to (@ > 120) instead?
>
> Makes sense to me.
>
> > Also, this example doesn't work:
> >
> > '$.track ? (@.segments[*] ? (@.HR > 130)).segments.size()'
> >
> > This gives me:
> >
> > psql: ERROR:  syntax error, unexpected $end at end of jsonpath input
> > LINE 13: }','$.track ? (@.segments[*]');
> > ^
>
> Perhaps it should be the following:
>
> '$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'

I'm not clear on why the original example doesn't work here.

Thom




Re: SQL/JSON path issues/questions

2019-06-19 Thread Thom Brown
On Thu, 13 Jun 2019 at 14:59, Thom Brown  wrote:
>
> Hi,
>
> I've been reading through the documentation regarding jsonpath and
> jsonb_path_query etc., and I have found it lacking explanation for
> some functionality, and I've also had some confusion when using the
> feature.
>
> ? operator
> ==
> The first mention of '?' is in section 9.15, where it says:
>
> "Suppose you would like to retrieve all heart rate values higher than
> 130. You can achieve this using the following expression:
> '$.track.segments[*].HR ? (@ > 130)'"
>
> So what is the ? operator doing here?  Sure, there's the regular ?
> operator, which is given as an example further down the page:
>
> '{"a":1, "b":2}'::jsonb ? 'b'
>
> But this doesn't appear to have the same purpose.
>
>
> like_regex
> ==
> Then there's like_regex, which shows an example that uses the keyword
> "flag", but that is the only instance of that keyword being mentioned,
> and the flags available to this expression aren't anywhere to be seen.
>
>
> is unknown
> ==
> "is unknown" suggests a boolean output, but the example shows an
> output of "infinity".  While I understand what it does, this appears
> inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> pg_is_in_backup() etc.).
>
>
> $varname
> ==
> The jsonpath variable, $varname, has an incomplete description: "A
> named variable. Its value must be set in the PASSING clause of an
> SQL/JSON query function. for details."
>
>
> Binary operation error
> ==
> I get an error when I run this query:
>
> postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> psql: ERROR:  right operand of jsonpath operator + is not a single numeric 
> value
>
> While I know it's correct to get an error in this scenario as there is
> no element beyond 0, the message I get is confusing.  I'd expect this
> if it encountered another array in that position, but not for
> exceeding the upper bound of the array.
>
>
> Cryptic error
> ==
> postgres=# SELECT jsonb_path_query('[1, "2",
> {},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
> psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath input
> LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
>  ^
> Again, I expect an error, but the message produced doesn't help me.
> I'll remove the ANY_P if I can find it.
>
>
> Can't use nested arrays with jsonpath
> ==
>
> I encounter an error in this scenario:
>
> postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == 
> [1,2])');
> psql: ERROR:  syntax error, unexpected '[' at or near "[" of jsonpath input
> LINE 1: select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == ...
>
> So these filter operators only work with scalars?
>
>

Another observation about the documentation is that the examples given
in 9.15. JSON Functions, Operators, and Expressions aren't all
functional.  Some example JSON is provided, followed by example
jsonpath queries which could be used against it.  These will produce
results for the reader wishing to test them out until this example:

'$.track.segments[*].HR ? (@ > 130)'

This is because there is no HR value greater than 130.  May I propose
setting this and all similar examples to (@ > 120) instead?

Also, this example doesn't work:

'$.track ? (@.segments[*] ? (@.HR > 130)).segments.size()'

This gives me:

psql: ERROR:  syntax error, unexpected $end at end of jsonpath input
LINE 13: }','$.track ? (@.segments[*]');
^

Thanks

Thom




Re: SQL/JSON path issues/questions

2019-06-17 Thread Thom Brown
On Fri, 14 Jun 2019 at 08:16, Kyotaro Horiguchi  wrote:
>
> Hi, Thom.
>
> At Thu, 13 Jun 2019 14:59:51 +0100, Thom Brown  wrote
> in 
> > Hi,
> >
> > I've been reading through the documentation regarding jsonpath and
> > jsonb_path_query etc., and I have found it lacking explanation for
> > some functionality, and I've also had some confusion when using the
> > feature.
> >
> > ? operator
> > ==
> > The first mention of '?' is in section 9.15, where it says:
> >
> > "Suppose you would like to retrieve all heart rate values higher than
> > 130. You can achieve this using the following expression:
> > '$.track.segments[*].HR ? (@ > 130)'"
> >
> > So what is the ? operator doing here?  Sure, there's the regular ?
>
> It is described just above as:
>
> | Each filter expression must be enclosed in parentheses and
> | preceded by a question mark.

Can I suggest that, rather than using "question mark", we use the "?"
symbol, or provide a syntax structure which shows something like:

<path_expression> ? (<filter_expression>)

This not only makes this key information clearer and more prominent,
but it also makes the "?" symbol searchable in a browser for anyone
wanting to find out what that symbol is doing.

> > operator, which is given as an example further down the page:
> >
> > '{"a":1, "b":2}'::jsonb ? 'b'
> >
> > But this doesn't appear to have the same purpose.
>
> The section is mentioning path expressions, while that '?' is a jsonb
> operator. It's somewhat confusing, but not so much compared with the
> surrounding text.
>
> > like_regex
> > ==
> > Then there's like_regex, which shows an example that uses the keyword
> > "flag", but that is the only instance of that keyword being mentioned,
> > and the flags available to this expression aren't anywhere to be seen.
>
> It is described as POSIX regular expressions. So '9.7.3 POSIX
> Regular Expressions' covers that. But linking to it would be
> helpful. (attached 0001)
>
> > is unknown
> > ==
> > "is unknown" suggests a boolean output, but the example shows an
> > output of "infinity".  While I understand what it does, this appears
> > inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> > pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> > pg_is_in_backup() etc.).
>
> It's the right behavior. Among them, only "infinity" gives
> "unknown" for the test (@ > 0); -1 gives false, and 2 and 3 give true.

I still find it counter-intuitive.
>
> > $varname
> > ==
> > The jsonpath variable, $varname, has an incomplete description: "A
> > named variable. Its value must be set in the PASSING clause of an
> > SQL/JSON query function. for details."
>
> Yeah, it is apparently chopped off mid-sentence. In the sgml source, the
> missing part is "", and the PASSING clause is
> not implemented yet. On the other hand, similar functionality is
> currently implemented as the vars parameter in some jsonb
> functions. Linking it to there might be helpful (Attached 0002).
>
>
> > Binary operation error
> > ==
> > I get an error when I run this query:
> >
> > postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> > psql: ERROR:  right operand of jsonpath operator + is not a single numeric 
> > value
> >
> > While I know it's correct to get an error in this scenario as there is
> > no element beyond 0, the message I get is confusing.  I'd expect this
> > if it encountered another array in that position, but not for
> > exceeding the upper bound of the array.
>
> Does something like the attached make it clearer? (Attached 0003)
>
> | ERROR:  right operand of jsonpath operator + is not a single numeric value
> | DETAIL:  It was an array with 0 elements.

My first thought upon seeing this error message would be, "I don't see
an array with 0 elements."

>
> > Cryptic error
> > ==
> > postgres=# SELECT jsonb_path_query('[1, "2",
> > {},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
> > psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath 
> > input
> > LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
> >  ^
> > Again, I expect an error, but the message produced doesn't help me.
> > I'll remove the ANY_P if I can find it.
>
> Yeah, I had a similar error:
>
> =# select jsonb_path_query('[-1,2,7, "

SQL/JSON path issues/questions

2019-06-13 Thread Thom Brown
Hi,

I've been reading through the documentation regarding jsonpath and
jsonb_path_query etc., and I have found it lacking explanation for
some functionality, and I've also had some confusion when using the
feature.

? operator
==
The first mention of '?' is in section 9.15, where it says:

"Suppose you would like to retrieve all heart rate values higher than
130. You can achieve this using the following expression:
'$.track.segments[*].HR ? (@ > 130)'"

So what is the ? operator doing here?  Sure, there's the regular ?
operator, which is given as an example further down the page:

'{"a":1, "b":2}'::jsonb ? 'b'

But this doesn't appear to have the same purpose.


like_regex
==
Then there's like_regex, which shows an example that uses the keyword
"flag", but that is the only instance of that keyword being mentioned,
and the flags available to this expression aren't anywhere to be seen.
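
For illustration, a usage sketch (the flag letters appear to follow
POSIX regex options, e.g. "i" for case-insensitive matching; I'm
assuming that, since it isn't documented):

postgres=# SELECT jsonb_path_query('["apple", "Banana"]'::jsonb,
                                   '$[*] ? (@ like_regex "^b" flag "i")');
 jsonb_path_query
------------------
 "Banana"
(1 row)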


is unknown
==
"is unknown" suggests a boolean output, but the example shows an
output of "infinity".  While I understand what it does, this appears
inconsistent with all other "is..." functions (e.g. is_valid(lsn),
pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
pg_is_in_backup() etc.).


$varname
==
The jsonpath variable, $varname, has an incomplete description: "A
named variable. Its value must be set in the PASSING clause of an
SQL/JSON query function. for details."


Binary operation error
==
I get an error when I run this query:

postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
psql: ERROR:  right operand of jsonpath operator + is not a single numeric value

While I know it's correct to get an error in this scenario as there is
no element beyond 0, the message I get is confusing.  I'd expect this
if it encountered another array in that position, but not for
exceeding the upper bound of the array.
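
For contrast, the in-bounds case behaves as expected:

postgres=# SELECT jsonb_path_query('[2]', '2 + $[0]');
 jsonb_path_query
------------------
 4
(1 row)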


Cryptic error
==
postgres=# SELECT jsonb_path_query('[1, "2",
{},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath input
LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
 ^
Again, I expect an error, but the message produced doesn't help me.
I'll remove the ANY_P if I can find it.
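
For what it's worth, per-element types can be had without the rejected
[**] form:

postgres=# SELECT jsonb_path_query('[1, "2", {}]', '$[*].type()');
 jsonb_path_query
------------------
 "number"
 "string"
 "object"
(3 rows)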


Can't use nested arrays with jsonpath
==

I encounter an error in this scenario:

postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == [1,2])');
psql: ERROR:  syntax error, unexpected '[' at or near "[" of jsonpath input
LINE 1: select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == ...

So these filter operators only work with scalars?
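
If so, a possible workaround might be to compare element by element
inside the filter (a sketch, untested):

postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]',
                                   '$[*] ? (@[0] == 1 && @[1] == 2)');
-- expected to return [1,2]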


Thanks

Thom




Re: Translations contributions urgently needed

2018-02-23 Thread Thom Brown
On 23 February 2018 at 04:04, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Please join pgsql-translat...@postgresql.org.
>
> What surprises me about this thread is that apparently the sad state
> of the v10 translations wasn't already discussed on that list?
>
> I have no objection to calling for more translation volunteers on
> this list --- in fact, probably it'd be a good idea to call for
> more help on pgsql-general, too.  But it doesn't seem like it's
> quite on-topic for -hackers otherwise.

Something that isn't clear to me is, for a language that didn't meet
80% translation for a component, if it does reach 80% after the major
version release, does it then get shipped in a minor release, or is it
out of that version completely until the next major version?

Thom



Re: Translations contributions urgently needed

2018-02-22 Thread Thom Brown
On 22 February 2018 at 17:24, Magnus Hagander <mag...@hagander.net> wrote:
> On Thu, Feb 22, 2018 at 6:20 PM, Thom Brown <t...@linux.com> wrote:
>>
>> Hi,
>>
>> I have found that Japanese language support for the database server
>> has been dropped for 10.  This is because it fell below the 80% of
>> strings translated requirement, so it was shipped without Japanese.
>> This isn't true of all components, but it seems quite alarming that
>> we've pushed out PostgreSQL 10 with no language support for a country
>> that has contributed a significant amount to the project, and has a
>> relatively large number of users.
>>
>> The database server also dropped support for Indonesian and Portuguese
>> (Brazil).  In fact, just between 9.6 and 10, the following language
>> support was dropped for these components:
>>
>>  cs   | plpython
>>  de   | pg_resetxlog
>>  es   | pg_resetxlog
>>  fr   | pg_resetxlog
>>  id   | postgres
>>  it   | pg_resetxlog
>>  ja   | pg_basebackup,pg_resetxlog,plpython,pltcl,postgres,psql
>>  ko   | pg_resetxlog
>>  pl   | pg_resetxlog
>>  pt_BR| pg_basebackup,pg_resetxlog,pltcl,postgres
>>  ru   | pg_resetxlog
>>  sv   | pg_resetxlog
>>  tr   | plperl
>>  zh_CN| pg_basebackup,pg_resetxlog,pltcl
>>  zh_TW| plperl
>
>
> Aren't all those pg_resetxlog entries because it was renamed to pg_resetwal?
> Is that because they were actually not updated for the new name, or is it a
> reporting side effect?

Oh yes, okay, that's true.  I guess we can ignore those, so it's a
little less dire for 9.6 to 10 then.  We get this instead, which is
still not good:

 cs   | plpython
 id   | postgres
 it   | pg_resetxlog
 ja   | pg_basebackup,plpython,pltcl,postgres,psql
 pt_BR| pg_basebackup,pltcl,postgres
 tr   | plperl
 zh_CN| pg_basebackup,pltcl
 zh_TW| plperl

Thom



Translations contributions urgently needed

2018-02-22 Thread Thom Brown
Hi,

I have found that Japanese language support for the database server
has been dropped for 10.  This is because it fell below the 80% of
strings translated requirement, so it was shipped without Japanese.
This isn't true of all components, but it seems quite alarming that
we've pushed out PostgreSQL 10 with no language support for a country
that has contributed a significant amount to the project, and has a
relatively large number of users.

The database server also dropped support for Indonesian and Portuguese
(Brazil).  In fact, just between 9.6 and 10, the following language
support was dropped for these components:

 cs   | plpython
 de   | pg_resetxlog
 es   | pg_resetxlog
 fr   | pg_resetxlog
 id   | postgres
 it   | pg_resetxlog
 ja   | pg_basebackup,pg_resetxlog,plpython,pltcl,postgres,psql
 ko   | pg_resetxlog
 pl   | pg_resetxlog
 pt_BR| pg_basebackup,pg_resetxlog,pltcl,postgres
 ru   | pg_resetxlog
 sv   | pg_resetxlog
 tr   | plperl
 zh_CN| pg_basebackup,pg_resetxlog,pltcl
 zh_TW| plperl

This is a huge amount compared to what happened between 9.5 and 9.6:

 cs   | psql

between 9.4 and 9.5:

 cs   | pltcl
 ro   | pltcl
 tr   | libpq,pltcl
 zh_TW| pltcl,psql

and between 9.3 and 9.4:

 cs   | pg_basebackup,pg_resetxlog
 ja   | pg_basebackup,pg_resetxlog
 zh_TW| pg_ctl,postgres


There are many translations that are on the verge of falling under
80%.  Japanese alone has 6 components between 80% and 83%, but other
languages are in a similar position.

I feel this is something that we, as a community, need to address.
Unfortunately, some of us aren't in a position to contribute to such
an effort.  We need folk to step forward to help get these
translations updated, or this situation will likely only get worse.

Thanks

Thom