ERROR: posting list tuple with 2 items cannot be split at offset 17

2023-02-08 Thread Paul McGarry
I have three databases, two of which are experiencing the issue
below.

The first database was created from a dump in Feb 2022 (a few weeks after
the time period for which I seem to have problematic indexes, maybe).
The second database was then cloned from the first (i.e. a filesystem-level
copy) soon after.
Since then all databases have undergone a number of minor version upgrades,
including to 13.9 and an OS update last week for the two problem databases
(the other is still on 13.8).

Now, a process which clears some data more than 13 months old is getting an
error when trying to do that update.

My suspicion is that either:
- there was probably an issue with the index 12 months ago, that problem
was copied when I cloned the database, and it is just becoming apparent now
that a script is accessing 13-month-old data; or
- something in our recent upgrade has caused the problem.

The third database is still on 13.8, with some OS updates pending, and
is not experiencing the problem.

The problem emerges as:


UPDATE widget SET description=NULL WHERE (time>='2022-01-07
17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08
17:40:05.780573+00' AND description IS NOT NULL);
ERROR:  posting list tuple with 2 items cannot be split at offset 17


A select on the same data works fine, so presumably it is a problem updating the
index, not accessing it (or the corresponding table):


db=> select count(*) from widget where (time>='2022-01-07
17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08
17:40:05.780573+00' AND description IS NOT NULL);
 count
--------
 797943



The index used as per explain:

 explain  UPDATE widget SET description=NULL WHERE (time>='2022-01-07
17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08
17:40:05.780573+00' AND description IS NOT NULL);

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Update on widget  (cost=0.57..2921626.80 rows=205910 width=1066)
   ->  Index Scan using widget_time_client_idx on widget
         (cost=0.57..2921626.80 rows=205910 width=1066)
         Index Cond: (("time" >= '2022-01-03 17:40:05.140287'::timestamp without time zone)
                      AND ("time" < '2022-01-08 17:40:05.780573+00'::timestamp with time zone))
         Filter: (description IS NOT NULL)
(4 rows)


amcheck was attempted on that index, but it doesn't seem to identify any issues:


db=> SELECT bt_index_check('widget_time_client_idx',true);
DEBUG:  verifying consistency of tree structure for index
"widget_time_client_idx"
DEBUG:  verifying level 3 (true root level)
DEBUG:  verifying level 2
DEBUG:  verifying level 1
DEBUG:  verifying level 0 (leaf level)
DEBUG:  verifying that tuples from index "widget_time_client_idx" are
present in "widget"
DEBUG:  finished verifying presence of 639872196 tuples from table "widget"
with bitset 25.94% set
 bt_index_check
----------------
 
(1 row)

db=> SELECT bt_index_parent_check('widget_time_client_idx',true,true);
DEBUG:  verifying consistency of tree structure for index
"widget_time_client_idx" with cross-level checks
DEBUG:  verifying level 3 (true root level)
DEBUG:  verifying level 2
DEBUG:  verifying level 1
DEBUG:  verifying level 0 (leaf level)

DEBUG:  verifying that tuples from index "widget_time_client_idx" are
present in "widget"
DEBUG:  finished verifying presence of 639874864 tuples from table "widget"
with bitset 25.94% set
 bt_index_parent_check
-----------------------

(1 row)



We recreated that index and deleted the old index, and the update then
worked.
I've done that on one of the databases so far.

Despite that working, it then occurred to me that the problem might be due
to problems updating a different index on the same table.

I then found on the two problem DBs (but not the 13.8 one):


db=> select bt_index_check('widget_name_idx');
ERROR:  item order invariant violated for index "widget_name_idx"
DETAIL:  Lower index tid=(682201,85) (points to index tid=(682278,4097))
higher index tid=(682201,86) (points to index tid=(716079,1)) page
lsn=580/E554A858.



db=> select bt_index_check('widget_name_idx');
ERROR:  item order invariant violated for index "widget_name_idx"
DETAIL:  Lower index tid=(682201,49) (points to index tid=(682245,1))
higher index tid=(682201,50) (points to index tid=(734398,1)) page
lsn=566/E67C5FF0.


which, being an index on a text field, seems more likely to be the result of a
collation change that might accompany an OS update.

But if it is the problem, why did the update start working after I
recreated the other index?

I think I should now:
- recreate the widget_name_idx on the problem servers
- run bt_index_check across all other indexes
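
A minimal sketch of that check pass, assuming the amcheck extension is
already installed (the schema filter and the REINDEX target are illustrative):

    -- check every btree index in one schema
    SELECT c.relname, bt_index_check(c.oid, true)
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_namespace n ON n.oid = c.relnamespace
    JOIN pg_am am ON am.oid = c.relam
    WHERE am.amname = 'btree' AND n.nspname = 'public';

    -- rebuild a suspect index without blocking writes (PG12+)
    REINDEX INDEX CONCURRENTLY widget_name_idx;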

Any suggestions on what else I should look into, in particular anything I
should check before upgrading the remaining 13.8 DB to 13.9?

Thanks for any help,

Paul


Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?

2023-02-08 Thread Vladimir Sitnikov
Here's an interesting read on regex improvements in .NET 7.

See "Goodbye, Boyer-Moore" where they drop Boyer-Moore and replace it with
vectorized search:

https://devblogs.microsoft.com/dotnet/regular-expression-improvements-in-dotnet-7/#vectorization

Vladimir

-- 
Vladimir


Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?

2023-02-08 Thread David Rowley
On Thu, 9 Feb 2023 at 14:49, Tom Lane  wrote:
>
> David Rowley  writes:
> > Tom's argument seems to think it's impossible, so if you find that
> > it's definitely not impossible, then you can assume he's wrong about
> > that.
>
> My point was that it seems like you'd need a separate BMH engine for
> each %-separated segment of the LIKE pattern.  I'm not quite clear on
> whether BMH can handle '_' (single-char wildcard) conveniently by
> itself, although my gut feel is that you can probably make that part
> work.  Maybe you can even extend the idea to embedded %, but that
> seems more difficult.

Yeah, I think to make it work with more complex patterns like
'%some%string%' or '%some_string%' you'd need to break that into
multiple independent searches for each portion between a wildcard
character.  For the former pattern, you'd need to do some final check
that ensures that the 2nd pattern was found in some position >= the
position of the 1st pattern + its length.  For the latter, the final
check would need to validate that the 2nd pattern was found at a
position of the first pattern + its length + 1. It's probably also
possible to make those patterns work when they don't contain the
leading and trailing % by checking that the first pattern is found at
position 0 and the end of the 2nd pattern is found at the end of the
search string.
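
Expressed in SQL rather than C, those checks amount to roughly the following
(purely illustrative; 's' stands in for the string being matched, and only the
first occurrence of the first fragment is considered):

    -- '%some%string%': 'string' must start at or after the end of the 'some' match
    SELECT position('string' in substr(s, position('some' in s) + length('some'))) > 0
    FROM (VALUES ('x some other string x')) v(s)
    WHERE position('some' in s) > 0;

    -- '%some_string%': 'string' must start exactly one character after 'some' ends
    SELECT substr(s, position('some' in s) + length('some') + 1, length('string')) = 'string'
    FROM (VALUES ('x someXstring x')) v(s)
    WHERE position('some' in s) > 0;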

However, I imagine going to the trouble of trying to make it work for
more complex patterns initially would be a bad idea.  I imagine there
are just too many cases where we could demonstrate performance
regressions and that would cause us to reject the patch.

David




Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?

2023-02-08 Thread Tom Lane
David Rowley  writes:
> On Thu, 9 Feb 2023 at 13:05, Martin L. Buchanan
>  wrote:
>> str LIKE '%foo%'
>> str ILIKE  '%foo%'
>> position('foo' in str) > 0
>> Is Boyer-Moore string searching now used by any of these three?

> We use a sort of "lossy" Boyer-Moore-Horspool algorithm. See
> text_position_setup() and text_position_next() in varlena.c (the lossy
> part comes from the skiptables possibly sharing the same entry for
> multiple characters as defined by what skiptablemask is set to).

Note that that's used only by the functions in that file; thus,
position() yes, but (I)LIKE no.

> Tom's argument seems to think it's impossible, so if you find that
> it's definitely not impossible, then you can assume he's wrong about
> that.

My point was that it seems like you'd need a separate BMH engine for
each %-separated segment of the LIKE pattern.  I'm not quite clear on
whether BMH can handle '_' (single-char wildcard) conveniently by
itself, although my gut feel is that you can probably make that part
work.  Maybe you can even extend the idea to embedded %, but that
seems more difficult.

Given that the fixed substrings of LIKE patterns are usually rather
short, I'm unconvinced that BMH would buy much compared to its
setup costs.  But hey, prove me wrong.

(One way to amortize the setup costs could be to cache pre-built
BMH data structures keyed by the pattern strings, in a similar fashion
to what utils/adt/regexp.c does for compiled regular expressions.)

> ... Getting
> stuff committed that causes performance regressions in some cases and
> wins in other cases can be a pretty difficult and frustrating process.
> You have to remember, even if you think the slowdown is some corner
> case that only applies ~1% of the time, for some users in the real
> world, that might be 100% of their queries.

Yeah, whatever the shape of the preprocessing might be, it seems
likely that it would be a net loss in some use-cases.  We do manage
to get past that --- the position() code didn't have BMH to start
with --- but it definitely requires solid evidence.

regards, tom lane




Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?

2023-02-08 Thread David Rowley
On Thu, 9 Feb 2023 at 13:05, Martin L. Buchanan
 wrote:
> For the common and simple cases of find this string anywhere in another 
> string:
>
> str LIKE '%foo%'
>
> str ILIKE  '%foo%'
>
> position('foo' in str) > 0
>
> Is Boyer-Moore string searching now used by any of these three?

We use a sort of "lossy" Boyer-Moore-Horspool algorithm. See
text_position_setup() and text_position_next() in varlena.c (the lossy
part comes from the skiptables possibly sharing the same entry for
multiple characters as defined by what skiptablemask is set to).

> I checked the PG documentation and found no info about this other than what 
> was in the Todo wiki, https://wiki.postgresql.org/wiki/Todo, under Functions. 
> Tom Lane gave a thumbs down to the idea back in 2008, but that was a long 
> time ago: 
> https://www.postgresql.org/message-id/27645.1220635...@sss.pgh.pa.us .

Tom's argument seems to think it's impossible, so if you find that
it's definitely not impossible, then you can assume he's wrong about
that.  I've not given it much thought, but I think, provided there are
only leading and trailing %'s and no _'s, the LIKE would be
equivalent to checking position(like_arg2 in like_arg1) > 0.

There are probably 2 ways that you could consider going about making this work:

Method 1: See if it's possible to rewrite the query to swap the LIKE
for position().

I'm really unsure where you'd want to put the logic that looks for
such patterns. I don't recall any code that swaps one operator out for
another.  It seems like you could only apply such a transformation if
the like pattern was a Const.  I'm not sure if we'd want to do
anything like introduce some special logic in oper() in parse_oper.c
to do such a rewrite. Given we allow users to define their own
datatypes and operators, it would seem a bit limiting to go and hard
code this transformation. I'm really not sure what it would take to
make such a thing expandable. Maybe some kind of rewrite function
could be defined in pg_operator that can try and fail to come up with
some more efficient alternative OpExpr.

You'd also need to consider that conditions such as: str LIKE 'foo%'
can use an index scan in some cases. If you were to rewrite in that
case you'd likely kill the performance of queries that could exploit
that feature.
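
For example, with a text_pattern_ops index (the table and column names here are
made up), the planner can turn a prefix LIKE into a range scan, which a rewrite
to position() would forfeit:

    CREATE INDEX t_str_pattern_idx ON t (str text_pattern_ops);
    EXPLAIN SELECT * FROM t WHERE str LIKE 'foo%';
    -- effectively becomes: str >= 'foo' AND str < 'fop', satisfied by an index scan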

Method 2: Adjust like_match.c to look for these patterns and use BMH
when possible.

The problem with this is that you'd need to check the pattern each
call to see if it's compatible. Since you need to check for %'s and
_'s in the middle of the string, then that means adding a complete
preprocessing step that would need to be done on every call to
MatchText(). If that passes then it's maybe going to be faster, but if
it fails then it'll certainly be slower. You might very well struggle
to convince people that this would be a good idea. It sounds like it
would be very easy to demonstrate performance regressions here just by
passing a pattern that always fails that preprocess step. Getting
stuff committed that causes performance regressions in some cases and
wins in other cases can be a pretty difficult and frustrating process.
You have to remember, even if you think the slowdown is some corner
case that only applies ~1% of the time, for some users in the real
world, that might be 100% of their queries.

There are probably other ways you could consider doing this, I just
can't think of them right now.

David




Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?

2023-02-08 Thread Martin L. Buchanan
In the PostgreSQL Todo wiki, Boyer-Moore string searching for LIKE is
mentioned as an outstanding item.

For the common and simple cases of find this string anywhere in another
string:

str LIKE '%foo%'

str ILIKE  '%foo%'

position('foo' in str) > 0

Is Boyer-Moore string searching now used by any of these three?

I checked the PG documentation and found no info about this other than what
was in the Todo wiki, https://wiki.postgresql.org/wiki/Todo, under
Functions. Tom Lane gave a thumbs down to the idea back in 2008, but that
was a long time ago:
https://www.postgresql.org/message-id/27645.1220635...@sss.pgh.pa.us .

Sincerely,

Martin L Buchanan
senior software engineer
Laramie, WY, USA


Re: How do a user-defined function that returns a table executes a query?

2023-02-08 Thread Katsuya Okizaki
Dear Laurenz,

Thank you for your help. As you suspected, I was seeking a way to view the
execution plan of a function defined in SQL. Your suggestion was exactly
what I needed and has been very helpful.

Additionally, I also appreciated learning about the debugging techniques
for PL/pgSQL. Thank you for your valuable advice.

Best regards,

Katsuya Okizaki

2023年2月8日(水) 19:18 Laurenz Albe :

> On Wed, 2023-02-08 at 08:49 +0900, Katsuya Okizaki wrote:
> > In a normal SQL, we can use the EXPLAIN command to view the execution
> plan.
> > However, in this case, I am not sure how a user-defined function work.
> >
> > If anyone has faced a similar situation and found a way to view the
> execution plan,
> > I would greatly appreciate your insights and suggestions.
>
> I am not sure which of the following you want:
>
> 1. Get the execution plan of SQL statements run inside a function:
>
>For that, you would use "auto_explain" with
> "auto_explain.log_nested_statements = on".
>
> 2. Get a trace of the execution of the function code itself:
>
>For PL/pgSQL, there is "pldebugger" (
> https://github.com/EnterpriseDB/pldebugger),
>which can be used together with pgAdmin.
>
>I usually sprinkle the code with RAISE NOTICE statements.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>


Re: How to create directory format backup

2023-02-08 Thread Tom Lane
Andrus  writes:
> This file size is only 6.2 GB. If custom format is used, pg_dump creates 
> large file without problems. There are no file size limits. Error 
> message is not about this.

Are you certain that the pg_dump you're using is v15, and not something
pre-v14?  We got rid of the 4GB limit on stat() results in v14,
or so we thought.

Anyway, as far as I can find pg_dump can only reach that error message
via fsync_dir_recurse(), which suggests that a workaround might be
to add --no-sync so it doesn't try to fsync its output.
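
i.e. something along the lines of (untested, based on the command from the
original report):

    pg_dump -f "sba" -Fdirectory --jobs=32 --no-sync --verbose sba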

regards, tom lane




Re: How do a user-defined function that returns a table executes a query?

2023-02-08 Thread Katsuya Okizaki
Dear David,

Thank you for your prompt reply.

This is exactly what I was looking for. Thank you for your help.

Best regards,
Katsuya Okizaki

2023年2月8日(水) 8:51 David G. Johnston :

> On Tue, Feb 7, 2023 at 4:49 PM Katsuya Okizaki 
> wrote:
>
>> In a normal SQL, we can use the EXPLAIN command to view the execution
>> plan. However, in this case, I am not sure how a user-defined function work.
>>
>> If anyone has faced a similar situation and found a way to view the
>> execution plan, I would greatly appreciate your insights and suggestions.
>>
>>
> https://www.postgresql.org/docs/current/auto-explain.html
>
> David J.
>


Re: How to create directory format backup

2023-02-08 Thread Erik Wienhold
> On 08/02/2023 22:37 CET Andrus  wrote:
>
> > > How to create backup in format from which tables can selectively
> > > restored?
> > >
> > Dump as custom-format archive (-F custom) and use that with pg_restore
> > and options --table or --list/--use-list to select what should be
> > restored.
> >
> How to select tables interactively like pgAdmin allows to select when
> directory format is used ?
>
> Database contains hundreds of schemas. I need to restore public and other
> other schema.
>
> Whole backup file is scanned to restore only two schemas. It takes lot of
> time.

pg_dump also accepts options --table and --schema to only dump what you need.

> Also directory format allows to use all cores with --jobs=32 parameter.
> Dump and partial restore using custom format are much slower.

Run multiple pg_dump processes in parallel where each processes a subset of
tables with the options mentioned above.
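
For example, something like this (Unix shell syntax; schema names are made up):

    pg_dump -Fc --schema=public  -f public.dump  sba &
    pg_dump -Fc --schema=billing -f billing.dump sba &
    wait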

--
Erik




Re: How to use the BRIN index properly?

2023-02-08 Thread Ron
1. The whole index does not need to fit in memory, just the parts of it you 
need at that time.
2. Partition the table by the primary key.  Each index will be *much* 
smaller, since each child will be smaller.
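
A rough sketch of option 2 (table, column, and boundary values are made up):

    CREATE TABLE events (
        id   bigint NOT NULL,
        data text,
        PRIMARY KEY (id)
    ) PARTITION BY RANGE (id);

    CREATE TABLE events_p0 PARTITION OF events FOR VALUES FROM (0) TO (100000000);
    CREATE TABLE events_p1 PARTITION OF events FOR VALUES FROM (100000000) TO (200000000);
    -- each partition carries its own, much smaller, b-tree on id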


On 2/8/23 16:14, Siddharth Jain wrote:

OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is 
it worth creating it at all? Are there any established patterns here?


On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus  wrote:



> On Feb 8, 2023, at 13:17, Siddharth Jain  wrote:
>
> As I explained in my question that is indeed our dilemma. Our
insertion order will not be equal to index order. i.e., referring to
your response:
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.

A BRIN index is not a good choice in this case.  You can CLUSTER the
data on an index, but that's a one-time operation: PostgreSQL will not
maintain that order after the CLUSTER.  If the number of rows in the
table at the time of the CLUSTER is much larger than the number that
are inserted between CLUSTER operations, then a BRIN index might be
useful, but clustering a very large table is an expensive operation,
and requires an exclusive lock on the table while it is being done.



--
Born in Arizona, moved to Babylonia.

Re: How to use the BRIN index properly?

2023-02-08 Thread Christophe Pettus



> On Feb 8, 2023, at 14:14, Siddharth Jain  wrote:
> 
> If the B-Tree index will be so large that it cannot fit in memory, then is it 
> worth creating it at all? 

Yes.  Of course, more memory is better, and more recent versions of
PostgreSQL have optimizations that are valuable for large B-tree indexes.



Re: How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus  wrote:

>
>
> > On Feb 8, 2023, at 13:17, Siddharth Jain  wrote:
> >
> > As I explained in my question that is indeed our dilemma. Our insertion
> order will not be equal to index order. i.e., referring to your response:
> >
> > > who's data is added in the same order as the key in the BRIN index
> >
> > does NOT hold.
>
> A BRIN index is not a good choice in this case.  You can CLUSTER the data
> on an index, but that's a one-time operation: PostgreSQL will not maintain
> that order after the CLUSTER.  If the number of rows in the table at the
> time of the CLUSTER is much larger than the number that are inserted
> between CLUSTER operations, then a BRIN index might be useful, but
> clustering a very large table is an expensive operation, and requires an
> exclusive lock on the table while it is being done.


Re: How to create directory format backup

2023-02-08 Thread Andrus

Hi!



How to create backup in format from which tables can selectively restored?

Dump as custom-format archive (-F custom) and use that with pg_restore and
options --table or --list/--use-list to select what should be restored.


How to select tables interactively, like pgAdmin allows when the
directory format is used?


The database contains hundreds of schemas. I need to restore public and
one other schema.


The whole backup file is scanned to restore only two schemas. It takes a
lot of time.


Also, the directory format allows using all cores with the --jobs=32 parameter.
Dump and partial restore using the custom format are much slower.


Andrus.




Re: How to use the BRIN index properly?

2023-02-08 Thread Christophe Pettus



> On Feb 8, 2023, at 13:17, Siddharth Jain  wrote:
> 
> As I explained in my question that is indeed our dilemma. Our insertion order 
> will not be equal to index order. i.e., referring to your response:
> 
> > who's data is added in the same order as the key in the BRIN index
> 
> does NOT hold. 

A BRIN index is not a good choice in this case.  You can CLUSTER the data on an 
index, but that's a one-time operation: PostgreSQL will not maintain that order 
after the CLUSTER.  If the number of rows in the table at the time of the 
CLUSTER is much larger than the number that are inserted between CLUSTER 
operations, then a BRIN index might be useful, but clustering a very large 
table is an expensive operation, and requires an exclusive lock on the table 
while it is being done.
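
As a sketch of that one-time operation (table, index, and column names are
hypothetical):

    -- one-time physical reorder; PostgreSQL will not keep this order afterwards
    CLUSTER big_table USING big_table_time_idx;

    -- a BRIN index then benefits from the resulting correlation
    CREATE INDEX big_table_time_brin ON big_table USING brin (event_time);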



Re: How to create directory format backup

2023-02-08 Thread Erik Wienhold
> On 08/02/2023 21:59 CET Andrus  wrote:
>
> How to create backup in format from which tables can selectively restored?

Dump as custom-format archive (-F custom) and use that with pg_restore and
options --table or --list/--use-list to select what should be restored.
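
i.e. roughly (file and database names are illustrative):

    pg_dump -F custom -f db.dump mydb
    pg_restore --list db.dump > db.list
    # edit db.list, keeping only the entries you want restored
    pg_restore --use-list=db.list -d mydb_restored db.dump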

--
Erik




Re: How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
As I explained in my question that is indeed our dilemma. Our insertion
order will not be equal to index order. i.e., referring to your response:

> who's data is added in the same order as the key in the BRIN index

does NOT hold.

On Wed, Feb 8, 2023 at 12:27 PM Ron  wrote:

> Is the data in your tables stored in natural correlation with those
> *three* columns?  I'm dubious that can even happen.
>
> BRIN is best for *range queries* on tables who's data is added in the
> same order as the key in the BRIN index (for example, a BRIN index on a
> timestamp field in a log table where new records are always being appended
> in "timestamp" order).
>
> It would also be great for history tables where you can pre-sort the data
> by, for example, customer_id, and then put the BRIN on customer_id.
>
> On 2/8/23 13:58, Siddharth Jain wrote:
>
> our insertion order is of course != index order otherwise the question
> would have been trivial.
> we use postgres 14
>
> On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain  wrote:
>
>> Hello,
>>
>> We have large tables with billions of rows in them and want to take
>> advantage of the BRIN index on them.
>>
>> Issues we are facing:
>>
>>- as I understand, BRIN index is useful only if the data is stored in
>>index order. As an example we want to create a composite BRIN index on 3
>>columns - integers and strings (varchar). How can we tell Postgres to 
>> store
>>data in index order as new records are inserted into the database?
>>- i understand that turning on autosummarize will keep the index
>>fresh and up-to-date as new records are inserted. is this correct?
>>
>> Thanks for your help.
>>
>> S.
>>
>
> --
> Born in Arizona, moved to Babylonia.
>


Re: How to create directory format backup

2023-02-08 Thread Andrus

Hi!

>Looks like the filesystem on your client has limits on file sizes.
Use a better filesystem, or just dump on Linux; its filesystems usually
don't hit these limits.


This file size is only 6.2 GB. If custom format is used, pg_dump creates
a large file without problems. There are no file size limits; the error
message is not about that.


The backup client is running on Windows 11 and this cannot be changed.

How to create a backup in a format from which tables can be selectively restored?

Posted also in 
https://stackoverflow.com/questions/75387616/how-to-make-directory-format-backup


Andrus.



Re: Sequence vs UUID

2023-02-08 Thread Miles Elam
On Wed, Feb 8, 2023 at 11:56 AM Kirk Wolak  wrote:
>
> CREATE FUNCTION generate_ulid() RETURNS uuid
> LANGUAGE sql
> RETURN ((lpad(to_hex((floor((EXTRACT(epoch FROM clock_timestamp()) * 
> (1000000)::numeric)))::bigint), 14, '0'::text)
>   || encode(gen_random_bytes(9), 'hex'::text)))::uuid;

You can save yourself some CPU by skipping the extra cast, omitting
the lpad, to_hex, and floor, and just grabbing the bytes from the
bigint directly along with the random part since bytea can be
concatenated.

SELECT encode(int8send((EXTRACT(epoch FROM clock_timestamp()) * 1000000)::bigint)
       || gen_random_bytes(8), 'hex')::uuid;

Note that you refer to it as a ULID, but it is stored as a UUID.
Hopefully nothing downstream ever relies on UUID versioning/spec
compliance. Now that I think of it, I could probably speed up my
tagged interval UUID implementation using some of this at the expense
of configurability.




Re: How to use the BRIN index properly?

2023-02-08 Thread Ron
Is the data in your tables stored in natural correlation with those *three* 
columns?  I'm dubious that can even happen.


BRIN is best for *range queries* on tables whose data is added in the same
order as the key in the BRIN index (for example, a BRIN index on a timestamp 
field in a log table where new records are always being appended in 
"timestamp" order).


It would also be great for history tables where you can pre-sort the data 
by, for example, customer_id, and then put the BRIN on customer_id.


On 2/8/23 13:58, Siddharth Jain wrote:
our insertion order is of course != index order otherwise the question 
would have been trivial.

we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain  wrote:

Hello,

We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.

Issues we are facing:

  * as I understand, BRIN index is useful only if the data is stored
in index order. As an example we want to create a composite BRIN
index on 3 columns - integers and strings (varchar). How can we
tell Postgres to store data in index order as new records are
inserted into the database?
  * i understand that turning on autosummarize will keep the index
fresh and up-to-date as new records are inserted. is this correct?

Thanks for your help.

S.



--
Born in Arizona, moved to Babylonia.

Re: How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
our insertion order is of course != index order otherwise the question
would have been trivial.
we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain  wrote:

> Hello,
>
> We have large tables with billions of rows in them and want to take
> advantage of the BRIN index on them.
>
> Issues we are facing:
>
>- as I understand, BRIN index is useful only if the data is stored in
>index order. As an example we want to create a composite BRIN index on 3
>columns - integers and strings (varchar). How can we tell Postgres to store
>data in index order as new records are inserted into the database?
>- i understand that turning on autosummarize will keep the index fresh
>and up-to-date as new records are inserted. is this correct?
>
> Thanks for your help.
>
> S.
>


Re: Sequence vs UUID

2023-02-08 Thread Kirk Wolak
On Wed, Feb 8, 2023 at 4:18 AM veem v  wrote:

> Thank you So much all for such valuable feedback.
> ..
> So wanted to know from experts here, is there really exists any scenario
> in which UUID really cant be avoided?
>
Funny you are asking about this.  My recent experience is that UUIDs
really get crushed on performance in medium (> 5 million rows) tables.
I found an article by Dave Allie on ULID, and I modified his implementation
to create a timestamp(6) (microsecond level) sequenced version.

Doing an article on this soon.  But WITHOUT calling the "gen_random_bytes"
I can generate 2 timestamps at the same microsecond level.
Once that call is included in the function, I've never been close to
returning 2 timestamps at the same microsecond level.  Although I did not
run this on multiple threads.  This fit our needs for an efficient UUID
formatted key...

9 Bytes (18 Hex Digits) of Randomness at the far right.

Oh, and some time after the year 10,000 you will get some wrap around...
But I expect 256 bit UUIDs will take over before then.


CREATE FUNCTION generate_ulid() RETURNS uuid
LANGUAGE sql
RETURN ((lpad(to_hex((floor((EXTRACT(epoch FROM clock_timestamp()) *
(1000000)::numeric)))::bigint), 14, '0'::text)
  || encode(gen_random_bytes(9), 'hex'::text)))::uuid;


How to use the BRIN index properly?

2023-02-08 Thread Siddharth Jain
Hello,

We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.

Issues we are facing:

   - as I understand, BRIN index is useful only if the data is stored in
   index order. As an example we want to create a composite BRIN index on 3
   columns - integers and strings (varchar). How can we tell Postgres to store
   data in index order as new records are inserted into the database?
   - i understand that turning on autosummarize will keep the index fresh
   and up-to-date as new records are inserted. is this correct?

Thanks for your help.

S.


How to verify postgresql download on windows?

2023-02-08 Thread Ali M.
Hi, the PostgreSQL binaries and installer are provided by EDB.
How can I verify the download?

Usually other projects provide a SHA256 hash list to verify the downloads.
EDB doesn't seem to provide that, so how else can I verify the download?

Thanks
Ali


Re: How to create directory format backup

2023-02-08 Thread hubert depesz lubaczewski
On Wed, Feb 08, 2023 at 05:00:10PM +0200, Andrus wrote:
> Hi!
> 
> Creating backup in directory format using
> 
>     pg_dump -f "sba" -Fdirectory --jobs=32 --verbose sba
> 
> throws error
> 
>     pg_dump: error: could not stat file "sba/282168.data.gz": value too
> large
> 
> How to fix it ?
> 
> Server is Postgres 12 running in Debian Linux 10 under WSL
> Client is pg_dump from Postgres 15 running in Windows 11

Looks like the filesystem on your client has limits on file sizes.

Use a better filesystem, or just dump on Linux; its filesystems usually
don't hit these limits.

Best regards,

depesz





Re: Need help debugging slow logical replication

2023-02-08 Thread sunyuc...@gmail.com
OK, so I think I've got some new insight by using test_decoding to
peek into the changes being replicated; here is what I think is
happening:

- there is a background job generating huge transactions (10K
changes) one after another that basically do a DELETE on 1 table,
an INSERT on 1 table and an UPDATE on 1 table. Since this is a test
environment, this is the only TX being accumulated.
- since there is only 1 publication, the worker on the primary
basically spends all its CPU decoding this large transaction and then sends
it to the subscriber to execute
   - memory usage on the logical decoding worker may be an issue, as it
might have spilled to disk during decoding
   - the worker on the subscriber basically spends all its time applying this change

- I suspect keep-alives are not sent while applying a transaction,
and the transaction apply time takes longer than wal_sender_timeout
(30 seconds), which is why we had to increase this number before.
- I think utilizing the new streaming feature for logical replication
in PG14 will improve performance here (see the sketch below).
- I think the other thing that will help is to drop the index
temporarily on the subscriber to accelerate applying, but we will need to
rebuild it once it is close to caught up.
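
A sketch of the PG14 streaming idea (the subscription name is made up):

    -- stream large in-progress transactions to the subscriber instead of
    -- decoding and spooling the whole transaction first (PostgreSQL 14+)
    ALTER SUBSCRIPTION my_sub SET (streaming = on);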

BTW, just to clarify: replica identity is not an issue for the table
that only receives INSERTs. I tried creating a replica identity for
that table, and it didn't change anything recorded in WAL (as
expected); if anything it actually decreased performance since

What's puzzling to me is why logical apply is so much slower
than physical apply (and also that physical streaming apparently never
times out!)

On Tue, Feb 7, 2023 at 9:13 PM sunyuc...@gmail.com  wrote:
>
> CCing the mailing list again in case someone else has an idea how to debug:
>
> Here is what I see in the pg_locks on subscribe at all time:
>
> As you can see, it mostly concerns the following 3 tables: accounts,
> ledger, pending_ledger_fees. I have tried to analyze and vacuum them,
> etc.; none of that helped much:
>
> I do see in the log that these tables are getting autovacuumed *very*
> frequently; is that a problem for logical replication?
>
> 2023-02-08 04:59:47 UTC::@:[13626]:LOG:  automatic vacuum of table
> "dev.public.accounts": index scans: 0
> pages: 0 removed, 71437 remain, 0 skipped due to pins, 11465 skipped frozen
> tuples: 0 removed, 3590987 remain, 3522495 are dead but not yet
> removable, oldest xmin: 2893907681
> index scan bypassed: 4 pages from table (0.01% of total) have 5 dead
> item identifiers
> I/O timings: read: 0.000 ms, write: 0.000 ms
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 120107 hits, 0 misses, 0 dirtied
> WAL usage: 1 records, 0 full page images, 708 bytes
> system usage: CPU: user: 0.23 s, system: 0.00 s, elapsed: 1.01 s
> 2023-02-08 04:59:47 UTC::@:[13626]:LOG:  automatic vacuum of table
> "dev.public.pending_fee_ledger": index scans: 0
> pages: 0 removed, 199818 remain, 0 skipped due to pins, 117769 skipped frozen
> tuples: 0 removed, 3670095 remain, 352 are dead but not yet
> removable, oldest xmin: 2893907681
> index scan not needed: 0 pages from table (0.00% of total) had 0 dead
> item identifiers removed
> I/O timings: read: 0.000 ms, write: 0.000 ms
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 164212 hits, 0 misses, 0 dirtied
> WAL usage: 0 records, 0 full page images, 0 bytes
> system usage: CPU: user: 0.27 s, system: 0.00 s, elapsed: 0.27 s
>
> Thanks
>
> On Tue, Feb 7, 2023 at 6:10 PM sunyuc...@gmail.com  
> wrote:
> >
> > Hi Justin, thanks for the response!
> >
> > > REPLICA IDENTITY DEFAULT will only use primary keys. If the publisher
> > > includes those tables, the subscriber, when replaying the WAL, will stop,
> > > throwing an error, not knowing how to replay the UPDATE/DELETE.
> >
> > But I don't see any errors being thrown out in the postgresql logs?
> > Should I be seeing it complain there? Is postgresql falling back to
> > replica identity full here?
> >
> > However I checked that table, it doesn't seem to be making progress at
> > all: so I suspect you are right that it is the problem.
> >
> > > Logical Replication is most likely broken at this point.
> > >
> > > I suggest stopping logical replication and correcting tables that don't 
> > > have qualifying indexes for logical replication by creating the necessary 
> > > indexes and avoid using replica identity full.  Then restart logical 
> > > replication from the beginning.
> > >
> > >




How to create directory format backup

2023-02-08 Thread Andrus

Hi!

Creating backup in directory format using

    pg_dump -f "sba" -Fdirectory --jobs=32 --verbose sba

throws error

    pg_dump: error: could not stat file "sba/282168.data.gz": value too 
large


How to fix it ?

Server is Postgres 12 running in Debian Linux 10 under WSL
Client is pg_dump from Postgres 15 running in Windows 11

Andrus


Re: Index scan for PK constraint validation

2023-02-08 Thread David G. Johnston
On Wednesday, February 8, 2023, Philippe VIEGAS 
wrote:

> Hi all,
>
> I was wondering why the index statistics usage were not reflecting the
> index usage for primary key constraint validation ?
>
> When we create a table with a primary key, PostgreSQL creates a Btree
> index for the validation of this constraint : index is defined unique
> (indisunique=t) and primary (indisprimary=t) from the pg_index and pg_class
> catalogs.
>
> Inserting data on the above created table and selecting based on the
> primary key column will increment the idx_scan from `pg_stat_user_indexes`
> view.
>
>
The select produces the index scan, not the insert.  The insert doesn’t
scan, it populates and aborts if that population fails.  It doesn’t check
first.

David J.


Re: Get the sequence name corresponding to a GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY column

2023-02-08 Thread Sebastien Flaesch
Good point, thanks Peter!
Seb

From: Peter Eisentraut 
Sent: Wednesday, February 8, 2023 12:07 PM
To: Sebastien Flaesch ; 
pgsql-general@lists.postgresql.org 
Subject: Re: Get the sequence name corresponding to a GENERATED { ALWAYS | BY 
DEFAULT } AS IDENTITY column

EXTERNAL: Do not click links or open attachments if you do not recognize the 
sender.

On 07.02.23 11:43, Sebastien Flaesch wrote:
> select '"'||ns.nspname||'"."'||p.relname||'"' tabname, c.attname colname

Just a side note: You might find the quote_ident() function useful.



Index scan for PK constraint validation

2023-02-08 Thread Philippe VIEGAS

Hi all,

I was wondering why the index usage statistics do not reflect the
index usage for primary key constraint validation?


When we create a table with a primary key, PostgreSQL creates a Btree 
index for the validation of this constraint : index is defined unique 
(indisunique=t) and primary (indisprimary=t) from the pg_index and 
pg_class catalogs.


Inserting data on the above created table and selecting based on the 
primary key column will increment the idx_scan from 
`pg_stat_user_indexes` view.
But if we insert again in this table with a conflicting id, the primary 
key is being validated using the index as stated in our example :


ERROR:  23505: duplicate key value violates unique constraint "testpkidx_pkey"
DETAIL:  Key (id)=(1) already exists.
SCHEMA NAME:  public
TABLE NAME:  testpkidx
CONSTRAINT NAME:  testpkidx_pkey
LOCATION:  _bt_check_unique, nbtinsert.c:664

But checking again the `pg_stat_user_indexes` view, the idx_scan 
attribute does not seem to be incremented.


Are the statistics of index usage for constraint validation
computed elsewhere?



Are we missing something ?


Thanks in advance.

Regards.
--
Philippe VIEGAS

Re: Sequence vs UUID

2023-02-08 Thread Peter J. Holzer
On 2023-02-08 14:48:03 +0530, veem v wrote:
> So wanted to know from experts here, is there really exists any scenario in
> which UUID really cant be avoided?

Probably not. The question is usually not "is this possible" but "does
this meet the requirements at acceptable cost".


> Sequence Number = n*d+m+offset. Where n is the sequence order number, d is the
> dimensions of the multi-master replication, m ranges from 0 to n-1 is the
> number assigned to each node in the replication, and offset is the number to
> offset the sequence numbers. 

Yes, you can do this. In fact, people (including me) have already done
this.

But it's relatively easy to mess this up:

Firstly, you have to make sure that d is larger than your number of
(active) replicas will ever be, but still small enough that you will
never overflow. Probably not a problem with 64 bit sequences (if you set
d to 1E6, you can still count to 9E12 on each node), but might be a
problem if you are for some reason limited to 32 bits.

Secondly (and IMHO more importantly) you have to make sure each node
gets its own unique offset. So this needs to be ensured during
deployment, but also during migrations, restores from backups and other
infrequent events.

With random Ids you don't have to worry about this.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Get the sequence name corresponding to a GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY column

2023-02-08 Thread Peter Eisentraut

On 07.02.23 11:43, Sebastien Flaesch wrote:

select '"'||ns.nspname||'"."'||p.relname||'"' tabname, c.attname colname


Just a side note: You might find the quote_ident() function useful.
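
i.e. the quoted-name expression above could be written as something like:

    select quote_ident(ns.nspname) || '.' || quote_ident(p.relname) AS tabname,
           c.attname AS colname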





Re: Switching identity column to serial

2023-02-08 Thread Peter Eisentraut

On 04.02.23 21:55, Erik Wienhold wrote:

  Why doesn't this work?
  BEGIN;
  DROP SEQUENCE t_id;


[This won't work, you need to use ALTER TABLE / DROP IDENTITY.]


  CREATE SEQUENCE new_t_id_seq AS INTEGER OWNED BY t.id;
  ALTER SEQUENCE new_t_id_seq OWNER TO new_owner;
  SELECT setval('new_t_id', (SELECT MAX(id) FROM t));
  SELECT nextval('new_t_id');
  COMMIT;

This should work but I want to preserve the existing sequence instead of
re-creating it with the same properties.  That's why I was looking for a
shortcut (also code golfing and sheer curiosity).


This is possible in principle, by implementing the inverse of the 
"Upgrading" recipe from 
. 
But if you don't want to figure that out, I think dropping and 
recreating the sequences as suggested here seems the best solution.
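
For reference, that inverse recipe would look roughly like this (untested
sketch, using the table and column names from the example above):

    BEGIN;
    ALTER TABLE t ALTER COLUMN id DROP IDENTITY;
    CREATE SEQUENCE t_id_seq OWNED BY t.id;
    SELECT setval('t_id_seq', (SELECT max(id) FROM t));
    ALTER TABLE t ALTER COLUMN id SET DEFAULT nextval('t_id_seq');
    COMMIT;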






Re: How do a user-defined function that returns a table executes a query?

2023-02-08 Thread Laurenz Albe
On Wed, 2023-02-08 at 08:49 +0900, Katsuya Okizaki wrote:
> In a normal SQL, we can use the EXPLAIN command to view the execution plan.
> However, in this case, I am not sure how a user-defined function work.
> 
> If anyone has faced a similar situation and found a way to view the execution 
> plan,
> I would greatly appreciate your insights and suggestions.

I am not sure which of the following you want:

1. Get the execution plan of SQL statements run inside a function:

   For that, you would use "auto_explain" with 
"auto_explain.log_nested_statements = on".

2. Get a trace of the execution of the function code itself:

   For PL/pgSQL, there is "pldebugger" 
(https://github.com/EnterpriseDB/pldebugger),
   which can be used together with pgAdmin.

   I usually sprinkle the code with RAISE NOTICE statements.
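
For option 1, a minimal session-level sketch (the function name is made up)
would be:

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;
    SET auto_explain.log_nested_statements = on;
    SELECT my_table_function();  -- plans of the statements run inside it go to the server log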

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Sequence vs UUID

2023-02-08 Thread veem v
Thank you So much all for such valuable feedback.

As "Julian" was pointing, I also tried to test the INSERT independently(as
in below test case) without keeping the "generate_series" in the inline
query. But in all the cases sequence is performing better as compared to
both UUID V4 and UUID V7. And same with Index access path observed i.e. an
index on sequence performs better as compared to an index on UUID column.
So i believe , its understood that the performance wise sequence is better
as compared to even sequential UUID (like UUID-V7). And it seems that, even
the UUID V7 maintains better locality because of its sequential nature, but
still it all may tied to the higher space/memory consumption of UUID
because of its bigger size as compared to sequence and thus the performance
hit.

But I saw a few comments in this discussion regarding the scenarios which
mandate the usage of UUIDs, like "multi-master replication", "sharding",
"need to be able to move data between databases", etc. So I wanted to
understand why we can't use sequences as the PK in these scenarios. For
example, in the case of multi-master replication we could use a sequence
somewhat as below.

So I wanted to know from the experts here: does there really exist any scenario in
which UUIDs really can't be avoided?

Sequence Number = n*d + m + offset, where n is the sequence order number, d is
the number of nodes in the multi-master replication, m (ranging from 0 to d-1)
is the number assigned to each node in the replication, and offset is the
number by which to offset the sequence numbers.

For a 4-way multi-master replication, d=4, m is in (0, 1, 2, 3), and the
offset is 100.
Node #1 (m=0) :Sequence number = n*4+100
Node #2 (m=1): Sequence number = n*4+101
Node #3 (m=2): Sequence number = n*4+102
Node #4 (m=3): Sequence number = n*4+103

Each sequence will have:
100, 104, 108,112, 116, 120,...
101, 105, 109, 113, 117, 121,...
102, 106, 110, 114, 118, 122...
103, 107, 111, 115, 119, 123
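
In PostgreSQL terms, each node would own a sequence with a common increment
and a distinct start, e.g. for node #2 above (a sketch):

    -- d = 4 nodes, offset = 100, this node has m = 1
    CREATE SEQUENCE myseq INCREMENT BY 4 START WITH 101;
    -- generates 101, 105, 109, 113, ...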

* Test case *

CREATE UNLOGGED TABLE test_bigint ( id bigint PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid ( id uuid PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid7 ( id uuid PRIMARY KEY);
CREATE UNLOGGED TABLE test_bigint_1 ( id bigint PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid_1 ( id uuid PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid7_1 ( id uuid PRIMARY KEY);
create sequence myseq cache 32767;

*** Insert Test***
INSERT INTO test_bigint select nextval('myseq') from
generate_series(1,10);
INSERT INTO test_uuid select gen_random_uuid() from
generate_series(1,10);
INSERT INTO test_uuid7 select uuid_generate_v7() from
generate_series(1,10);


explain analyze INSERT INTO test_bigint_1 select id from test_bigint;

EXPLAIN
QUERY PLAN
Insert on test_bigint_1 (cost=0.00..1444.18 rows=0 width=0) (actual
time=220.689..220.690 rows=0 loops=1)
-> Seq Scan on test_bigint (cost=0.00..1444.18 rows=100118 width=8) (actual
time=0.012..17.488 rows=10 loops=1)
Planning Time: 0.137 ms
Execution Time: 220.714 ms

explain analyze INSERT INTO test_uuid_1 select id from test_uuid;

EXPLAIN
QUERY PLAN
Insert on test_uuid_1 (cost=0.00..1541.85 rows=0 width=0) (actual
time=311.949..311.950 rows=0 loops=1)
-> Seq Scan on test_uuid (cost=0.00..1541.85 rows=100085 width=16) (actual
time=0.010..20.173 rows=10 loops=1)
Planning Time: 0.082 ms
Execution Time: 311.973 ms


explain analyze INSERT INTO test_uuid7_1 select id from test_uuid7;

EXPLAIN
QUERY PLAN
Insert on test_uuid7_1 (cost=0.00..1541.85 rows=0 width=0) (actual
time=244.531..244.532 rows=0 loops=1)
-> Seq Scan on test_uuid7 (cost=0.00..1541.85 rows=100085 width=16) (actual
time=0.011..16.390 rows=10 loops=1)
Planning Time: 0.084 ms
Execution Time: 244.554 ms

Explain analyze select * from test_bigint where id in (select id from
test_bigint limit 10);

Nested Loop (cost=2692.77..3793.23 rows=50059 width=8) (actual
time=53.739..368.110 rows=10 loops=1)
-> HashAggregate (cost=2692.48..2694.48 rows=200 width=8) (actual
time=53.681..93.044 rows=10 loops=1)
Group Key: test_bigint_1.id
Batches: 5 Memory Usage: 11073kB Disk Usage: 208kB
-> Limit (cost=0.00..1442.48 rows=10 width=8) (actual
time=0.020..18.985 rows=10 loops=1)
-> Seq Scan on test_bigint test_bigint_1 (cost=0.00..1444.18 rows=100118
width=8) (actual time=0.019..11.330 rows=10 loops=1)
-> Index Only Scan using test_bigint_pkey on test_bigint (cost=0.29..6.53
rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=10)
Index Cond: (id = test_bigint_1.id)
Heap Fetches: 10
Planning Time: 0.373 ms
Execution Time: 373.440 ms
EXPLAIN

Explain analyze select * from test_uuid where id in (select id from
test_uuid limit 10);

QUERY PLAN
Nested Loop (cost=2790.96..4006.29 rows=50042 width=16) (actual
time=48.251..410.786 rows=10 loops=1)
-> HashAggregate (cost=2790.54..2792.54 rows=200 width=16) (actual
time=48.157..76.176 rows=10 loops=1)
Group Key: test_uuid_1.id
Batches: 1 Memory