Partitioning options

2024-02-07 Thread sud
Hi,
We have a system which stores customers' transactions. There are currently
~100K customers, and the list will grow in the future, though not
drastically (maybe ~50K more or so). The number of transactions per day is
~400 million, and we want to persist them in our Postgres database for
~5 months.

The key transaction table is going to have ~450 million transactions per
day, and querying/filtering will always happen on the "transaction_date"
column. JOINs will mostly happen on the "customer_id" column, along with
filters on customer_id in some scenarios. Each day's worth of transactions
consumes ~130GB of storage space, as we verified with the
"pg_relation_size" function on a sample data set.

As mentioned, there will be ~100K distinct "customer_id" values, but the
data won't be equally distributed; it will be skewed. For example, some of
the big customers will hold the majority of the transactions (say 20-30% of
the total), and the rest is spread across the other customers, again not
equally.

So my question is: in the above scenario, should we go for a composite
partitioning strategy, i.e. range/hash (daily range partition by
transaction_date and hash subpartition by customer_id), as sketched below?
OR
Should we go for simple daily range partitioning on the transaction_date
column?
OR
Range/list composite partitioning (range partition by transaction_date and
list subpartition by customer_id)?
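
For reference, the DDL for the range/hash option could look roughly like
the sketch below; the column list and the number of hash sub-partitions are
just assumptions for illustration:

CREATE TABLE transactions (
    transaction_id   bigint        NOT NULL,
    customer_id      bigint        NOT NULL,
    transaction_date date          NOT NULL,
    amount           numeric(12,2)
) PARTITION BY RANGE (transaction_date);

-- one partition per day, itself hash-partitioned on customer_id
CREATE TABLE transactions_2024_02_07
    PARTITION OF transactions
    FOR VALUES FROM ('2024-02-07') TO ('2024-02-08')
    PARTITION BY HASH (customer_id);

-- e.g. 8 hash sub-partitions per day (the modulus is an assumption)
CREATE TABLE transactions_2024_02_07_h0
    PARTITION OF transactions_2024_02_07
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ...repeat for REMAINDER 1 through 7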

Thanks and Regards
Sud


Re: How to do faster DML

2024-02-07 Thread Lok P
> On Tue, Feb 6, 2024 at 8:34 AM Greg Sabino Mullane 
> wrote:
>
>> On Sun, Feb 4, 2024 at 3:52 PM Lok P  wrote:
>>
>>> What I mean was, we will definitely need the data for querying purposes
>>> by the users, but just to keep the storage space in control (and to help
>>> the read queries), we were thinking of having the historical partitions
>>> compressed. And for that, is there any compression strategy we should
>>> follow on Postgres?
>>>
>>
>> There is no compression strategy, per se. There are ways Postgres
>> internally compresses the data (see "TOAST"), but in general, the table is
>> either there or not. If space is a concern you should examine if you truly
>> need 127 columns, make sure they are arranged correctly (search for
>> 'postgres column tetris'), and move unused and older partitions elsewhere -
>> like a separate cheaper Postgres server, or something offline.
>>
>> Cheers,
>> Greg
>>
>>
Rearranging the table columns by typlen desc didn't give much storage
space reduction.
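
(For reference, a catalog query along these lines gives the typlen
ordering; the table name is a placeholder:

SELECT a.attname, t.typname, t.typlen, t.typalign
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'my_table'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY t.typlen DESC;
)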

So I was trying TOAST compression, by creating the table from scratch and
specifying the LZ4 algorithm at the column-definition level, just for the
varchar columns, as it seems this compression only works for varchar and
text columns. The table has 7 columns defined as varchar out of 12 in
total. I wrote the DDL something like below:

column1 varchar(50) COMPRESSION lz4 NOT NULL

However, when I loaded the table using INSERT ... SELECT from the main
table (the uncompressed one), the size of the compressed table stayed the
same. I also applied the function "pg_column_compression()" to see whether
any column value is compressed using lz4, and it returns all "null", which
means not compressed.

So it seems the compression does not apply to rows inserted using "CTAS"
or "INSERT ... SELECT". Does that mean it only applies to row-by-row
inserts and not to batch inserts (which a bigger system will normally
have)? I was not expecting this, so it was disappointing.
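
(A stripped-down sketch of what I tried, with placeholder names and only
one of the varchar columns shown:

CREATE TABLE trans_lz4 (
    col1 varchar(50) COMPRESSION lz4 NOT NULL
    -- ...remaining columns omitted
);

INSERT INTO trans_lz4 (col1)
SELECT col1 FROM trans_main;  -- the uncompressed source table (placeholder)

SELECT pg_column_compression(col1) FROM trans_lz4 LIMIT 10;
-- returns NULL for these rows, i.e. the values were not stored compressed
)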

Regards
Lok


Re: archive command doesn't work

2024-02-07 Thread Laurenz Albe
On Wed, 2024-02-07 at 12:07 +0100, Alpaslan AKDAĞ wrote:
> We have a primary, a hot standby, and 2 warm standby servers with a 2-day delay.
> 
> Since the switchover on 01.02.2024, the hot standby server does not archive WAL
> files. I couldn't find the problem. 
> 
> related postgresql.conf lines:
> archive_mode = on
> archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
> 
> and in log files there are lines like below but no error or failure lines:
> 
> postgresql-2024-02-05_00.log:2024-02-05 06:25:50.272 CET [979081]   LOG:  restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0 removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488 kB
> postgresql-2024-02-05_00.log-2024-02-05 06:25:50.272 CET [979081]   LOG:  recovery restart point at 25C/74083E58
> postgresql-2024-02-05_00.log-2024-02-05 06:25:50.272 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-05 06:25:50.223799+01.

That looks like the standby wasn't promoted and is still in recovery,
so it won't generate WAL.

Restartpoints are only written on servers in recovery mode.  Moreover,
a promote request would generate a log entry.

Another option would be that the promote request is further up in the log,
and the server takes a long time to replay all the changes
(if the "2 days delay" you mention is set in "recovery_min_apply_delay").

What do you get for

  SELECT pg_is_in_recovery();

Yours,
Laurenz Albe




Re: vacuum freeze wait_event BufferPin

2024-02-07 Thread abrahim abrahao

Thanks again Greg, I really appreciated all information.


On Friday, February 2, 2024 at 08:16:41 p.m. EST, Greg Sabino Mullane  wrote:

 On Fri, Feb 2, 2024 at 3:25 PM abrahim abrahao  wrote:

 SHARE UPDATE EXCLUSIVE (ShareUpdateExclusiveLock) should not be blocked by
ACCESS SHARE (AccessShareLock). Am I wrong about it? If I am not wrong, why
is it still blocking it?

Those locks with no other context are somewhat of a red herring. The important
part is not that the AccessShare is somehow blocking ShareUpdateExclusive, but
that the ShareUpdateExclusive process is NOT blocking new AccessShare
processes! In the internals of Postgres, vacuumlazy.c tries to grab a buffer
lock (a different concept from all the user-visible locks above). It politely
tries to wait[1] until nobody else is grabbing it (aka pinning it), then
proceeds. The problem is that other processes are allowed to come along and put
a pin in it as well - the vacuum's ShareUpdateExclusive lock does not prevent
that.

So the timeline is:

Process X runs a long select and pins the buffer.
Process V runs a vacuum freeze and tries to lock the buffer. It detects other
pins, so it waits. It assumes that whoever is holding the pin will release it
someday.
Process Y runs another long select and also pins the buffer.
Process X ends, and removes its pins.
Process V still cannot move - it just knows there are still pins. Where they
come from does not matter.

As long as there is at least one other process holding a pin, the vacuum freeze
cannot continue[2].

That's my understanding of the code, anyway. This could be argued as a bug. I
am not sure what a solution would be. Cancelling user queries just for a vacuum
would not be cool, but we could maybe pause future pin-creating actions
somehow? For the time being, forcing a super-quick moment of no table access
would seem to be your best bet, as described earlier.

Cheers,
Greg
[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/vacuumlazy.c;hb=HEAD#l975
See backend/storage/buffer/bufmgr.c for LockBufferForCleanup()

[2] Quick duplication script:

drop table if exists foobar;
create table foobar as select 1 as id;
alter table foobar set (autovacuum_enabled = off);
update foobar set id = id;

Process 1: begin; select *, pg_sleep(11) from foobar;

Process 2: vacuum(freeze,verbose) foobar; /* blocked */

Process 3: begin; select *, pg_sleep(33) from foobar;

Run in order. Kill Process 1 and Process 2 is still blocked. Kill Process 3
and Process 2 finishes the vacuum. Note that a regular vacuum (without a
freeze) will not get blocked.

Cheers,
Greg

  
  

Re: Error on create extension pgcrypto. Undefined symbol "EVP_bf_ecb"

2024-02-07 Thread Marco Aurelio
I changed the path and now the correct version of OpenSSL appears in
config.log. Even so, after recompiling and reinstalling both PostgreSQL and
the pgcrypto contrib module, when trying to create the extension in a
database I continue to receive the same error. Any further suggestions?

Marco Aurélio Ventura da Silva
Systems Analyst / Developer
marcoprod...@gmail.com
Prodata Informática e Cadastro LTDA
(33) 3322-6363


On Wed, Feb 7, 2024 at 12:12, Tom Lane  wrote:

> Marco Aurelio  writes:
> > I added the include and libraries folders:
> > --with-libraries=/usr/local/lib --with-includes=/usr/local/include
> > but configure keeps finding the old version of openssl
>
> > configure:14133: checking for openssl
> > configure:14151: found /usr/bin/openssl
> > configure:14163: result: /usr/bin/openssl
> > configure:14183: using openssl: OpenSSL 1.1.1t-freebsd 7 Feb 2023
>
> You can ignore that particular result, because it doesn't matter
> as long as we're using the right headers and libraries.  (We knew
> when that check was put in that it would give some false positives.)
> Alternatively, change your PATH to put /usr/local/bin in front
> of /usr/bin.
>
> regards, tom lane
>


Re: Generic File Access Function to read program output

2024-02-07 Thread Carsten Klein

On 07.02.2024 at 15:54, Joe Conway wrote:


Maybe write your own in plpython or plperlu?



Yeah... why didn't I think of it? PL/Python would be a first-class option.

Nevertheless, I still believe such a function in PostgreSQL's core would 
be a good addition. Maybe someone feels like implementing one some day... :)
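
For the archives, a minimal sketch of such a wrapper in untrusted
PL/Python (the function name and the lack of error handling are purely
illustrative, and plpython3u has to be installed; being untrusted, it is
superuser-only for good reason):

CREATE FUNCTION pg_read_program_output(command text)
RETURNS text
LANGUAGE plpython3u
AS $$
    import subprocess
    # run the command through the shell and return its STDOUT as text,
    # analogous to COPY ... FROM PROGRAM
    return subprocess.check_output(command, shell=True).decode('utf-8')
$$;

-- hypothetical usage (the input file argument is a placeholder):
-- SELECT pg_read_program_output(
--            'node /opt/my_node_apps/xml_to_json.js /path/to/input.xml')::jsonb;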


Carsten




Re: Error on create extension pgcrypto. Undefined symbol "EVP_bf_ecb"

2024-02-07 Thread Tom Lane
Marco Aurelio  writes:
> I added the include and libraries folders:
> --with-libraries=/usr/local/lib --with-includes=/usr/local/include
> but configure keeps finding the old version of openssl

> configure:14133: checking for openssl
> configure:14151: found /usr/bin/openssl
> configure:14163: result: /usr/bin/openssl
> configure:14183: using openssl: OpenSSL 1.1.1t-freebsd 7 Feb 2023

You can ignore that particular result, because it doesn't matter
as long as we're using the right headers and libraries.  (We knew
when that check was put in that it would give some false positives.)
Alternatively, change your PATH to put /usr/local/bin in front
of /usr/bin.

regards, tom lane




Re: Generic File Access Function to read program output

2024-02-07 Thread Joe Conway

On 2/7/24 05:28, Carsten Klein wrote:

Hi there,

on PostgreSQL 14, I'm using the function pg_read_file to read a JSON file on
the server. After that, the JSON file gets cast to jsonb, and with the
function jsonb_array_elements I'm iterating over the "records", which I
transform into a PostgreSQL ROWTYPE with jsonb_populate_record...

Since the source files are actually XML files, these are turned into
JSON files with Node JS and the fast-xml-parser module (processing JSON
is much faster and more comfortable than processing XML in PostgreSQL).

The command line of this conversion process is like this:

# node /opt/my_node_apps/xml_to_json.js 

In order to do this without temporary JSON files (which need to be
deleted at some time), it would be great to have a new Generic File
Access Function

pg_read_program_output(command)


Although one could argue that it's not a Generic *File* Access
Function, that function would be a worthwhile addition and could use the
same semantics and rules as the

COPY table_name FROM PROGRAM 'command'

statement. Also, the implementation (running a command with the shell and
capturing its STDOUT) is nearly the same.

In contrast to the other Generic File Access Functions, it will be
almost impossible to restrict access to programs or commands within the
database cluster directory (could be a complex shell command). Aside
from that this makes no sense since, typically, there are no executable
programs in those directories.

Even worse, it's likely also not possible to restrict the source of the
content read (the STDOUT) to be any of these directories, since the
program could just dump anything to its STDOUT.

AFAICT, that's not really an issue; it only makes this new Generic File
Access Function special, in that these restrictions and the meaning of the
role pg_read_server_files just do not apply to it.

Do you know if there is already such a function, maybe provided by an
extension I do not yet know?



Maybe write your own in plpython or plperlu?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Error on create extension pgcrypto. Undefined symbol "EVP_bf_ecb"

2024-02-07 Thread Marco Aurelio
I added the include and libraries folders:
--with-libraries=/usr/local/lib --with-includes=/usr/local/include
but configure keeps finding the old version of openssl

configure:14133: checking for openssl
configure:14151: found /usr/bin/openssl
configure:14163: result: /usr/bin/openssl
configure:14183: using openssl: OpenSSL 1.1.1t-freebsd 7 Feb 2023


Marco Aurélio Ventura da Silva
Systems Analyst / Developer
marcoprod...@gmail.com
Prodata Informática e Cadastro LTDA
(33) 3322-6363


On Wed, Feb 7, 2024 at 11:17, Daniel Gustafsson wrote:

> > On 7 Feb 2024, at 15:09, Marco Aurelio  wrote:
>
> > When I run ./configure I see in config.log that it is always using the
> base version of Freebsd.
>
> You can use the --with-libraries and --with-includes arguments to
> ./configure
> in order to point it to the version you want to build against.
>
> --
> Daniel Gustafsson
>
>


Re: Error on create extension pgcrypto. Undefined symbol "EVP_bf_ecb"

2024-02-07 Thread Daniel Gustafsson
> On 7 Feb 2024, at 15:09, Marco Aurelio  wrote:

> When I run ./configure I see in config.log that it is always using the base 
> version of Freebsd.

You can use the --with-libraries and --with-includes arguments to ./configure
in order to point it to the version you want to build against.

--
Daniel Gustafsson





Error on create extension pgcrypto. Undefined symbol "EVP_bf_ecb"

2024-02-07 Thread Marco Aurelio
Dear Friends,

I have a FreeBSD 13.2 server where I installed PostgreSQL 16.1 from source
code. I installed the pgcrypto extension with the gmake install command
without error messages, but when trying to create the extension in a
database I receive the following message:
ERROR: could not load library "/usr/local/pgsql16/lib/pgcrypto.so":
/usr/local/pgsql16/lib/pgcrypto.so: Undefined symbol "EVP_bf_ecb"

FreeBSD 13.2 has two versions of OpenSSL installed, one in the base system
and the other via pkg. The base version is:
OpenSSL 1.1.1t-freebsd 7 Feb 2023

while the version installed via pkg is:
OpenSSL 3.0.12 24 Oct 2023 (Library: OpenSSL 3.0.12 24 Oct 2023)

When I run ./configure, I see in config.log that it is always using the
base version from FreeBSD.

How can I resolve this?


Marco Aurélio Ventura da Silva
Systems Analyst / Developer
marcoprod...@gmail.com
Prodata Informática e Cadastro LTDA
(33) 3322-6363


Re: Why doesn't Postgres apply limit on groups when retrieving N results per group using WHERE IN + ORDER BY

2024-02-07 Thread David Rowley
On Tue, 6 Feb 2024 at 01:23, Sean v  wrote:
> SELECT "orders".*
> FROM "orders"
> WHERE user_id IN (?, ?, ?)
> ORDER BY "orders"."created_at" LIMIT 50
>
> I have two indexes - `(user_id)` and `(user_id, created_at)`. Only the first 
> index is ever used with this query.

> I imagined that it would be clever enough to determine that only 50 results 
> are needed, and that it could use the `(user_id, created_at)` index to get 50 
> orders for each user. Then sort and filter those few hundred results in 
> memory.

It's simply that the planner currently does not consider fetching 50
rows per user and doing a final sort before applying an overall LIMIT
50.

> I have found that I can speed this up significantly using CROSS JOIN LATERAL 
> and it will use the composite index, but I'm struggling to understand WHY the 
> CROSS JOIN LATERAL is needed here for it to use the index.

I'm afraid that's the best workaround until someone submits a patch to
have the planner consider doing this optimisation automatically.
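
For the archives, a sketch of that LATERAL workaround, using the column
names from the query quoted above (the unnest() list of user ids is just a
placeholder for however the ids are supplied):

SELECT o.*
FROM unnest(ARRAY[?, ?, ?]) AS u(user_id)
CROSS JOIN LATERAL (
    SELECT *
    FROM orders
    WHERE orders.user_id = u.user_id
    ORDER BY created_at
    LIMIT 50   -- at most 50 rows fetched per user via (user_id, created_at)
) o
ORDER BY o.created_at
LIMIT 50;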

> I've tried tweaking costs, disabling bitmap scans, etc, so it seems like this 
> is a functional limitation rather than something to do with cost/statistics.

No amount of that will get you the plan you want without the LATERAL JOIN query.

> So my question is twofold:
> - why doesn't Postgres use the composite index, and then retrieve only the 
> minimum necessary amount of rows (50 per user) using the query I posted above?
>
>  - If it is a functional limitation, is it lack of implementation, or is 
> there a deeper incompatibility with how the query planner works that would 
> prevent it from being able to do this?

Likely it wouldn't be too difficult to make the planner consider this
optimisation. However, for it to be valid, the ORDER BY clause would
have to contain only columns from the column(s) on the left side of
the IN clause.  I think likely this could be done by having the
planner consider performing a Nested Loop with an outer VALUES scan
and an inner parameterized index scan with a Limit node above it as a
path for scanning the base relation. The tricky part would be
adjusting the planner so it didn't needlessly leave the IN clause in
the WHERE clause when the chosen plan is the Nested Loop with the
values scan.  The current planner data structures are not really
geared up for optional base quals right now.  Something would need to
be done to make that work and off the top of my head, I don't know
what that would be.

David




archive command doesn't work

2024-02-07 Thread Alpaslan AKDAĞ
Hello

We have a primary, a hot standby, and 2 warm standby servers with a 2-day
delay.

Since the switchover on 01.02.2024, the hot standby server does not archive
WAL files. I couldn't find the problem.

Hot standby server info:
Red Hat Enterprise Linux 9.2
PostgreSQL 13.10

related postgresql.conf lines:
archive_mode = on
archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
archive_timeout = 300
wal_keep_size = 80
wal_level = 'replica'
wal_log_hints = on
max_wal_size = 10GB
min_wal_size = 80MB

Archived WAL files per day in the /archive/archived_wal/ folder (all of
them are from before the switchover):

Number  Day
   421  2024-01-19
   672  2024-01-20
  1374  2024-01-21
   564  2024-01-22
   569  2024-01-23
   607  2024-01-24
   510  2024-01-25
   612  2024-01-26
   495  2024-01-27
  1497  2024-01-28
   556  2024-01-29
   506  2024-01-30
   549  2024-01-31
   228  2024-02-01

And in the log files there are lines like the ones below, but no error or
failure lines:

2024-02-02 02:07:01.978 CET [979081]   LOG:  restartpoint complete: wrote 26904 buffers (2.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=445.874 s, sync=0.005 s, total=445.887 s; sync files=88, longest=0.001 s, average=0.001 s; distance=732035 kB, estimate=732035 kB
2024-02-02 02:07:01.978 CET [979081]   LOG:  recovery restart point at 252/2D7E5E68
2024-02-02 02:07:01.978 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.
--
2024-02-02 02:07:02.024 CET [979081]   LOG:  restartpoint complete: wrote 310 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.037 s, sync=0.004 s, total=0.046 s; sync files=54, longest=0.002 s, average=0.001 s; distance=90216 kB, estimate=667853 kB
2024-02-02 02:07:02.024 CET [979081]   LOG:  recovery restart point at 252/3328
2024-02-02 02:07:02.024 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.

postgresql-2024-02-03_00.log:2024-02-03 09:03:20.383 CET [979081]   LOG:  restartpoint complete: wrote 102950 buffers (9.8%); 0 WAL file(s) added, 1286 removed, 117 recycled; write=1079.221 s, sync=0.008 s, total=1082.510 s; sync files=146, longest=0.001 s, average=0.001 s; distance=1143627 kB, estimate=1143627 kB
postgresql-2024-02-03_00.log-2024-02-03 09:03:20.383 CET [979081]   LOG:  recovery restart point at 255/43D56D80
postgresql-2024-02-03_00.log-2024-02-03 09:03:20.383 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-03 09:03:20.311619+01.
--
postgresql-2024-02-03_00.log:2024-02-03 17:47:29.480 CET [979081]   LOG:  restartpoint complete: wrote 1305 buffers (0.1%); 0 WAL file(s) added, 1 removed, 3 recycled; write=131.069 s, sync=0.005 s, total=131.088 s; sync files=104, longest=0.001 s, average=0.001 s; distance=65368 kB, estimate=139744 kB
postgresql-2024-02-03_00.log-2024-02-03 17:47:29.480 CET [979081]   LOG:  recovery restart point at 255/E40CE260
postgresql-2024-02-03_00.log-2024-02-03 17:47:29.480 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-03 17:47:28.055165+01.

--
postgresql-2024-02-05_00.log:2024-02-05 06:25:50.272 CET [979081]   LOG:  restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0 removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488 kB
postgresql-2024-02-05_00.log-2024-02-05 06:25:50.272 CET [979081]   LOG:  recovery restart point at 25C/74083E58
postgresql-2024-02-05_00.log-2024-02-05 06:25:50.272 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-05 06:25:50.223799+01.

And the last thing that I want to share:


select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(),
pg_last_xact_replay_timestamp() \gx
-[ RECORD 1 ]-----------------+------------------------------
pg_last_wal_receive_lsn       | 261/C10BB318
pg_last_wal_replay_lsn        | 261/C10BB318
pg_last_xact_replay_timestamp | 2024-02-07 11:40:57.536997+01

Everything seems fine except the archiving.
For now I am sharing this information, but if you need more, I can provide it.
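
If it helps, I can also check the archiver statistics on the standby, along
these lines (the standard pg_stat_archiver view):

SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;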

Thanks in advance.
Best Regards
Alpaslan


Generic File Access Function to read program output

2024-02-07 Thread Carsten Klein

Hi there,

on PostgreSQL 14, I'm using the function pg_read_file to read a JSON file
on the server. After that, the JSON file gets cast to jsonb, and with the
function jsonb_array_elements I'm iterating over the "records", which I
transform into a PostgreSQL ROWTYPE with jsonb_populate_record...


Since the source files are actually XML files, these are turned into 
JSON files with Node JS and the fast-xml-parser module (processing JSON 
is much faster and more comfortable than processing XML in PostgreSQL).


The command line of this conversion process is like this:

# node /opt/my_node_apps/xml_to_json.js 

In order to do this without temporary JSON files (which need to be 
deleted at some time), it would be great to have a new Generic File 
Access Function


pg_read_program_output(command)


Although one could argue that it's not a Generic *File* Access
Function, that function would be a worthwhile addition and could use the
same semantics and rules as the


COPY table_name FROM PROGRAM 'command'

statement. Also, the implementation (running a command with the shell and
capturing its STDOUT) is nearly the same.


In contrast to the other Generic File Access Functions, it will be 
almost impossible to restrict access to programs or commands within the 
database cluster directory (could be a complex shell command). Aside 
from that this makes no sense since, typically, there are no executable 
programs in those directories.


Even worse, it's likely also not possible to restrict the source of the 
content read (the STDOUT) to be any of these directories, since the 
program could just dump anything to its STDOUT.


AFAICT, that's not really an issue; it only makes this new Generic File
Access Function special, in that these restrictions and the meaning of the
role pg_read_server_files just do not apply to it.


Do you know if there is already such a function, maybe provided by an 
extension I do not yet know?



Cheers
Carsten