Re: Query plans not identical for `id in(1,2,3)` and `(id=1 or id=2 or id=3)`
Laurenz thanks for the info. Thomas no I can't. Original Message From: Thomas Kellerer [mailto:sham...@gmx.net] Sent: Sunday, March 10, 2024 at 11:58 UTC To: pgsql-general@lists.postgresql.org Subject: Query plans not identical for `id in(1,2,3)` and `(id=1 or id=2 or id=3)` Ilya Basin schrieb am 09.03.2024 um 20:08: Hi List. I have a list of bigint keys and I need to retrieve rows by these keys. Normally, I would split this list into pages of size 900 and perform several `SELECT ... WHERE key in($1,$2,...)`. However, the proprietary ORM we use can only produce this SQL: `SELECT ... WHERE (key=$1 or key=$2 or ...)`. Surprisingly, PostgreSQL planner treats these two SQLs differently: - ```select * FROM "audittrail$referencelogline" where id in ( 1 , 2 , 3 )``` Index Scan https://i.stack.imgur.com/dr8oz.png - ```select * FROM "audittrail$referencelogline" where id = 1 or id = 2 or id = 3``` A lot of "Bitmap Index Scan" for each value https://i.stack.imgur.com/dnErs.png Is it possible to configure PostgreSQL 12.16 to treat the second query as the first? Can you convince your obfuscation layer to send an array value (containing all IDs) and change the query to: select * FROM "audittrail$referencelogline" where id = any(?)
Re: Question related to partitioning with pg_partman
On 3/10/24 11:34, sud wrote: On Sun, Mar 10, 2024 at 11:31 PM Adrian Klaver mailto:adrian.kla...@aklaver.com>> wrote: 1) The partition will be across one day(24 hours) it is just the times may confuse people. Per you example 2024-03-07 00:00:00+00 is the same time as 2024-03-06 19:00:00-05 for EST. The issue is that the +00 and -05 maybe ignored. Also it depends on the clients being consistent in using timestamptz. 2) You still have not answered what the datetime range(not date range) is that will be queried. If you have the partitions Midnight to Midnight UTC and the clients are querying Midnight to Midnight local time the query will not match the partitions. My apology if not able to clearly put the details. Actually, the query will always happen on a day basis i.e they can query from one day to 15 days transactions. But as you rightly pointed , the partitions can only span from midnight to midnight in one timezone, and thus users who queries the data from another time zone will mostly scan two partitions (even if they just queries one days transaction data in their own timezone). And I don't see an easy solution for this , which will help all users across all time zones to scan only a single partition in the database, when they queries data for a single transaction date. And thus my question was, is it necessary to have the creation of partitions to happen on UTC time zone only? and then whatever transaction data inserted by the users from respective time zones will be stored in the database as is and will be queried based on the user timezone (it may span across multiple partitions though for a single user transaction date). This is going to depend on many things. 1) Partitions are not free they have overhead, which is fine if the cost(overhead) is less then the benefits. For details on that see: https://www.postgresql.org/docs/current/ddl-partitioning.html and partition parts of https://www.postgresql.org/docs/current/sql-createtable.html As part of this there is the consideration of whether daily partitions are really what you want? 2) What you hope to get out of the partitioning? a) If it is confining queries to the partition boundaries then you have already stated that is not going to happen. b) If it is for data pruning purposes, then you have something to consider on both ends. Creating/dropping partitions with Midnight to Midnight UTC means you will need to consider whether they cover the range of datetimes that your users are interested in. In other words creating a partition ahead that covers local times that resolve to a UTC time in the 'future'. On the back end not dropping a partition until it has gone out of scope for everybody. To answer 1 & 2 you are probably going to need to create a test setup and verify how the expected queries are actually going to work with your partition scheme. -- Adrian Klaver adrian.kla...@aklaver.com
Re: Question related to partitioning with pg_partman
On Sun, Mar 10, 2024 at 11:31 PM Adrian Klaver wrote: > 1) The partition will be across one day(24 hours) it is just the times > may confuse people. Per you example 2024-03-07 00:00:00+00 is the same > time as 2024-03-06 19:00:00-05 for EST. The issue is that the +00 and > -05 maybe ignored. Also it depends on the clients being consistent in > using timestamptz. > > 2) You still have not answered what the datetime range(not date range) > is that will be queried. If you have the partitions Midnight to Midnight > UTC and the clients are querying Midnight to Midnight local time the > query will not match the partitions. > > My apology if not able to clearly put the details. Actually, the query will always happen on a day basis i.e they can query from one day to 15 days transactions. But as you rightly pointed , the partitions can only span from midnight to midnight in one timezone, and thus users who queries the data from another time zone will mostly scan two partitions (even if they just queries one days transaction data in their own timezone). And I don't see an easy solution for this , which will help all users across all time zones to scan only a single partition in the database, when they queries data for a single transaction date. And thus my question was, is it necessary to have the creation of partitions to happen on UTC time zone only? and then whatever transaction data inserted by the users from respective time zones will be stored in the database as is and will be queried based on the user timezone (it may span across multiple partitions though for a single user transaction date).
Re: Seeing high query planning time on Azure Postgres Single Server version 11.
On Sun, Mar 10, 2024 at 1:34 PM Greg Sabino Mullane wrote: > > On Sat, Mar 9, 2024 at 1:57 PM hassan rafi > wrote: > >> Would upgrading to the latest version of Postgres potentially solve the >> issue? >> > > Potentially, yes, but the only one who can answer that for sure is you. > Upgrade to 11.22 and re-run the query. Worst case scenario, it runs the > same speed but you gained yourself a bunch of bugfixes and CVE resolutions. > If the problem persists on 11.22, spin up a Postgres 16, load the data, and > test it there. > We have a similar situation with 9.6.24. 14.10 is noticeably faster (between 10% and 80%, depending on the query.
Re: Question related to partitioning with pg_partman
On 3/10/24 10:51, sud wrote: On Sun, Mar 10, 2024 at 10:32 PM Adrian Klaver mailto:adrian.kla...@aklaver.com>> wrote: On 3/10/24 05:12, sud wrote: > > In my example in the first post, I see, if someone connected to a RDS > Postgres database and run the create partition command using pg_partman > by setting the timezone as "UTC", the 7th march partition looks to be > spanned from "7th march midnight" to "8th march midnight", when queried > the partition_experession from the data dictionary view. Which is correct. > > And same information if someone querying by setting the timezone as EST > is showing spanning from "6th march 7PM" to "7th March 7PM". And this > can cause sometimes the partition may shift to other days all together. > Similar differences happen if creating the partitions using EST timezone > initially and then querying the data dictionary from UTC timezone. The above is at odds with your example below which has the correct values: 2024-03-07 00:00:00+00 = 2024-03-06 19:00:00-05 > > So my question was, if in these types of scenarios, we should follow a > standard approach of setting the timezone as UTC in such a type of > global user use case, while the system can persist data from multiple > users sitting across different time zones? So that the boundary(start > and end time) of each of the range partitions will be set as consistent > in one timezone across all the partitioned tables? You need to first determine what your time frames are going to be? 1) Midnight to Midnight in UTC will be consistent when viewed in UTC. It will not be when viewed in other time zone +/- the offset from UTC. 2) Or Midnight to Midnight in the users time zone, in which case the UTC values will differ. You have to decide which of the above is your goal. The bottom line is by definition the local wall clock time will not equal UTC, GMT excepted. This comes down to what the purpose of the partitions are? In other words how do you want to organize the data? > > And even while inserting the data , should we set the timezone to first > UTC and do the data load ? > > *** Partition created by pg_partman by setting timezone as UTC > *** > > *UTC* > *Partition_name Partition_expression* > test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00+00') TO > ('2024-03-08 00:00:00+00') > > when queried the partition_expression using EST .. > > *EST* > *Partition_name Partition_expression* > test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-06 19:00:00-05') TO > ('2024-03-07 19:00:00-05') > > > *** Partition created by pg_partman by setting timezone as EST > *** > > *EST* > *Partition_name Partition_expression* > test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05') > TO ('2024-03-08 00:00:00-05') > > when queried the partition_expression using UTC .. > > *UTC* > *Partition_name Partition_expression* > test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 05:00:00+00') > TO ('2024-03-08 05:00:00+00') > > *** > > Also i see both the "setting" and "reset_val" is showing as local > timezone only. If we set the timezone to a different value than the > local timezone then it gets updated on the "setting". Our requirement is to have the transaction table partitioned by range daily on the transaction_date column(i.e one midnight to next midnight transaction data in one partition). Transaction date column will be of timestamptz data type. And this application/database might be consuming data from users across multiple time zones in future. These tables will be queried based on the date range (minimum being ~1 transaction day) and also will be purged one day partition. So for above I understand , it might not be possible to keep the users data restricted to one day partition in the table considering the users will perform transactions across multiple timezones, but we are thinking of restricting the database with UTC timezone irrespective of the users. And thus during creating the table partitions , we need to ensure the UTC timezone is set , such that the upper and lower boundary for the daily range partitions remains consistent for all. Correct me if my understanding is wrong. 1) The partition will be across one day(24 hours) it is just the times may confuse people. Per you example 2024-03-07 00:00:00+00 is the same time as 2024-03-06 19:00:00-05 for EST. The issue is that the +00 and -05 maybe ignored. Also it depends
Re: Question related to partitioning with pg_partman
On Sun, Mar 10, 2024 at 10:32 PM Adrian Klaver wrote: > On 3/10/24 05:12, sud wrote: > > > > In my example in the first post, I see, if someone connected to a RDS > > Postgres database and run the create partition command using pg_partman > > by setting the timezone as "UTC", the 7th march partition looks to be > > spanned from "7th march midnight" to "8th march midnight", when queried > > the partition_experession from the data dictionary view. Which is > correct. > > > > And same information if someone querying by setting the timezone as EST > > is showing spanning from "6th march 7PM" to "7th March 7PM". And this > > can cause sometimes the partition may shift to other days all together. > > Similar differences happen if creating the partitions using EST timezone > > initially and then querying the data dictionary from UTC timezone. > > The above is at odds with your example below which has the correct values: > > 2024-03-07 00:00:00+00 = 2024-03-06 19:00:00-05 > > > > > So my question was, if in these types of scenarios, we should follow a > > standard approach of setting the timezone as UTC in such a type of > > global user use case, while the system can persist data from multiple > > users sitting across different time zones? So that the boundary(start > > and end time) of each of the range partitions will be set as consistent > > in one timezone across all the partitioned tables? > > You need to first determine what your time frames are going to be? > > 1) Midnight to Midnight in UTC will be consistent when viewed in UTC. It > will not be when viewed in other time zone +/- the offset from UTC. > > 2) Or Midnight to Midnight in the users time zone, in which case the UTC > values will differ. > > You have to decide which of the above is your goal. The bottom line is > by definition the local wall clock time will not equal UTC, GMT > excepted. This comes down to what the purpose of the partitions are? In > other words how do you want to organize the data? > > > > > And even while inserting the data , should we set the timezone to first > > UTC and do the data load ? > > > > > > *** Partition created by pg_partman by setting timezone as UTC > > *** > > > > *UTC* > > *Partition_name Partition_expression* > > test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00+00') TO > > ('2024-03-08 00:00:00+00') > > > > when queried the partition_expression using EST .. > > > > *EST* > > *Partition_name Partition_expression* > > test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-06 19:00:00-05') TO > > ('2024-03-07 19:00:00-05') > > > > > > *** Partition created by pg_partman by setting timezone as EST > > *** > > > > *EST* > > *Partition_name Partition_expression* > > test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05') > > TO ('2024-03-08 00:00:00-05') > > > > when queried the partition_expression using UTC .. > > > > *UTC* > > *Partition_name Partition_expression* > > test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 05:00:00+00') > > TO ('2024-03-08 05:00:00+00') > > > > *** > > > > Also i see both the "setting" and "reset_val" is showing as local > > timezone only. If we set the timezone to a different value than the > > local timezone then it gets updated on the "setting". > > Our requirement is to have the transaction table partitioned by range daily on the transaction_date column(i.e one midnight to next midnight transaction data in one partition). Transaction date column will be of timestamptz data type. And this application/database might be consuming data from users across multiple time zones in future. These tables will be queried based on the date range (minimum being ~1 transaction day) and also will be purged one day partition. So for above I understand , it might not be possible to keep the users data restricted to one day partition in the table considering the users will perform transactions across multiple timezones, but we are thinking of restricting the database with UTC timezone irrespective of the users. And thus during creating the table partitions , we need to ensure the UTC timezone is set , such that the upper and lower boundary for the daily range partitions remains consistent for all. Correct me if my understanding is wrong.
Re: Seeing high query planning time on Azure Postgres Single Server version 11.
On Sat, Mar 9, 2024 at 1:57 PM hassan rafi wrote: > Would upgrading to the latest version of Postgres potentially solve the > issue? > Potentially, yes, but the only one who can answer that for sure is you. Upgrade to 11.22 and re-run the query. Worst case scenario, it runs the same speed but you gained yourself a bunch of bugfixes and CVE resolutions. If the problem persists on 11.22, spin up a Postgres 16, load the data, and test it there. Cheers, Greg
Re: Question related to partitioning with pg_partman
On 3/10/24 05:12, sud wrote: '2024-03-07 00:00:00+00' and '2024-03-06 19:00:00-05' are the same time as is '2024-03-07 00:00:00-05' and '2024-03-07 05:00:00+00'. Still I would think for sanity sake you would want to stick with UTC. Thank you so much Adrian. In my example in the first post, I see, if someone connected to a RDS Postgres database and run the create partition command using pg_partman by setting the timezone as "UTC", the 7th march partition looks to be spanned from "7th march midnight" to "8th march midnight", when queried the partition_experession from the data dictionary view. Which is correct. And same information if someone querying by setting the timezone as EST is showing spanning from "6th march 7PM" to "7th March 7PM". And this can cause sometimes the partition may shift to other days all together. Similar differences happen if creating the partitions using EST timezone initially and then querying the data dictionary from UTC timezone. The above is at odds with your example below which has the correct values: 2024-03-07 00:00:00+00 = 2024-03-06 19:00:00-05 So my question was, if in these types of scenarios, we should follow a standard approach of setting the timezone as UTC in such a type of global user use case, while the system can persist data from multiple users sitting across different time zones? So that the boundary(start and end time) of each of the range partitions will be set as consistent in one timezone across all the partitioned tables? You need to first determine what your time frames are going to be? 1) Midnight to Midnight in UTC will be consistent when viewed in UTC. It will not be when viewed in other time zone +/- the offset from UTC. 2) Or Midnight to Midnight in the users time zone, in which case the UTC values will differ. You have to decide which of the above is your goal. The bottom line is by definition the local wall clock time will not equal UTC, GMT excepted. This comes down to what the purpose of the partitions are? In other words how do you want to organize the data? And even while inserting the data , should we set the timezone to first UTC and do the data load ? *** Partition created by pg_partman by setting timezone as UTC *** *UTC* *Partition_name Partition_expression* test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00+00') TO ('2024-03-08 00:00:00+00') when queried the partition_expression using EST .. *EST* *Partition_name Partition_expression* test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-06 19:00:00-05') TO ('2024-03-07 19:00:00-05') *** Partition created by pg_partman by setting timezone as EST *** *EST* *Partition_name Partition_expression* test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05') TO ('2024-03-08 00:00:00-05') when queried the partition_expression using UTC .. *UTC* *Partition_name Partition_expression* test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 05:00:00+00') TO ('2024-03-08 05:00:00+00') *** Also i see both the "setting" and "reset_val" is showing as local timezone only. If we set the timezone to a different value than the local timezone then it gets updated on the "setting". Regards Sud -- Adrian Klaver adrian.kla...@aklaver.com
walsender RAM increases by 500 MB while data is 80 MB
Hi Experts, Your input is most welcome! We are using Postgres 13 (and plan to upgrade to 15 soon). We have logical replication with about 40 servers subscribing to one publisher. 40 Walsender processes are running on the publisher server. When we insert a row into a table holding binary data the walsender RAM usage increases by 500MB although the row binary data is only 80MB. We see this increase in all walsender processes. At some point we got OOM and the process was killed. 1. Why does the walsender increases by 500MB when the data change was only 80MB 2. Is some of the 500MB increase due to shared memory or each walsender has its own 500MB increase. I assume that if it was only in shared memory we would not have gotten OOM... 3. Why when logical_decoding_work_mem = 64MB the RAM is 15 times that size? Shouldn't any additional space be used from disk and not RAM? 4. Will adding streaming = on to publication "PUBLICATION pub WITH (streaming = on)" can alleviate the issue? 5. Are there configuration options that can resolve the RAM issue. It can be also in version 15 since we plan to upgrade soon. Thanks! IMPORTANT - This email and any attachments is intended for the above named addressee(s), and may contain information which is confidential or privileged. If you are not the intended recipient, please inform the sender immediately and delete this email: you should not copy or use this e-mail for any purpose nor disclose its contents to any person.
Re: Question related to partitioning with pg_partman
On Sat, Mar 9, 2024 at 3:41 AM Adrian Klaver wrote: > On 3/8/24 00:23, sud wrote: > > > > Starting a new thread... > > > > Something interesting and not sure if its expected behaviour as below. > > We are also confused a bit here. > > > > In the below example we created two partitioned tables on timestamptz > > type columns with different time zones and the child partitions are > > created appropriately with boundaries as one mid night to next mid night > > of a day and so on. But when we change the time zone and query the data > > dictionary views again, it shows the start and end of the partition > > boundary as not midnights but different times of the day's values. > > > > So I was wondering if this can cause us any unforeseen issues in the > > long run while creating the partitions though partman and persisting the > > data into the tables from the end users then querying those and having > > queries properly partitioned pruned? > > or > > should we always set the local timezone as UTC always before running or > > calling the pg_partman/pg_cron process which creates the partitions? > > Mainly in a database which serves global users sitting across multiple > > timezones. And same thing while inserting data into the table, we should > > use UTC timezone conversion function. Can you please confirm. > > '2024-03-07 00:00:00+00' and '2024-03-06 19:00:00-05' are the same time > as is '2024-03-07 00:00:00-05' and '2024-03-07 05:00:00+00'. > > Still I would think for sanity sake you would want to stick with UTC. > > Thank you so much Adrian. In my example in the first post, I see, if someone connected to a RDS Postgres database and run the create partition command using pg_partman by setting the timezone as "UTC", the 7th march partition looks to be spanned from "7th march midnight" to "8th march midnight", when queried the partition_experession from the data dictionary view. Which is correct. And same information if someone querying by setting the timezone as EST is showing spanning from "6th march 7PM" to "7th March 7PM". And this can cause sometimes the partition may shift to other days all together. Similar differences happen if creating the partitions using EST timezone initially and then querying the data dictionary from UTC timezone. So my question was, if in these types of scenarios, we should follow a standard approach of setting the timezone as UTC in such a type of global user use case, while the system can persist data from multiple users sitting across different time zones? So that the boundary(start and end time) of each of the range partitions will be set as consistent in one timezone across all the partitioned tables? And even while inserting the data , should we set the timezone to first UTC and do the data load ? *** Partition created by pg_partman by setting timezone as UTC *** *UTC* *Partition_name Partition_expression* test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00+00') TO ('2024-03-08 00:00:00+00') when queried the partition_expression using EST .. *EST* *Partition_name Partition_expression* test_timestamp_p2024_03_07 FOR VALUES FROM ('2024-03-06 19:00:00-05') TO ('2024-03-07 19:00:00-05') *** Partition created by pg_partman by setting timezone as EST *** *EST* *Partition_name Partition_expression* test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 00:00:00-05') TO ('2024-03-08 00:00:00-05') when queried the partition_expression using UTC .. *UTC* *Partition_name Partition_expression* test_timestamp2_p2024_03_07 FOR VALUES FROM ('2024-03-07 05:00:00+00') TO ('2024-03-08 05:00:00+00') *** Also i see both the "setting" and "reset_val" is showing as local timezone only. If we set the timezone to a different value than the local timezone then it gets updated on the "setting". Regards Sud
Re: Query plans not identical for `id in(1,2,3)` and `(id=1 or id=2 or id=3)`
Ilya Basin schrieb am 09.03.2024 um 20:08: Hi List. I have a list of bigint keys and I need to retrieve rows by these keys. Normally, I would split this list into pages of size 900 and perform several `SELECT ... WHERE key in($1,$2,...)`. However, the proprietary ORM we use can only produce this SQL: `SELECT ... WHERE (key=$1 or key=$2 or ...)`. Surprisingly, PostgreSQL planner treats these two SQLs differently: - ```select * FROM "audittrail$referencelogline" where id in ( 1 , 2 , 3 )``` Index Scan https://i.stack.imgur.com/dr8oz.png - ```select * FROM "audittrail$referencelogline" where id = 1 or id = 2 or id = 3``` A lot of "Bitmap Index Scan" for each value https://i.stack.imgur.com/dnErs.png Is it possible to configure PostgreSQL 12.16 to treat the second query as the first? Can you convince your obfuscation layer to send an array value (containing all IDs) and change the query to: select * FROM "audittrail$referencelogline" where id = any(?)
Re: Query plans not identical for `id in(1,2,3)` and `(id=1 or id=2 or id=3)`
On Sat, 2024-03-09 at 23:08 +0400, Ilya Basin wrote: > I have a list of bigint keys and I need to retrieve rows by these keys. > Normally, I would split this list into pages of size 900 and perform > several `SELECT ... WHERE key in($1,$2,...)`. However, the proprietary > ORM we use can only produce this SQL: > `SELECT ... WHERE (key=$1 or key=$2 or ...)`. > Surprisingly, PostgreSQL planner treats these two SQLs differently: > > Is it possible to configure PostgreSQL 12.16 to treat the second query as the > first? No, that is currently not possible. Yours, Laurenz Albe