Re: ERROR: found xmin from before relfrozenxid
It seems that the version of the db is 9.6.10:

psql -U db -d db -c "select version()";
Password for user db:
                                                  version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
(1 row)

and the error still exists.

On Sat, Jan 26, 2019 at 12:59 Adrien NAYRAT <adrien.nay...@anayrat.info> wrote:
> On 1/26/19 11:56 AM, Mariel Cherkassky wrote:
> > Updating to the minor version should be an easy solution - yum update
> > postgresql. What did you mean by careful?
>
> Sorry, I meant: do not forget to apply the update, to be sure the same bug
> does not happen again.
Re: Will higher shared_buffers improve tpcb-like benchmarks?
I did one final test of increasing shared_buffers to 32GB. It seems to have no impact on TPS (in fact, if I look closely, there is a 10-15% **negative** impact on TPS compared to shared_buffers=2G). I can confirm that **almost** the entire DB has been cached in shared_buffers:

         relname          |  buffered  | buffers_percent | percent_of_relation
--------------------------+------------+-----------------+---------------------
 pgbench_accounts         | 24 GB      |            74.5 |                93.9
 pgbench_accounts_pkey    | 4284 MB    |            13.1 |               100.0
 pgbench_history          | 134 MB     |             0.4 |                95.8
 pg_aggregate             | 8192 bytes |             0.0 |                50.0
 pg_amproc                | 32 kB      |             0.0 |               100.0
 pg_cast                  | 16 kB      |             0.0 |               100.0
 pg_amop                  | 48 kB      |             0.0 |                85.7
 pg_depend                | 96 kB      |             0.0 |                18.8
 pg_index                 | 40 kB      |             0.0 |               125.0
 pg_namespace             | 8192 bytes |             0.0 |               100.0
 pg_opclass               | 24 kB      |             0.0 |               100.0
 pg_operator              | 96 kB      |             0.0 |                75.0
 pg_rewrite               | 24 kB      |             0.0 |                25.0
 pg_statistic             | 176 kB     |             0.0 |                75.9
 pg_aggregate_fnoid_index | 16 kB      |             0.0 |               100.0
 pg_trigger               | 40 kB      |             0.0 |               500.0
 pg_amop_fam_strat_index  | 24 kB      |             0.0 |                60.0
 pg_amop_opr_fam_index    | 32 kB      |             0.0 |                80.0
 pg_amproc_fam_proc_index | 24 kB      |             0.0 |                75.0
 pg_constraint            | 24 kB      |             0.0 |               150.0

And I think now I give up. I don't think I understand how PG perf tuning works or what impact shared_buffers has on perf. I'll just run my DB in production with default settings and hope no one complains about the system being slow!

-- Saurabh.

On Tue, Jan 29, 2019 at 11:40 PM Saurabh Nanda wrote:
> That is likely correct, but the data will likely be stored in the OS file
>> cache, so reading it from there will still be pretty fast.
>>
>
> Right -- but increasing shared_buffers won't increase my TPS, right? Btw,
> I just realised that irrespective of shared_buffers, my entire DB is
> already in memory (DB size=30GB, RAM=64GB). I think the following output
> from iotop confirms this. All throughout the benchmarking
> (client=1,4,8,12,24,48,96), the *disk read* values remain zero!
> > Total DISK READ : 0.00 B/s | Total DISK WRITE : 73.93 M/s > Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 43.69 M/s > > > > Could this explain why my TPS numbers are not changing no matter how much > I fiddle with the Postgres configuration? > > If my hypothesis is correct, increasing the pgbench scale to get a 200GB > database would immediately show different results, right? > > -- Saurabh. > -- http://www.saurabhnanda.com
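The plateau described here is consistent with a simple latency model: once every read is served from RAM, the difference between a shared_buffers hit and an OS-cache hit is tiny compared with the rest of the per-transaction work. A rough sketch in Python (all costs are illustrative guesses, not measurements from this thread):

```python
# Toy per-transaction latency model showing why moving reads between
# shared_buffers and the OS page cache barely changes TPS once the whole
# database is memory-resident. All numbers are illustrative guesses.
READ_COST_US = {
    "shared_buffers hit": 1,   # pointer chase in shared memory
    "OS cache hit": 10,        # syscall + memcpy into shared_buffers
    "disk read": 5000,         # actual storage I/O
}
OTHER_WORK_US = 300            # parse/plan/execute/WAL per transaction
PAGE_READS_PER_TXN = 2         # index leaf + heap page, roughly

for source, cost in READ_COST_US.items():
    tps = 1_000_000 / (OTHER_WORK_US + PAGE_READS_PER_TXN * cost)
    print(f"{source:>19}: ~{tps:,.0f} TPS")
```

With these numbers the two in-memory cases differ by only a few percent, while the disk-bound case collapses by an order of magnitude, which would match the observed insensitivity to shared_buffers.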
Re: dsa_allocate() failure
On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini wrote:
> we are facing a similar issue on a Production system using PostgreSQL 10.6:
>
> org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID:
> EXCEPTION on getstatistics_media ; ID: uidatareader.
> run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free
> pages
>
> We would like not to stop the Production system and upgrade it to PG11. And
> even then, would this guarantee a permanent fix?
> Any suggestion?

Hi Fabio,

Thanks for your report. Could you please also show the query plan that runs on the "remote" node (where the error occurred)?

There is no indication that upgrading to PG11 would help here. It seems we have an undiagnosed bug (in 10 and 11), and so far no one has been able to reproduce it at will. I personally have chewed a lot of CPU time on several machines trying various plan shapes and not seen this, or the possibly related symptom from bug #15585, even once. But we have about three reports of each of the two symptoms. One reporter wrote to me off-list to say that they'd seen #15585 twice, the second time by running the same query in a tight loop for 8 hours, and then not seen it again in the past 3 weeks. Clearly there is an issue needing a fix here, but I don't yet know what it is.

-- Thomas Munro http://www.enterprisedb.com
Re: How can sort performance be so different
Bob Jolliffe writes: > I wonder what can cause such a massive discrepancy in the sort time. Are you using the same locale (LC_COLLATE) setting on both machines? Some locales sort way slower than C locale does. That's not enough to explain a 1000X discrepancy --- I concur with the other opinions that there's something wrong with your VPS --- but it might account for something like 10X of it. regards, tom lane
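A quick back-of-the-envelope check supports this: a quicksort of 13,436 rows does on the order of n*log2(n) comparisons, so even a grossly expensive locale-aware comparison cannot plausibly add up to 224 seconds. A sketch (the 10 microseconds per comparison is a deliberately pessimistic guess):

```python
import math

# Rough cost model for sorting 13,436 rows with quicksort (~n log2 n
# comparisons). 10 us per locale-aware comparison is deliberately pessimistic.
n_rows = 13436
comparisons = n_rows * math.log2(n_rows)
seconds = comparisons * 10e-6

print(f"~{comparisons:,.0f} comparisons")
print(f"expected sort time: ~{seconds:.2f} s (observed: 224 s)")
```

Around 184,000 comparisons should finish in a couple of seconds even under pessimistic assumptions, so a slow collation alone cannot explain the observation; something else on the VPS must be wrong.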
Re: How can sort performance be so different
Run https://github.com/n-st/nench and benchmark the underlying vps first.

On Tue 29 Jan, 2019, 11:59 PM Bob Jolliffe wrote:

> The following is output from analyzing a simple query on a table of
> 13436 rows on postgresql 10, ubuntu 18.04.
>
> explain analyze select * from chart order by name;
>                           QUERY PLAN
> -------------------------------------------------------------------
> Sort (cost=1470.65..1504.24 rows=13436 width=725) (actual
> time=224340.949..224343.499 rows=13436 loops=1)
>    Sort Key: name
>    Sort Method: quicksort Memory: 4977kB
>    -> Seq Scan on chart (cost=0.00..549.36 rows=13436 width=725)
> (actual time=0.015..1.395 rows=13436 loops=1)
> Planning time: 0.865 ms
> Execution time: 224344.281 ms
> (6 rows)
>
> The planner has predictably done a sequential scan followed by a sort.
> Though it might have wished it hadn't and just used the index (there
> is an index on name). The sort is taking a mind-boggling 224 seconds,
> nearly 4 minutes.
>
> This is on a cloud vps server.
>
> Interestingly, when I run the same query on my laptop it completes in
> well under one second.
>
> I wonder what can cause such a massive discrepancy in the sort time.
> Can it be that the VPS server has heavily over-committed CPU? Note I
> have tried this with 2 different companies' servers with similar
> results.
>
> I am baffled. The sort seems to be all done in memory (only 5MB).
> Tested when nothing else was going on at the time. I can expect some
> difference between the VPS and my laptop, but almost 1000x seems odd.
> The CPUs are different but not that different.
>
> Any theories?
>
> Regards
> Bob
Re: How can sort performance be so different
On Tue, Jan 29, 2019 at 19:29, Bob Jolliffe wrote:

> The following is output from analyzing a simple query on a table of
> 13436 rows on postgresql 10, ubuntu 18.04.
>
> explain analyze select * from chart order by name;
>                           QUERY PLAN
> -------------------------------------------------------------------
> Sort (cost=1470.65..1504.24 rows=13436 width=725) (actual
> time=224340.949..224343.499 rows=13436 loops=1)
>    Sort Key: name
>    Sort Method: quicksort Memory: 4977kB
>    -> Seq Scan on chart (cost=0.00..549.36 rows=13436 width=725)
> (actual time=0.015..1.395 rows=13436 loops=1)
> Planning time: 0.865 ms
> Execution time: 224344.281 ms
> (6 rows)
>
> The planner has predictably done a sequential scan followed by a sort.
> Though it might have wished it hadn't and just used the index (there
> is an index on name). The sort is taking a mind-boggling 224 seconds,
> nearly 4 minutes.
>
> This is on a cloud vps server.
>
> Interestingly, when I run the same query on my laptop it completes in
> well under one second.
>
> I wonder what can cause such a massive discrepancy in the sort time.
> Can it be that the VPS server has heavily over-committed CPU? Note I
> have tried this with 2 different companies' servers with similar
> results.
>
> I am baffled. The sort seems to be all done in memory (only 5MB).
> Tested when nothing else was going on at the time. I can expect some
> difference between the VPS and my laptop, but almost 1000x seems odd.
> The CPUs are different but not that different.
>
> Any theories?

I am sure a sort of 10K rows cannot take 224 sec. It really looks like a VPS issue.

Regards
Pavel

> Regards
> Bob
Re: Interpreting shared_buffers setting
Bob Jolliffe writes:

> Excuse me if this is a silly question. I am trying to fiddle with
> shared_buffers setting on postgresql 10.6 on ubuntu 18.04 server.
>
> I have this at bottom of my config file:
> shared_buffers = 1GB
>
> Yet when I check the setting from pg_settings I see something quite different:
>
> postgres=# SELECT name, setting FROM pg_settings where name =
> 'shared_buffers';
>       name      | setting
> ----------------+---------
>  shared_buffers | 131072

Why not use the SHOW command, which is good about output in human terms:

psql (11.1 (Ubuntu 11.1-1.pgdg16.04+1))
Type "help" for help.

meta_a:postgres# show shared_buffers;

> Is this a question of units? It looks like 128M. Note when I change
> the setting to 2GB in conf file I see 262144 from pg_settings. I am
> now unsure what the actual shared_buffers allocation is. I can't see
> anything in the docs which tells me how to interpret the integer.
>
> Any clarification welcome.
>
> Regards
> Bob

-- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consult...@comcast.net
How can sort performance be so different
The following is output from analyzing a simple query on a table of 13436 rows on postgresql 10, ubuntu 18.04.

explain analyze select * from chart order by name;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Sort  (cost=1470.65..1504.24 rows=13436 width=725) (actual time=224340.949..224343.499 rows=13436 loops=1)
   Sort Key: name
   Sort Method: quicksort  Memory: 4977kB
   ->  Seq Scan on chart  (cost=0.00..549.36 rows=13436 width=725) (actual time=0.015..1.395 rows=13436 loops=1)
 Planning time: 0.865 ms
 Execution time: 224344.281 ms
(6 rows)

The planner has predictably done a sequential scan followed by a sort. Though it might have wished it hadn't and just used the index (there is an index on name). The sort is taking a mind-boggling 224 seconds, nearly 4 minutes.

This is on a cloud vps server.

Interestingly, when I run the same query on my laptop it completes in well under one second.

I wonder what can cause such a massive discrepancy in the sort time. Can it be that the VPS server has heavily over-committed CPU? Note I have tried this with 2 different companies' servers with similar results.

I am baffled. The sort seems to be all done in memory (only 5MB). Tested when nothing else was going on at the time. I can expect some difference between the VPS and my laptop, but almost 1000x seems odd. The CPUs are different but not that different.

Any theories?

Regards
Bob
Re: Will higher shared_buffers improve tpcb-like benchmarks?
> > That is likely correct, but the data will likely be stored in the OS file > cache, so reading it from there will still be pretty fast. > Right -- but increasing shared_buffers won't increase my TPS, right? Btw, I just realised that irrespective of shared_buffers, my entire DB is already in memory (DB size=30GB, RAM=64GB). I think the following output from iotop confirms this. All throughout the benchmarking (client=1,4,8,12,24,48,96), the *disk read* values remain zero! Total DISK READ : 0.00 B/s | Total DISK WRITE : 73.93 M/s Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 43.69 M/s Could this explain why my TPS numbers are not changing no matter how much I fiddle with the Postgres configuration? If my hypothesis is correct, increasing the pgbench scale to get a 200GB database would immediately show different results, right? -- Saurabh.
Re: Will higher shared_buffers improve tpcb-like benchmarks?
On Tue, Jan 29, 2019 at 6:39 AM Saurabh Nanda wrote:

> Hi,
>
> I'm going crazy trying to optimise my Postgres config for a production
> setting [1]. Once I realised random changes weren't getting me anywhere, I
> finally purchased PostgreSQL 10 High Performance [2] and understood the
> impact of shared_buffers.
>
> IIUC, shared_buffers won't have any significant impact in the following
> scenario, right?
>
> -- DB size = 30GB
> -- shared_buffers = 2GB
> -- workload = tpcb-like
>
> This is because the tpcb-like workload selects & updates random rows from
> the DB [3]. Therefore, with a 2GB shared buffer, there is only a 6-7%
> chance (did I get my probability correct?) that the required data will be
> in the shared_buffer. Did I understand this correctly?

That is likely correct, but the data will likely be stored in the OS file cache, so reading it from there will still be pretty fast.

> If nothing else becomes the bottleneck (eg. periodically writing dirty
> pages to disk), increasing the shared_buffers to 15GB+ should have a
> significant impact, for this DB-size and workload, right? (The system has
> 64 GB RAM)

About the only way to know for sure that writing dirty data is not the bottleneck is to use a read-only benchmark, such as the -S flag for pgbench. And at that point, the IPC overhead between pgbench and the backend, even when both are running on the same machine, is likely to be the bottleneck. And after that, the bottleneck might shift to opening and closing transactions and taking and releasing locks [1]. If you overcome that, then you might reliably see a difference between 2GB and 15GB of shared buffers, because at 2GB each query to pgbench_accounts is likely to fetch 2 pages into shared_buffers from the OS cache: the index leaf page for pgbench_accounts_pkey, and the table page for pgbench_accounts.
At 15GB, the entire index should be reliably in shared_buffers (after enough warm-up time), so you would only need to fetch 1 page, and often not even that. Cheers, Jeff [1] I have a very old patch to pgbench that introduces a new query to overcome this, https://www.postgresql.org/message-id/BANLkTi%3DQBYOM%2Bzj%3DReQeiEKDyVpKUtHm6Q%40mail.gmail.com . I don't know how much work it would be to get it to compile against newer versions--I stopped maintaining it because it became too much work to rebase it past conflicting work, and because I lost interest in this line of research.
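The 2GB-vs-15GB point can be sanity-checked against the relation sizes reported earlier in the thread: the ~4.2 GB pgbench_accounts_pkey index fits entirely in 15 GB of shared_buffers but not in 2 GB, while the ~25 GB heap fits in neither. A small sketch (sizes are approximate, back-derived from the pg_buffercache output in this thread):

```python
# Which pgbench structures fit entirely in shared_buffers? Sizes are
# approximate, derived from the pg_buffercache output earlier in the thread.
sizes_gb = {
    "pgbench_accounts (heap)": 25.6,  # 24 GB cached was ~93.9% of it
    "pgbench_accounts_pkey": 4.2,     # 4284 MB index
}

for shared_buffers_gb in (2, 15):
    print(f"shared_buffers = {shared_buffers_gb} GB:")
    for name, size in sizes_gb.items():
        verdict = "fits" if size <= shared_buffers_gb else "does not fit"
        print(f"  {name} ({size} GB): {verdict}")
```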
Re: pg_locks - what is a virtualxid locktype
On 2019-Jan-29, Shreeyansh Dba wrote:

> The virtualxid lock is special. It's an exclusive lock on the transaction's
> own virtual transaction ID that every transaction always holds. No other
> transaction can ever acquire it while the transaction is running.
> The purpose of this is to allow one transaction to wait until another
> transaction commits or rolls back using PostgreSQL's locking mechanism, and
> it's used internally.

A little more specific than that: it's used when some process (such as CREATE INDEX CONCURRENTLY) needs to wait even on sessions that might be read-only. Such transactions don't have the transaction IDs that write transactions have, which is why the only way to wait for them is to wait on the virtual transaction ID.

-- Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
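The waiting trick described above can be modeled with an ordinary lock: the transaction holds it for its whole lifetime, and a waiter "acquires" it only to learn that the transaction has ended. A toy sketch in Python (purely illustrative; this is the idea, not how the PostgreSQL lock manager is implemented):

```python
import threading
import time

# Toy model of the virtualxid trick: every transaction holds an exclusive
# lock on its own virtual transaction ID for its entire lifetime. Anyone who
# needs to outwait it (e.g. CREATE INDEX CONCURRENTLY) simply tries to
# acquire that lock, which only succeeds once the transaction has ended.
virtualxid_lock = threading.Lock()
events = []

def transaction():
    with virtualxid_lock:          # held from begin to commit/abort
        events.append("txn begin")
        time.sleep(0.1)            # ... work, even in a read-only txn ...
        events.append("txn end")

def concurrent_index_builder():
    time.sleep(0.02)               # arrive while the transaction is running
    with virtualxid_lock:          # blocks until the transaction releases it
        events.append("waiter proceeds")

t1 = threading.Thread(target=transaction)
t2 = threading.Thread(target=concurrent_index_builder)
t1.start(); t2.start(); t1.join(); t2.join()
print(events)
```

Note this works even though the "transaction" never writes anything, which is exactly why waiting on a regular transaction ID would not be possible for read-only sessions.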
Re: Interpreting shared_buffers setting
Thank you Andrew and Thomas. All is now clear :-) On Tue, 29 Jan 2019 at 13:07, Andrew Gierth wrote: > > > "Bob" == Bob Jolliffe writes: > > Bob> Excuse me if this is a silly question. I am trying to fiddle with > Bob> shared_buffers setting on postgresql 10.6 on ubuntu 18.04 server. > > Bob> I have this at bottom of my config file: > Bob> shared_buffers = 1GB > > Bob> Yet when I check the setting from pg_setting I see something quite > Bob> different: > > Bob> postgres=# SELECT name, setting FROM pg_settings where name = > 'shared_buffers'; > Bob> name | setting > Bob> +- > Bob> shared_buffers | 131072 > > pg_settings can tell you more than you asked for: > > postgres=# select * from pg_settings where name='shared_buffers'; > -[ RECORD 1 ]---+- > name| shared_buffers > setting | 16384 > unit| 8kB > category| Resource Usage / Memory > short_desc | Sets the number of shared memory buffers used by the server. > extra_desc | > context | postmaster > vartype | integer > source | configuration file > min_val | 16 > max_val | 1073741823 > enumvals| > boot_val| 1024 > reset_val | 16384 > sourcefile | /home/andrew/work/pgsql/inst/9.5/data/postgresql.conf > sourceline | 113 > pending_restart | f > > notice that "unit 8kB" line; the displayed integer value is in units of > 8kB (which is the block size your server was compiled with, which you > can also see as the block_size parameter). > > -- > Andrew (irc:RhodiumToad)
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
On Mon, Jan 28, 2019 at 12:03 AM Saurabh Nanda wrote:

> All this benchmarking has led me to a philosophical question: why does PG
> need shared_buffers in the first place?

PostgreSQL cannot let the OS get its hands on a dirty shared buffer until the WAL record "protecting" that buffer has been flushed to disk. If a dirty shared buffer got written to disk, but then a crash happened before the WAL record got flushed to disk, the data could be corrupted when the system comes back up. So shared_buffers effectively serves as a cooling pond where dirty buffers wait for their WAL to be flushed naturally, so they can be written without instigating a performance-reducing flush just for them.

Also, concurrent clients need to access the same disk pages at overlapping times without corrupting each other. Perhaps that could be implemented by keeping just the buffer headers in shared memory to coordinate the locking, without having the buffers themselves in shared memory. But that is not how it is currently implemented.

> What's wrong with letting the OS do the caching/buffering?

Nothing, and that is what it does. Which is why the advice for shared_buffers is often to use a small fraction of RAM, leaving the rest for the OS to do its thing. But PostgreSQL still needs a way to lock those pages, both against concurrent access by its own clients and against getting flushed out of order by the OS. There is no performant way to release dirty pages immediately to the OS while still constraining the order in which the OS flushes them to disk.

Finally, while reading a page from the OS cache into shared_buffers is much faster than reading it from disk, it is still much slower than finding it already located in shared_buffers. So if your entire database fits in RAM, you will get better performance if shared_buffers is large enough for the entire thing to fit in there as well. This is an exception to the rule that shared_buffers should be a small fraction of RAM.
> Isn't it optimised for this kind of stuff?

Maybe. But you might be surprised at how poorly optimized it is. It depends on your OS and its version, of course. If you have a high usage_count buffer which is re-dirtied constantly, it will only get written and flushed to disk once per checkpoint if it is under PostgreSQL's control. But I've seen pages like that get written many times per second under kernel control. Whatever optimization the kernel tried to do, it wasn't very good at it. Also, if many contiguous pages are dirtied in a close time-frame, but not dirtied in their physical order, the kernel should be able to re-order them into long sequential writes, correct? But empirically it doesn't, at least back in the late 2.* series kernels when I did the experiments. I don't know if it didn't even try, or tried but failed. (Of course back then, PostgreSQL didn't do a good job of it either.)

Cheers, Jeff
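The "cooling pond" rule reduces to a single invariant: a dirty buffer's page LSN must be behind the flushed WAL position before the buffer may be written out. A toy sketch of that check (all names and numbers are made up for illustration; the real bufmgr logic is far more involved):

```python
# Toy model of the WAL-before-data rule: a dirty shared buffer may only be
# written out once the WAL has been durably flushed past the buffer's LSN.
# All names and numbers here are made up for illustration.
flushed_wal_lsn = 100

dirty_buffers = [
    {"page": "A", "lsn": 80},   # its protecting WAL record is on disk
    {"page": "B", "lsn": 150},  # its WAL record is not yet flushed
]

def can_write(buf, flushed_lsn):
    """Safe to write only if the protecting WAL record is already on disk."""
    return buf["lsn"] <= flushed_lsn

for buf in dirty_buffers:
    verdict = "write now" if can_write(buf, flushed_wal_lsn) else "wait in the cooling pond"
    print(f"page {buf['page']} (lsn {buf['lsn']}): {verdict}")
```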
Re: Interpreting shared_buffers setting
> "Bob" == Bob Jolliffe writes:

 Bob> Excuse me if this is a silly question. I am trying to fiddle with
 Bob> shared_buffers setting on postgresql 10.6 on ubuntu 18.04 server.

 Bob> I have this at bottom of my config file:
 Bob> shared_buffers = 1GB

 Bob> Yet when I check the setting from pg_settings I see something quite
 Bob> different:

 Bob> postgres=# SELECT name, setting FROM pg_settings where name = 'shared_buffers';
 Bob>       name      | setting
 Bob> ----------------+---------
 Bob>  shared_buffers | 131072

pg_settings can tell you more than you asked for:

postgres=# select * from pg_settings where name='shared_buffers';
-[ RECORD 1 ]---+-------------------------------------------------------------
name            | shared_buffers
setting         | 16384
unit            | 8kB
category        | Resource Usage / Memory
short_desc      | Sets the number of shared memory buffers used by the server.
extra_desc      |
context         | postmaster
vartype         | integer
source          | configuration file
min_val         | 16
max_val         | 1073741823
enumvals        |
boot_val        | 1024
reset_val       | 16384
sourcefile      | /home/andrew/work/pgsql/inst/9.5/data/postgresql.conf
sourceline      | 113
pending_restart | f

Notice that "unit 8kB" line; the displayed integer value is in units of 8kB (which is the block size your server was compiled with, which you can also see as the block_size parameter).

-- Andrew (irc:RhodiumToad)
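The unit arithmetic is easy to reproduce: multiply the raw setting by the 8 kB block size. A small sketch (the helper names here are made up for illustration, not PostgreSQL APIs):

```python
# Convert a pg_settings integer (counted in block-size units) to bytes,
# then format it for humans. Helper names are made up for illustration.
def shared_buffers_bytes(setting, unit_bytes=8192):
    """Raw pg_settings value times the 8 kB block size."""
    return setting * unit_bytes

def human(n):
    """Render a byte count in binary units."""
    for suffix in ("B", "kB", "MB", "GB", "TB"):
        if n < 1024:
            return f"{n:g} {suffix}"
        n /= 1024
    return f"{n:g} PB"

print(human(shared_buffers_bytes(131072)))  # the 1GB from the config file
print(human(shared_buffers_bytes(262144)))  # the 2GB case
print(human(shared_buffers_bytes(16384)))   # the 128MB default
```

This confirms the figures in the thread: 131072 * 8 kB = 1 GB and 262144 * 8 kB = 2 GB.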
Re: Interpreting shared_buffers setting
Hi,

Check the block size (8 kB) as the factor: 131072 * 8 kB = 1 GB.

Regards,
Thomas

On 29.01.19 at 13:32, Bob Jolliffe wrote:

Excuse me if this is a silly question. I am trying to fiddle with the shared_buffers setting on postgresql 10.6 on ubuntu 18.04 server.

I have this at the bottom of my config file:

shared_buffers = 1GB

Yet when I check the setting from pg_settings I see something quite different:

postgres=# SELECT name, setting FROM pg_settings where name = 'shared_buffers';
      name      | setting
----------------+---------
 shared_buffers | 131072

Is this a question of units? It looks like 128M. Note when I change the setting to 2GB in the conf file I see 262144 from pg_settings. I am now unsure what the actual shared_buffers allocation is. I can't see anything in the docs which tells me how to interpret the integer.

Any clarification welcome.

Regards
Bob
Interpreting shared_buffers setting
Excuse me if this is a silly question. I am trying to fiddle with the shared_buffers setting on postgresql 10.6 on ubuntu 18.04 server.

I have this at the bottom of my config file:

shared_buffers = 1GB

Yet when I check the setting from pg_settings I see something quite different:

postgres=# SELECT name, setting FROM pg_settings where name = 'shared_buffers';
      name      | setting
----------------+---------
 shared_buffers | 131072

Is this a question of units? It looks like 128M. Note when I change the setting to 2GB in the conf file I see 262144 from pg_settings. I am now unsure what the actual shared_buffers allocation is. I can't see anything in the docs which tells me how to interpret the integer.

Any clarification welcome.

Regards
Bob
Will higher shared_buffers improve tpcb-like benchmarks?
Hi,

I'm going crazy trying to optimise my Postgres config for a production setting [1]. Once I realised random changes weren't getting me anywhere, I finally purchased PostgreSQL 10 High Performance [2] and understood the impact of shared_buffers.

IIUC, shared_buffers won't have any significant impact in the following scenario, right?

-- DB size = 30GB
-- shared_buffers = 2GB
-- workload = tpcb-like

This is because the tpcb-like workload selects & updates random rows from the DB [3]. Therefore, with a 2GB shared buffer, there is only a 6-7% chance (did I get my probability correct?) that the required data will be in the shared_buffer. Did I understand this correctly?

If nothing else becomes the bottleneck (eg. periodically writing dirty pages to disk), increasing the shared_buffers to 15GB+ should have a significant impact, for this DB-size and workload, right? (The system has 64 GB RAM.)

[1] Related thread at https://www.postgresql.org/message-id/flat/CAPz%3D2oGdmvirLNX5kys%2BuiY7LKzCP4sTiXXob39qq6eDkEuk2Q%40mail.gmail.com
[2] https://www.packtpub.com/big-data-and-business-intelligence/postgresql-10-high-performance
[3] https://www.postgresql.org/docs/11/pgbench.html#id-1.9.4.10.7.2

-- Saurabh.
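The 6-7% estimate checks out: with uniform random access, the chance that a requested page is already resident is simply the cache-to-database size ratio. For the record (assuming the uniform-access model of the tpcb-like workload):

```python
# With uniform random access, the probability that a requested page is
# already in shared_buffers is simply the size ratio (figures from the post).
db_size_gb = 30
shared_buffers_gb = 2

hit_probability = shared_buffers_gb / db_size_gb
print(f"hit probability: {hit_probability:.1%}")  # -> hit probability: 6.7%
```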
Re: pg_locks - what is a virtualxid locktype
The virtualxid lock is special. It's an exclusive lock on the transaction's own virtual transaction ID that every transaction always holds. No other transaction can ever acquire it while the transaction is running. The purpose of this is to allow one transaction to wait until another transaction commits or rolls back using PostgreSQL's locking mechanism, and it's used internally.

Thanks & Regards,
*Shreeyansh DBA Team*
www.shreeyansh.com

On Tue, Jan 29, 2019 at 2:27 PM Mariel Cherkassky <mariel.cherkas...@gmail.com> wrote:

> Hey,
> I noticed that pg_locks has an additional row, with locktype "virtualxid",
> for every transaction that is created. I tried to search for it online but
> I didn't find an explanation for this behavior. Can anyone explain why this
> happens?
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
On 29/01/2019 at 07:15, Saurabh Nanda wrote:

c) I tried another cloud hosting provider (E2E Networks) and just the raw performance numbers (with default configuration) are blowing Hetzner out of the water. I noticed that on E2E, the root filesystem is mounted with the following options:

/dev/xvda on / type ext4 (rw,noatime,nodiratime,nobarrier,errors=remount-ro,stripe=512,data=ordered)

whereas on Hetzner, it is mounted with the following options:

/dev/nvme0n1p3 on / type ext4 (rw,relatime,data=ordered)

How much of a difference can this have on absolute TPS numbers?

The differences can be significant.

noatime does not update the inode access time at all, while relatime updates the inode access time if the change time was before the access time (which can often be the case for a database).

nobarrier disables block-level write barriers. Barriers ensure that data is actually stored on the device; the man page says: "If disabled on a device with a volatile (non-battery-backed) write-back cache, the nobarrier option will lead to filesystem corruption on a system crash or power loss."

You should probably consider noatime rather than relatime, and whether to use nobarrier depends on whether you have a battery-backed cache or not.

Also, this is an SSD, so you should TRIM it, either with periodic fstrim or using the discard mount option.

Nicolas
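The checklist above can be turned into a trivial script that flags the discussed options in a mount-options string (a quick sketch, a simple string check rather than an exhaustive audit of mount semantics):

```python
# Flag the mount options discussed above in an ext4 option string.
# A simple string check, not an exhaustive audit of mount semantics.
def review_mount_options(options):
    opts = set(options.split(","))
    notes = []
    if "nobarrier" in opts:
        notes.append("nobarrier: faster, but risks filesystem corruption on "
                     "power loss without a battery-backed write cache")
    if "noatime" in opts:
        notes.append("noatime: skips all access-time updates (good for a DB)")
    elif "relatime" in opts:
        notes.append("relatime: still writes atime sometimes; noatime is cheaper")
    if "discard" not in opts:
        notes.append("no 'discard': on an SSD, remember periodic fstrim instead")
    return notes

# The E2E mount options from the message above:
for note in review_mount_options("rw,noatime,nodiratime,nobarrier,errors=remount-ro"):
    print("-", note)
```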
pg_locks - what is a virtualxid locktype
Hey,
I noticed that pg_locks has an additional row, with locktype "virtualxid", for every transaction that is created. I tried to search for it online but I didn't find an explanation for this behavior. Can anyone explain why this happens?