pgbouncer bug?
Not sure if this is the right place to post this, but if not, someone please point me in the right direction. My issue is with pgbouncer 1.14; this does not seem to happen on 1.13. If I do a service pgbouncer restart, then any time I try to connect to my databases via pgbouncer, I get "ERROR: no such user" regardless of which user I'm using. It's almost as if it's not recognizing the auth_query I have configured. But if I then issue a reload, it seems to work fine and I no longer get the "no such user" error. The problem is easy enough to work around since I don't restart pgbouncer all that much, but this doesn't seem like intended behavior.

Thanks,
Greig Wise

[Full Disclosure: I accidentally posted this message at first to the "base" PostgreSQL group and am now moving it to "PostgreSQL - general".]
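For context, by auth_query I mean the usual auth_user/auth_query arrangement, something along these lines (a sketch with illustrative names, not my literal config; the query shown is pgbouncer's documented default):

    ; pgbouncer.ini
    [pgbouncer]
    auth_type = md5
    auth_user = pgbouncer_lookup
    auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename = $1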
Re: Table with many NULLS for indexed column yields strange query plan
Seqscans are not disabled. Also, this is PostgreSQL 10.11, if that helps. Costs are as follows:

    seq_page_cost = 1
    random_page_cost = 1.5

It is odd that it does not just do a seqscan on table3. It's a very small table... only like 36 rows. I'd think the plan *should* seq scan table3, get the id where number = '', then use the index index_table2_on_table3_id on table2 to get the matching rows for that id. It does use that index when I specify that table3_id is not null, but not otherwise. table3_id is very selective into table2 for any non-null value, so I don't know why it would choose to scan that entire index in the case of the first query, where table3_id clearly can't be null due to the inner join.

Check out this:

select tablename, attname, inherited, null_frac, avg_width, n_distinct,
       most_common_vals, most_common_freqs
from pg_stats
where tablename = 'table2' and attname = 'table3_id';

 tablename |  attname  | inherited | null_frac | avg_width | n_distinct
-----------+-----------+-----------+-----------+-----------+------------
 table2    | table3_id | f         |  0.996167 |         8 |         39

most_common_vals: {985,363,990,991,992,45,81,8,126,307,378,739,855,993,994,190,338,366,369,537,663,805,846,155,277,803,870,988}
most_common_freqs: {0.00023,0.0002,0.0002,0.0002,0.0002,0.00017,0.00017,0.00013,0.00013,0.00013,0.00013,0.00013,0.00013,0.00013,0.00013,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,6.7e-05,6.7e-05,6.7e-05,6.7e-05,6.7e-05}

Thanks again for any help.
Greig
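Doing the textbook arithmetic on those stats, the expected selectivity for any single non-null table3_id is tiny (a back-of-the-envelope sketch, not the planner's actual code path):

    select (1 - null_frac) / n_distinct as est_per_value_selectivity
    from pg_stats
    where tablename = 'table2' and attname = 'table3_id';
    -- (1 - 0.996167) / 39 is roughly 0.0001, i.e. about 0.01% of table2
    -- per table3_id value, which is why the big scan estimate surprises me.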
Table with many NULLS for indexed column yields strange query plan
I have a query like this:

SELECT "table1".*
FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id"
WHERE "table3"."number" = ''
  AND ("table2"."type") IN ('Standard');

table2 has a large number of NULLs in the column table3_id. There is an index on this column. Here is the result of explain analyze:

Merge Join  (cost=1001.20..4076.67 rows=17278 width=167) (actual time=284.918..300.167 rows=2244 loops=1)
  Merge Cond: (table2.table3_id = table3.id)
  ->  Gather Merge  (cost=1000.93..787825.78 rows=621995 width=175) (actual time=5.786..283.269 rows=64397 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Nested Loop  (cost=0.87..712740.12 rows=155499 width=175) (actual time=0.091..102.708 rows=13107 loops=5)
              ->  Parallel Index Scan using index_table2_on_table3_id on table2  (cost=0.43..489653.08 rows=155499 width=16) (actual time=0.027..22.327 rows=13107 loops=5)
                    Filter: ((type)::text = 'Standard'::text)
              ->  Index Scan using table1_pk on table1  (cost=0.44..1.43 rows=1 width=167) (actual time=0.005..0.005 rows=1 loops=65535)
                    Index Cond: (id = table2.table1_id)
  ->  Index Scan using table3_pkey on table3  (cost=0.27..53.40 rows=1 width=8) (actual time=0.041..0.048 rows=1 loops=1)
        Filter: ((number)::text = ''::text)
        Rows Removed by Filter: 35
Planning time: 0.450 ms
Execution time: 310.230 ms

You can see the row estimate there is way off on the Parallel Index Scan. I suspect that this is because it's including the rows with NULL in the selectivity estimate, even though table3_id can't possibly be null here due to the inner join. If I modify the query like this:

SELECT "table1".*
FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id"
WHERE "table3"."number" = ''
  AND ("table2"."type") IN ('Standard')
  AND table3_id IS NOT NULL;

just adding "table3_id is not null" at the end there, I get a much better plan:

Nested Loop  (cost=1.14..290.04 rows=66 width=167) (actual time=0.058..11.258 rows=2244 loops=1)
  ->  Nested Loop  (cost=0.70..64.46 rows=66 width=8) (actual time=0.049..2.873 rows=2244 loops=1)
        ->  Index Scan using table3_pkey on table3  (cost=0.27..53.40 rows=1 width=8) (actual time=0.030..0.035 rows=1 loops=1)
              Filter: ((number)::text = ''::text)
              Rows Removed by Filter: 35
        ->  Index Scan using index_table2_on_table3_id on table2  (cost=0.43..11.05 rows=1 width=16) (actual time=0.017..2.102 rows=2244 loops=1)
              Index Cond: ((table3_id = table3.id) AND (table3_id IS NOT NULL))
              Filter: ((type)::text = 'Standard'::text)
  ->  Index Scan using table1_pk on table1  (cost=0.44..3.42 rows=1 width=167) (actual time=0.003..0.003 rows=1 loops=2244)
        Index Cond: (id = table2.table1_id)
Planning time: 0.403 ms
Execution time: 11.672 ms

Can I do anything statistics-wise so that I get a better plan here, or do I have to modify the query? It seems kinda hacky that I would have to specify "is not null" on that column since, like I said, it can't possibly be null.

Thanks,
Greig Wise
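For what it's worth, the only statistics-side knob I can think to try is a larger per-column sample (a sketch only; the 1000 is an arbitrary target, not a recommendation):

    ALTER TABLE table2 ALTER COLUMN table3_id SET STATISTICS 1000;
    ANALYZE table2;
    -- re-estimates null_frac and n_distinct from a larger sample, though
    -- I'm not sure this alone fixes the join selectivity estimate.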
Re: BitmapAnd on correlated column?
> Extended statistics will tell PostgreSQL that it is very unlikely
> that the first condition will contribute significantly, but that
> is no proof that the condition can be omitted, so the optimizer
> cannot just skip the condition.

Granted it cannot skip the condition, but that doesn't mean that it has to use that second index. It's doing a recheck on the conditions anyway, right?

Thanks.
Greig
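For anyone following along, the extended statistics being referred to are the functional-dependency kind, created roughly like this (a sketch; the table and column names are hypothetical stand-ins):

    -- hypothetical correlated columns col_a and col_b on table t
    CREATE STATISTICS t_dep_stats (dependencies) ON col_a, col_b FROM t;
    ANALYZE t;  -- populates the new statistics object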
Re: BitmapAnd on correlated column?
Yes, the "where and" is a typo.
Re: Out of Memory
All right... one more thing here. Any suggestions for how to set overcommit on a postgres db server with 16 GB of RAM and no swap? I think I want vm.overcommit_memory = 2, as I understand that prevents the OOM killer from zapping me. Is 100% the right way to go for overcommit_ratio? Is there a drawback to this?

Thanks again.
Greig
Re: Out of Memory
I think I figured it out:

    vm.overcommit_memory = 2
    vm.overcommit_ratio = 50

only allows me to use 50% of my RAM... ugh! I have 16 GB, so when only 8 GB is left, I start seeing OOM. Will increase this setting and see if it helps.

Thanks everyone for the help.
Greig
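For anyone who hits this later, the arithmetic as I understand it (the 90 below is just an example value, not a recommendation):

    # With vm.overcommit_memory = 2, the kernel's commit limit is roughly:
    #   CommitLimit = swap + RAM * overcommit_ratio / 100
    # 16 GB RAM, no swap, ratio 50  =>  ~8 GB, which matches what I saw.
    sysctl -w vm.overcommit_memory=2
    sysctl -w vm.overcommit_ratio=90
    grep -E 'CommitLimit|Committed_AS' /proc/meminfo   # verify the new limit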
Re: Out of Memory
Tom Lane-2 wrote:
> greigwise <greigwise@> writes:
>> Is it possible that the fact that my stack size is limited is what is
>> causing my issue?
>
> No. If you were hitting that limit you'd get a message specifically
> talking about stack.
>
> 			regards, tom lane

Well, darn. I'm at a loss... any suggestions as to what I should do next to troubleshoot this?

Thanks.
Greig
Re: Out of Memory
There is also this:

-bash-4.2$ prlimit -p 6590
RESOURCE   DESCRIPTION                                 SOFT      HARD  UNITS
AS         address space limit                    unlimited unlimited  bytes
CORE       max core file size                             0 unlimited  blocks
CPU        CPU time                               unlimited unlimited  seconds
DATA       max data size                          unlimited unlimited  bytes
FSIZE      max file size                          unlimited unlimited  blocks
LOCKS      max number of file locks held          unlimited unlimited
MEMLOCK    max locked-in-memory address space         65536     65536  bytes
MSGQUEUE   max bytes in POSIX mqueues                819200    819200  bytes
NICE       max nice prio allowed to raise                 0         0
NOFILE     max number of open files                    1024      4096
NPROC      max number of processes                     4096     59341
RSS        max resident set size                  unlimited unlimited  pages
RTPRIO     max real-time priority                         0         0
RTTIME     timeout for real-time tasks            unlimited unlimited  microsecs
SIGPENDING max number of pending signals              59341     59341
STACK      max stack size                           8388608 unlimited  bytes

Is it possible that the fact that my stack size is limited is what is causing my issue?

Thanks,
Greig
Re: Out of Memory
Tom Lane-2 wrote:
> greigwise <greigwise@> writes:
>> If I have nearly 8 GB of memory left, why am I getting out of memory
>> errors?
>
> Probably the postmaster is running under restrictive ulimit settings.
>
> 			regards, tom lane

If I login as the user which runs postmaster, I get this:

-bash-4.2$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59341
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Max memory size is unlimited, as is virtual memory... is there something else there I should be concerned with in regard to out of memory?

Thanks again.
Greig
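One thing worth noting for the archives: the limits that matter are the running postmaster's, which can differ from a login shell's. Something like this shows the live values (assuming $PGDATA points at the data directory):

    # the first line of postmaster.pid is the postmaster's pid
    cat /proc/$(head -1 "$PGDATA/postmaster.pid")/limits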
Re: Out of Memory
Well, I've been unsuccessful so far on creating a standalone test. I have put some scripting in place to capture some additional information on the server with the out of memory issues. I have a script which just periodically dumps the output of free -m to a text file. So, the output of free -m immediately before and after the out of memory error looks like this:

Just before:
              total        used        free      shared  buff/cache   available
Mem:          14877        2978         132        3553       11766        7943
Swap:             0           0           0

Just after:
              total        used        free      shared  buff/cache   available
Mem:          14877        2946         649        3548       11280        7982
Swap:             0           0           0

If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Thanks,
Greig
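For completeness, the capture script is nothing fancy, roughly this (the path and interval are arbitrary):

    while true; do
        date >> /var/tmp/free_mem.log
        free -m >> /var/tmp/free_mem.log
        sleep 10
    done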
Re: Out of Memory
Well, we are 64-bit. I'll see if I can make some kind of self-contained test to repeat it.

Thanks,
Greig
Re: Out of Memory
ulimit -a for the postgres user shows memory unlimited. numactl --hardware gives "command not found".

Thanks again.
Greig
Out of Memory
Hello, I'm running postgres 9.6.10 on CentOS 7. Seeing the occasional out of memory error trying to run a query. In the logs I see something like this:

Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks); 319665696 used
2018-09-20 18:08:01 UTC 5ba3e1a2.7a8a dbname ERROR:  out of memory
2018-09-20 18:08:01 UTC 5ba3e1a2.7a8a dbname DETAIL:  Failed on request of size 2016.

If I have 142439136 bytes free, then why am I failing on a request of size 2016? Am I misunderstanding here?

Thanks,
Greig Wise
Re: pg_basebackup: could not receive data from WAL stream
I should also add that when it fails, it's always right at the very end of the backup, when it's very nearly done or maybe even after it's done.

Thanks again.
Greig
pg_basebackup: could not receive data from WAL stream
Hello. On postgresql 10.5, my pg_basebackup is failing with this error:

pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request

In the postgres log files, I'm seeing:

2018-09-02 00:57:32 UTC bkp_user 5b8b278c.11c3f [unknown] LOG:  terminating walsender process due to replication timeout

I'm running the following command right on the database server itself:

pg_basebackup -U repl -D /var/tmp/pg_basebackup_20180901 -Ft -z

It seems to be an intermittent problem... I've had it fail or succeed about 50/50. I even bumped up the wal_sender_timeout to 2000. One notable thing is that I'm running on an EC2 instance on AWS.

Any advice would be helpful.

Greig Wise
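One thing I realized I should double-check: wal_sender_timeout takes milliseconds when no unit is given, so my "2000" may actually have meant 2 seconds rather than 2000 seconds. I'm setting it with an explicit unit this time (the value is just an example):

    # postgresql.conf
    wal_sender_timeout = '10min'   # a bare 2000 means 2000 ms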
Errors with physical replication
Hello. We are on PostgreSQL version 9.6.6. We have 2 EC2 instances in different Amazon regions, and we are doing physical replication via VPN. It all seems to work just fine most of the time. I'm noticing in the logs that we have recurring errors (maybe 10 or 12 times per day) that look like this:

2018-05-17 06:36:14 UTC 5af0599f.210d LOG:  invalid resource manager ID 49 at 384/42A4AB00
2018-05-17 06:36:14 UTC 5afd22de.7ac4 LOG:  started streaming WAL from primary at 384/4200 on timeline 1
2018-05-17 07:20:17 UTC 5afd22de.7ac4 FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Or some that look like this:

2018-05-17 07:20:17 UTC 5af0599f.210d LOG:  record with incorrect prev-link 49F07120/9F100C95 at 384/45209FC0
2018-05-17 07:20:18 UTC 5afd2d31.1889 LOG:  started streaming WAL from primary at 384/4500 on timeline 1
2018-05-17 08:03:28 UTC 5afd2d31.1889 FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

And some like this:

2018-05-17 23:00:13 UTC 5afd63ec.26fc LOG:  invalid magic number in log segment 00010385003C, offset 10436608
2018-05-17 23:00:14 UTC 5afe097d.49aa LOG:  started streaming WAL from primary at 385/3C00 on timeline 1

Then, maybe once every couple of months or so, we have a crash with logs looking like this:

2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology WARNING:  terminating connection because of crash of another server process
2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-05-17 08:03:28 UTC 5af0599f.210a LOG:  database system is shut down

When this last error occurs, the recovery is to go onto the replica, remove all the WAL logs from the pg_xlog directory, and then restart PostgreSQL. Everything seems to recover and come up fine. I've done some tests comparing counts between the replica and the primary, and everything seems synced just fine from all I can tell.

So, a couple of questions:

1) Should I be worried that my replica is corrupt in some way, or given that everything *seems* OK, is it reasonable to believe that things are working correctly in spite of these errors being reported?

2) Is there something I should configure differently to avoid some of these errors?

Thanks in advance for any help.

Greig Wise
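On question 2, the knobs I've been eyeing so far, for whatever that's worth (values are examples only, not recommendations):

    # postgresql.conf, primary side
    wal_sender_timeout = '2min'
    # postgresql.conf, replica side
    wal_receiver_timeout = '2min'
    # both sides, since the link is a VPN between regions
    tcp_keepalives_idle = 60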
postgres on physical replica crashes
Hello. I've had several instances where postgres on my physical replica under version 9.6.6 crashes with messages like the following in the logs:

2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser WARNING:  terminating connection because of crash of another server process
2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2018-04-18 05:43:27 UTC 5acf5e12.6819 LOG:  database system is shut down

When this happens, what I've found is that I can go into the pg_xlog directory on the replica and remove all the log files, and then postgres will restart and things seem to come back up normally. So, the question is what's going on here... is the log maybe getting corrupted in transmission somehow? Should I be concerned about the viability of my replica after having restarted it in the described fashion?

Thanks,
Greig Wise
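For the record, the restart procedure I described amounts to this (a sketch; the service name and data directory assume a stock 9.6 install, and this is on the replica only, where the segments just get re-fetched from the primary):

    systemctl stop postgresql-9.6
    rm /var/lib/pgsql/9.6/data/pg_xlog/0*    # WAL segment files only
    systemctl start postgresql-9.6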
pglogical lag due to replication timeout
Hello all. I've recently configured logical replication on postgres 9.6.5 using pglogical 2.1.1. The problem I'm seeing is a pretty constant stream of these errors on the source side in the postgres logs:

LOG:  terminating walsender process due to replication timeout

After so many of these, I can see on the replication target side that things are not being updated. Eventually, the source server seems to retry and then all the updates go through, but it seems like there's more lag than there really should be due to all these replication timeouts. I'm not seeing anything on the network side; it's really a pretty fast connection between the 2 servers, so I can't see that being the issue. Plus, I have wal_sender_timeout set to 600 seconds (10 minutes), which seems like it should be way overkill.

Any advice on this?

Thanks,
Greig Wise
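In case it's useful to anyone, a simple way to watch the walsender from the provider side while this is happening (9.6 column names):

    select application_name, state,
           sent_location, write_location, flush_location, replay_location
    from pg_stat_replication;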
Re: pglogical in postgres 9.6
No need! I figured it out. I had to put "synchronize_data := false" on the create_subscription call. It's weird that there seem to be redundant parameters for this: one on the replication set add and one on the create subscription. Maybe I'm not quite understanding the usage of those. If anyone knows the difference, I'd be interested to hear it.

Greig
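Concretely, the call that worked looks roughly like this (a sketch; the subscription name and dsn are placeholders):

    select pglogical.create_subscription(
        subscription_name := 'dw_sub',
        provider_dsn := 'host=... port=... dbname=... user=logical_replication',
        replication_sets := array['dw_repl'],
        synchronize_data := false
    );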
Re: pglogical in postgres 9.6
hireology_tmp=# \dx
                                    List of installed extensions
       Name       | Version |      Schema      |                   Description
------------------+---------+------------------+-------------------------------------------------
 pglogical        | 2.0.0   | pglogical        | PostgreSQL Logical Replication
 pglogical_origin | 1.0.0   | pglogical_origin | Dummy extension for compatibility when upgrading
                  |         |                  | from Postgres 9.4
Re: pglogical in postgres 9.6
I downloaded from git and compiled from source. Based on the pglogical.control file, it appears to be 2.0.0. Is there a better way to determine the version?

Thanks,
Greig
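One catalog-level way to check that should work regardless of how it was built:

    select extversion from pg_extension where extname = 'pglogical';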
Re: pglogical in postgres 9.6
I downloaded it from git and compiled from source just today. It looks like it's 2.0.0, based on the pglogical.control file that was in the source. I'm not sure how else I'd know.

Greig
pglogical in postgres 9.6
Hello. I'm trying to get pglogical set up between 2 postgres 9.6 instances. I can get everything replicating over fine; my problem is that I can't seem to get it to work in the mode where it does not try to copy all the data over initially. On the source side, I'm doing this:

psql -U hireology -d $SRCDB -c "SELECT pglogical.create_node(
    node_name := '${SRCNODE}',
    dsn := 'host=localhost port=${SRCPORT} dbname=${SRCDB} user=logical_replication')"
psql -U hireology -d $SRCDB -c "select pglogical.create_replication_set('dw_repl', true, true, true, true)"
psql -U hireology -d $SRCDB -c "select pglogical.replication_set_add_table('dw_repl', 'accounts', false)"
psql -U hireology -d $SRCDB -c "select pglogical.replication_set_add_table('dw_repl', 'organizations', false)"
psql -U hireology -d $SRCDB -c "select pglogical.replication_set_add_sequence('dw_repl', 'accounts_id_seq', false)"
psql -U hireology -d $SRCDB -c "select pglogical.replication_set_add_sequence('dw_repl', 'organizations_id_seq', false)"

Note that on the replication_set_add_table commands, I'm passing false as the third parameter, which should tell it not to copy over the initial data. But when I create the subscription on the target side, I can see in the logs where it is issuing the copy command, and since I have the data there already, I get duplicate key errors and it bombs out.

Any suggestions?

Thanks,
Greig Wise
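On the target side, the subscription setup is essentially this (a sketch; $TGTDB is just my shorthand for the target database, the node and subscription names are placeholders, and the dsn values are elided):

psql -U hireology -d $TGTDB -c "SELECT pglogical.create_node(
    node_name := 'dw_target',
    dsn := 'host=... port=... dbname=... user=logical_replication')"
psql -U hireology -d $TGTDB -c "SELECT pglogical.create_subscription(
    subscription_name := 'dw_sub',
    provider_dsn := 'host=... port=... dbname=... user=logical_replication',
    replication_sets := array['dw_repl'])"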