pgbouncer bug?

2020-08-21 Thread greigwise
Not sure if this is the right place to post this, but if not someone please
point me in the right direction.

My issue is with pgbouncer 1.14.   This does not seem to happen on 1.13.

If I do a service pgbouncer restart, then any time I try to connect to my
databases via pgbouncer, I get "ERROR: no such user" regardless of which user
I'm using.  It's almost as if it's not recognizing the auth_query I have
configured.  But if I then issue a reload, everything works fine and I no
longer get the "no such user" error.  The problem is easy enough to work
around since I don't restart pgbouncer very often, but this doesn't seem like
intended behavior.
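
For reference, the auth_query setup I'm describing is along these lines (a
sketch with placeholder names rather than my exact config; the query shown is
pgbouncer's documented default):

; pgbouncer.ini (sketch)
[pgbouncer]
auth_type = md5
auth_user = pgbouncer_auth
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename = $1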

Thanks,
Greig Wise

[Full Disclosure: I accidentally posted this message at first to the "base"
PostgreSQL group and am now moving to "PostgreSQL - general".]







Re: Table with many NULLS for indexed column yields strange query plan

2020-03-09 Thread greigwise
Seqscans are not disabled.  Also, this is PostgreSQL 10.11 if that helps.

Costs are as follows:

 seq_page_cost
---------------
             1

 random_page_cost
------------------
              1.5

It is odd that it does not just do a seqscan on table3.   It's a very small
table... only like 36 rows.   I'd think the plan *should* seq scan table3,
get the id where number = '', then use the index
index_table2_on_table3_id on table2 to get the matching rows for that id. 
It does use that index when I specify that table3_id is not null, but not
otherwise.  
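
One experiment that might show whether the parallel index-scan path is simply
winning on cost (a sketch: the setting is session-local, the query is the one
from my original post, and the database name is a placeholder):

psql -d mydb <<'SQL'
SET max_parallel_workers_per_gather = 0;  -- take the parallel plan out of consideration
EXPLAIN ANALYZE
SELECT "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id"
WHERE "table3"."number" = ''
AND ("table2"."type") IN ('Standard');
SQL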

table3_id is very selective into table2 for any non-null value, so I don't
know why it would choose to scan that entire index in the case of the first
query where the table3_id clearly can't be null due to the inner join.  
Check out this:

select tablename, attname, inherited, null_frac, avg_width, n_distinct, 
most_common_vals, most_common_freqs from pg_stats where tablename = 'table2'
and attname = 'table3_id';
 tablename |  attname  | inherited | null_frac | avg_width | n_distinct
-----------+-----------+-----------+-----------+-----------+------------
 table2    | table3_id | f         |  0.996167 |         8 |         39

most_common_vals: 
{985,363,990,991,992,45,81,8,126,307,378,739,855,993,994,190,338,366,369,537,663,805,846,155,277,803,870,988}

most_common_freqs:
{0.00023,0.0002,0.0002,0.0002,0.0002,0.00017,0.00017,0.00013,0.00013,0.00013,0.00013,0.00013,0.00013,0.00013,0.00013,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,6.7e-05,6.7e-05,6.7e-05,6.7e-05,6.7e-05}

Thanks again for any help.

Greig







Table with many NULLS for indexed column yields strange query plan

2020-03-05 Thread greigwise
I have a query like this:

SELECT "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id"
WHERE "table3"."number" = ''
AND ("table2"."type") IN ('Standard');

table2 has a large number of NULLS in the column table3_id.  There is an
index on this column.  Here is the result of explain analyze:

Merge Join  (cost=1001.20..4076.67 rows=17278 width=167) (actual time=284.918..300.167 rows=2244 loops=1)
   Merge Cond: (table2.table3_id = table3.id)
   ->  Gather Merge  (cost=1000.93..787825.78 rows=621995 width=175) (actual time=5.786..283.269 rows=64397 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Nested Loop  (cost=0.87..712740.12 rows=155499 width=175) (actual time=0.091..102.708 rows=13107 loops=5)
               ->  Parallel Index Scan using index_table2_on_table3_id on table2  (cost=0.43..489653.08 rows=155499 width=16) (actual time=0.027..22.327 rows=13107 loops=5)
                     Filter: ((type)::text = 'Standard'::text)
               ->  Index Scan using table1_pk on table1  (cost=0.44..1.43 rows=1 width=167) (actual time=0.005..0.005 rows=1 loops=65535)
                     Index Cond: (id = table2.table1_id)
   ->  Index Scan using table3_pkey on table3  (cost=0.27..53.40 rows=1 width=8) (actual time=0.041..0.048 rows=1 loops=1)
         Filter: ((number)::text = ''::text)
         Rows Removed by Filter: 35
 Planning time: 0.450 ms
 Execution time: 310.230 ms

You can see that the row estimate is way off on the Parallel Index Scan.
I suspect this is because it's including the NULL rows in the selectivity
estimate, even though table3_id can't possibly be null here due to the inner
join.

If I modify the query like this:

SELECT "table1".* FROM "table1"
INNER JOIN "table2" ON "table2"."table1_id" = "table1"."id"
INNER JOIN "table3" ON "table3"."id" = "table2"."table3_id"
WHERE "table3"."number" = ''
AND ("table2"."type") IN ('Standard') AND table3_id IS NOT NULL;

Just by adding table3_id IS NOT NULL at the end there, I get a much better
plan.

Nested Loop  (cost=1.14..290.04 rows=66 width=167) (actual time=0.058..11.258 rows=2244 loops=1)
   ->  Nested Loop  (cost=0.70..64.46 rows=66 width=8) (actual time=0.049..2.873 rows=2244 loops=1)
         ->  Index Scan using table3_pkey on table3  (cost=0.27..53.40 rows=1 width=8) (actual time=0.030..0.035 rows=1 loops=1)
               Filter: ((number)::text = ''::text)
               Rows Removed by Filter: 35
         ->  Index Scan using index_table2_on_table3_id on table2  (cost=0.43..11.05 rows=1 width=16) (actual time=0.017..2.102 rows=2244 loops=1)
               Index Cond: ((table3_id = table3.id) AND (table3_id IS NOT NULL))
               Filter: ((type)::text = 'Standard'::text)
   ->  Index Scan using table1_pk on table1  (cost=0.44..3.42 rows=1 width=167) (actual time=0.003..0.003 rows=1 loops=2244)
         Index Cond: (id = table2.table1_id)
 Planning time: 0.403 ms
 Execution time: 11.672 ms

Can I do anything statistics-wise to get a better plan here, or do I have to
modify the query?  It seems hacky to have to specify IS NOT NULL on that
column when, as I said, it can't possibly be null.
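
The kind of statistics-side change I have in mind would be something like the
following (a sketch: the database name is a placeholder, and whether a larger
sample actually fixes the estimate is exactly my question):

psql -d mydb -c "ALTER TABLE table2 ALTER COLUMN table3_id SET STATISTICS 1000;"
psql -d mydb -c "ANALYZE table2;"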

Thanks,

Greig Wise







Re: BitmapAnd on correlated column?

2019-10-04 Thread greigwise
>Extended statistics will tell PostgreSQL that it is very unlikely
>that the first condition will contribute significantly, but that
>is no proof that the condition can be omitted, so the optimizer
>cannot just skip the condition.

Granted, it cannot skip the condition, but that doesn't mean it has to use
that second index.  It's doing a recheck on the conditions anyway, right?

Thanks.
Greig







Re: BitmapAnd on correlated column?

2019-10-03 Thread greigwise
Yes.  The "where and" is a typo.







Re: Out of Memory

2018-09-26 Thread greigwise
All right... one more thing here.  Any suggestions for how to set overcommit
on a postgres db server with 16 GB of RAM and no swap?  I think I want
vm.overcommit_memory = 2, as I understand that prevents the OOM killer from
zapping me.  Is 100% the right way to go for overcommit_ratio?  Is there a
drawback to that?

Thanks again.
Greig






Re: Out of Memory

2018-09-26 Thread greigwise
I think I figured it out:

vm.overcommit_memory = 2
vm.overcommit_ratio = 50

That only allows me to use 50% of my RAM... ugh!  I have 16 GB, so once 8 GB
is committed, I start seeing OOM.  I will increase this setting and see if it
helps.
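
For reference, with vm.overcommit_memory = 2 the kernel computes the commit
limit roughly as follows (this is the formula from the kernel's
overcommit-accounting documentation; huge pages are ignored here for
simplicity):

CommitLimit = Swap + RAM * (overcommit_ratio / 100)
            = 0 + 16 GB * 0.50 =  8.0 GB   (current setting)
            = 0 + 16 GB * 0.90 = 14.4 GB   (with overcommit_ratio = 90, for example)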

Thanks everyone for the help.

Greig






Re: Out of Memory

2018-09-26 Thread greigwise
Tom Lane-2 wrote
> greigwise <greigwise@...> writes:
>> Is it possible that the fact that my stack size is limited is what is
>> causing my issue?
> 
> No.  If you were hitting that limit you'd get a message specifically
> talking about stack.
> 
>   regards, tom lane

Well, darn.  I'm at a loss... any suggestions of what I should do next to
troubleshoot this?

Thanks.
Greig






Re: Out of Memory

2018-09-26 Thread greigwise
There is also this:

-bash-4.2$ prlimit -p 6590
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                         0 unlimited blocks
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited blocks
LOCKS      max number of file locks held      unlimited unlimited
MEMLOCK    max locked-in-memory address space     65536     65536 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0
NOFILE     max number of open files                1024      4096
NPROC      max number of processes                 4096     59341
RSS        max resident set size              unlimited unlimited pages
RTPRIO     max real-time priority                     0         0
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals          59341     59341
STACK      max stack size                       8388608 unlimited bytes

Is it possible that the fact that my stack size is limited is what is
causing my issue?

Thanks,
Greig






Re: Out of Memory

2018-09-25 Thread greigwise
Tom Lane-2 wrote
> greigwise <greigwise@...> writes:
>> If I have nearly 8 GB of memory left, why am I getting out of memory
>> errors?
> 
> Probably the postmaster is running under restrictive ulimit settings.
> 
>   regards, tom lane

If I log in as the user that runs the postmaster, I get this:

-bash-4.2$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 59341
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size               (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 4096
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Max memory size is unlimited, as is virtual memory... is there something else
there I should be concerned about in regard to out of memory?

Thanks again.
Greig






Re: Out of Memory

2018-09-25 Thread greigwise
Well, I've been unsuccessful so far at creating a standalone test.

I have put some scripting in place to capture additional information on the
server with the out-of-memory issues.  I have a script that periodically
dumps the output of free -m to a text file.
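
(The capture script is nothing fancy; it's along these lines, with the
interval and output path being arbitrary choices:)

#!/bin/sh
# append a timestamped snapshot of free -m every 30 seconds
while true; do
    date >> /var/tmp/free_log.txt
    free -m >> /var/tmp/free_log.txt
    sleep 30
done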

So, the output of free -m immediately before and after the out of memory
error looks like this:

Just before:
              total        used        free      shared  buff/cache   available
Mem:          14877        2978         132        3553       11766        7943
Swap:             0           0           0

Just after:

              total        used        free      shared  buff/cache   available
Mem:          14877        2946         649        3548       11280        7982
Swap:             0           0           0

If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Thanks,
Greig






Re: Out of Memory

2018-09-21 Thread greigwise
Well, we are 64-bit.  I'll see if I can make some kind of self-contained test
to repeat it.

Thanks,
Greig






Re: Out of Memory

2018-09-20 Thread greigwise
ulimit -a for the postgres user shows memory unlimited.

numactl --hardware gives command not found.

Thanks again.

Greig






Out of Memory

2018-09-20 Thread greigwise
Hello, I'm running postgres 9.6.10 on CentOS 7.  I'm seeing the occasional
out-of-memory error when trying to run a query.  In the logs I see something
like this:

Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks); 319665696 used
2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname ERROR:  out of memory
2018-09-20 18:08:01 UTC  5ba3e1a2.7a8a dbname DETAIL:  Failed on request of size 2016.

If I have 142439136 free, then why am I failing on a request of size 2016?

Am I misunderstanding here?

Thanks,
Greig Wise






Re: pg_basebackup: could not receive data from WAL stream

2018-09-01 Thread greigwise
I should also add that when it fails, it's always right at the very end of
the backup when it's very nearly done or maybe even after it's done.

Thanks again.

Greig






pg_basebackup: could not receive data from WAL stream

2018-09-01 Thread greigwise
Hello.

On PostgreSQL 10.5, my pg_basebackup is failing with this error:

pg_basebackup: could not receive data from WAL stream: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request

In the postgres log files, I'm seeing:

2018-09-02 00:57:32 UTC bkp_user 5b8b278c.11c3f [unknown] LOG:  terminating
walsender process due to replication timeout

I'm running the following command right on the database server itself:

pg_basebackup -U repl -D /var/tmp/pg_basebackup_20180901 -Ft -z

It seems to be an intermittent problem... I've had it fail or succeed about
50/50.  I even bumped up wal_sender_timeout to 2000.  One notable thing is
that I'm running on an EC2 instance on AWS.
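
(One detail: wal_sender_timeout is interpreted as milliseconds when no unit
is given, so a bare 2000 means two seconds.  Writing the unit explicitly in
postgresql.conf avoids the ambiguity; the value here is just illustrative:)

wal_sender_timeout = 2000s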

Any advice would be helpful.

Greig Wise






Errors with physical replication

2018-05-21 Thread greigwise
Hello.  

We are on PostgreSQL version 9.6.6.  We have two EC2 instances in different
Amazon regions, and we are doing physical replication via VPN.  It all seems
to work just fine most of the time.  I'm noticing in the logs that we have
recurring errors (maybe 10 or 12 times per day) that look like this:

2018-05-17 06:36:14 UTC  5af0599f.210d  LOG:  invalid resource manager ID 49
at 384/42A4AB00
2018-05-17 06:36:14 UTC  5afd22de.7ac4  LOG:  started streaming WAL from
primary at 384/4200 on timeline 1
2018-05-17 07:20:17 UTC  5afd22de.7ac4  FATAL:  could not receive data from
WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Or some that also look like this:

2018-05-17 07:20:17 UTC  5af0599f.210d  LOG:  record with incorrect
prev-link 49F07120/9F100C95 at 384/45209FC0
2018-05-17 07:20:18 UTC  5afd2d31.1889  LOG:  started streaming WAL from
primary at 384/4500 on timeline 1
2018-05-17 08:03:28 UTC  5afd2d31.1889  FATAL:  could not receive data from
WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

And some like this:

2018-05-17 23:00:13 UTC  5afd63ec.26fc  LOG:  invalid magic number  in
log segment 00010385003C, offset 10436608
2018-05-17 23:00:14 UTC  5afe097d.49aa  LOG:  started streaming WAL from
primary at 385/3C00 on timeline 1

Then, like maybe once every couple months or so, we have a crash with logs
looking like this:

2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology WARNING: 
terminating connection because of crash of another server process
2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2018-05-17 08:03:28 UTC hireology 5af47b75.2670 hireology HINT:  In a moment
you should be able to reconnect to the database and repeat your command.
2018-05-17 08:03:28 UTC  5af0599f.210a  LOG:  database system is shut down

When this last error occurs, the recovery is to go on the replica, remove all
the WAL logs from the pg_xlog directory, and then restart PostgreSQL.
Everything seems to recover and come up fine.  I've done some tests comparing
counts between the replica and the primary, and everything seems synced just
fine from all I can tell.
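
(The recovery amounts to something like the following on the replica; the
service name and data directory path are placeholders for wherever the
cluster lives:)

systemctl stop postgresql-9.6                      # or: pg_ctl stop -D "$PGDATA"
rm -f /var/lib/pgsql/9.6/data/pg_xlog/[0-9A-F]*    # the streamed WAL segments
systemctl start postgresql-9.6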

So, a couple of questions.  1) Should I be worried that my replica is corrupt
in some way, or, given that everything *seems* OK, is it reasonable to
believe that things are working correctly in spite of these errors being
reported?  2) Is there something I should configure differently to avoid some
of these errors?

Thanks in advance for any help.

Greig Wise







postgres on physical replica crashes

2018-04-18 Thread greigwise
Hello.  I've had several instances where postgres on my physical replica
under version 9.6.6 is crashing with messages like the following in the
logs:

2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser WARNING:  terminating
connection because of crash of another server process
2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2018-04-18 05:43:27 UTC  5acf5e12.6819  LOG:  database system is shut down

When this happens, what I've found is that I can go into the pg_xlog
directory on the replica and remove all the log files, and then postgres will
restart and things seem to come back up normally.

So, the question is what's going on here... is the log maybe getting corrupt
in transmission somehow?  Should I be concerned about the viability of my
replica after having restarted in the described fashion?

Thanks,
Greig Wise






pglogical lag due to replication timeout

2018-03-12 Thread greigwise
Hello all.

I've recently configured logical replication on postgres 9.6.5 using
pglogical 2.1.1.

The problem I'm seeing is a pretty constant stream of these errors on the
source side in the postgres logs:

LOG:  terminating walsender process due to replication timeout

After so many of these, I can see on the replication target side that things
are not being updated.  Eventually, the source server seems to retry and then
all the updates go through, but it seems like there's more lag than there
really should be due to all these "replication timeouts".

I'm not seeing anything on the network side; it's a pretty fast connection
between the two servers, so I can't see that being the issue.  Plus, I have
wal_sender_timeout set to 600 (10 minutes), which seems like it should be way
overkill.
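
(A quick way to watch the walsender from the source side while this is
happening; a sketch, with the database name a placeholder.  Note that on 9.6
the columns are named *_location rather than the *_lsn names used in later
releases:)

psql -d $SRCDB -c "SELECT application_name, state, sent_location, replay_location FROM pg_stat_replication;"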

Any advice on this?

Thanks,
Greig Wise






Re: pglogical in postgres 9.6

2018-02-13 Thread greigwise
No need!  I figured it out.

Had to put this "synchronize_data := false" on the create_subscription call. 
Weird that there seem to be redundant parameters for this; one on the
replication set add and one on the create subscription.  Maybe I'm not quite
understanding the usage on those or something.   If anyone knows the
difference, I'd be interested to hear.
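
(For anyone who hits the same thing, the working call was along these lines;
the subscription name and DSN here are placeholders:)

psql -U hireology -d $TGTDB -c "SELECT pglogical.create_subscription(
    subscription_name := 'dw_sub',
    provider_dsn := 'host=srchost port=5432 dbname=${SRCDB} user=logical_replication',
    replication_sets := ARRAY['dw_repl'],
    synchronize_data := false)"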

Greig






Re: pglogical in postgres 9.6

2018-02-12 Thread greigwise
hireology_tmp=# \dx
                                      List of installed extensions
       Name       | Version |      Schema      |                           Description
------------------+---------+------------------+--------------------------------------------------------------------
 pglogical        | 2.0.0   | pglogical        | PostgreSQL Logical Replication
 pglogical_origin | 1.0.0   | pglogical_origin | Dummy extension for compatibility when upgrading from Postgres 9.4






Re: pglogical in postgres 9.6

2018-02-12 Thread greigwise
I downloaded from git and compiled from source.  Based on the
pglogical.control file, it appears to be 2.0.0.  Is there a better way to
determine the version?
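
(I suppose once it's installed I could also ask the database itself;
something like this sketch:)

psql -d hireology_tmp -c "SELECT extname, extversion FROM pg_extension WHERE extname LIKE 'pglogical%';"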

Thanks,
Greig






Re: pglogical in postgres 9.6

2018-02-12 Thread greigwise
I downloaded it from git and compiled from source just today.

Looks like it's 2.0.0 based on the pglogical.control file that was in the
source.  I'm not sure how else I'd know.

Greig






pglogical in postgres 9.6

2018-02-12 Thread greigwise
Hello.  I'm trying to get pglogical set up between two postgres 9.6
instances.  I can get everything replicating fine; my problem is that I can't
seem to get it to work in the mode where it does not try to copy all the data
over initially.  On the source side, I'm doing this:

  psql -U hireology -d $SRCDB -c "SELECT pglogical.create_node(node_name := '${SRCNODE}', dsn := 'host=localhost port=${SRCPORT} dbname=${SRCDB} user=logical_replication')"

  psql -U hireology -d $SRCDB -c "SELECT pglogical.create_replication_set('dw_repl', true, true, true, true)"

  psql -U hireology -d $SRCDB -c "SELECT pglogical.replication_set_add_table('dw_repl', 'accounts', false)"
  psql -U hireology -d $SRCDB -c "SELECT pglogical.replication_set_add_table('dw_repl', 'organizations', false)"
  psql -U hireology -d $SRCDB -c "SELECT pglogical.replication_set_add_sequence('dw_repl', 'accounts_id_seq', false)"
  psql -U hireology -d $SRCDB -c "SELECT pglogical.replication_set_add_sequence('dw_repl', 'organizations_id_seq', false)"

Note that on the replication_set_add_table command, I'm passing false as the
third parameter, which should tell it not to copy over the initial data.  But
when I create the subscription on the target side, I can see in the logs that
it is issuing the copy command.  And since the data is already there, I get
duplicate key errors and it bombs out.  Any suggestions?

Thanks,
Greig Wise



