Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-14 Thread Adrian Klaver

On 12/13/2015 01:23 PM, Gerhard Wiesinger wrote:

On 13.12.2015 21:14, Bill Moran wrote:

Wait ... this is a combined HTTP/Postgres server? You didn't mention that
earlier, and it's kind of important.

What evidence do you have that Postgres is actually the part of
this system running out of memory?


For me the complete picture doesn't look consistent.


I don't see any such evidence in any of
your emails, and (based on experience) I find it pretty likely that
whatever
is running under node is doing something in a horrifically
memory-inefficient
manner. Since you mention that you see nothing in the PG logs, that
makes it
even more likely (to me) that you're looking entirely in the wrong place.

I'd be willing to bet a steak dinner that if you put the web server on a
different server than the DB, that the memory problems would follow the
web server and not the DB server.


Changes in config:
track_activity_query_size = 102400
work_mem = 100MB

Ok, we restarted PostgreSQL and had it stopped for seconds, and logged
top every second:

When PostgreSQL was down nearly all memory was freed, looks good to me.
So it is likely that node and other processes are not the cause.
Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
Swap: 512M Total, 477M Used, 35M Free, 93% Inuse

When PostgreSQL restarted, Inactive was growing fast (~1min):
Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M
Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

After some few minutes we are back again at the same situation:
Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

The steak dinner is mine :-) Donating to the PostgreSQL community :-)


To me all the above proves is that this a complete system issue and only 
with all the parts running do you get a problem. It still does indicate 
which part or interaction of parts is the issue. This is further muddied 
by no description of what, if anything, you where doing in the above 
scenarios.




Any further ideas, I don't think this is normal system behaviour.

Ciao,
Gerhard






--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

Hello,

some further details from the original FreeBSD 10.1 machine:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

  PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME WCPU COMMAND
77941 pgsql 5  200  7925M  7296M usem2 352:34 6.98% 
postgres: username dbnamee 127.0.0.1(43367)  (postgres)
75862 pgsql 1  200  7321M  7266M select  4   7:04 0.00% 
postgres: checkpointer process(postgres)
75863 pgsql 1  200  7321M  7260M select  4   3:34 0.00% 
postgres: writer process(postgres)
75867 pgsql 1  200  7323M  7237M sbwait  0   3:19 0.00% 
postgres: username dbnamee ipaddress(58977)  (postgres)
76178 pgsql 1  200  7323M  7237M sbwait  2   3:18 0.00% 
postgres: username dbnamee ipaddress(35807)  (postgres)
75868 pgsql 1  200  7323M  7237M sbwait  0   3:20 0.00% 
postgres: username dbnamee ipaddress(58978)  (postgres)
75884 pgsql 1  200  7323M  7236M sbwait  3   3:20 0.00% 
postgres: username dbnamee ipaddress(59908)  (postgres)
75869 pgsql 1  200  7323M  7236M sbwait  3   3:20 0.00% 
postgres: username dbnamee ipaddress(58982)  (postgres)
76181 pgsql 1  200  7323M  7236M sbwait  2   3:18 0.00% 
postgres: username dbnamee ipaddress(35813)  (postgres)
75883 pgsql 1  200  7323M  7236M sbwait  0   3:20 0.00% 
postgres: username dbnamee ipaddress(59907)  (postgres)
76180 pgsql 1  200  7323M  7236M sbwait  1   3:19 0.00% 
postgres: username dbnamee ipaddress(35811)  (postgres)
76177 pgsql 1  200  7323M  7236M sbwait  1   3:18 0.00% 
postgres: username dbnamee ipaddress(35712)  (postgres)
76179 pgsql 1  200  7323M  7236M sbwait  5   3:18 0.00% 
postgres: username dbnamee ipaddress(35810)  (postgres)
64951 pgsql 1  750  7375M   662M CPU11   0:01 11.47% 
postgres: username dbnamee 127.0.0.1(32073)  (postgres)
64950 pgsql 1  770  7325M   598M CPU66   0:02 16.55% 
postgres: username dbnamee 127.0.0.1(44251)  (postgres)


Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

Main issue is IHMO (as far as I understood the FreeBSD Memory system) 
that 20G are INACTIVE. When I subtract the shared memory, even ~13GB 
should be available, but they are still allocated but inactive 
(INACTIVE). INACTIVE memory might be clean or dirty. As we get into out 
of memory situations it is likely that the memory is dirty (otherwise it 
would have been reused).


Config:
Memory: 32GB, Swap: 512MB

maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

Thnx.

Ciao,
Gerhard


On 13.12.2015 08:49, Gerhard Wiesinger wrote:

Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple 
worker processes connected via persistent connections to PostgreSQL, 
they perform just simple queries with SELECT on primary keys and 
simple INSERTS/UPDATES. Normally nearly all the workers are idle but 
they still consume the maximum configured work mem on the PostgreSQL 
server and the memory is also resident. If some other queries get in 
we get into out of memory situations. So it looks like PostgreSQL has 
memory leaks.


I found a test scenario to reproduce it also on a newer FreeBSD 10.2 
VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):


Executions in psql with one persisent connection:
-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM 
generate_Series(1,1) s;

-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:

-- Memory goes slighty up after each execution even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on 
cancel the query or multiple execution

SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8)
RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$  LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident 
memory):

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
-- after startup of psql
26851 postgres  20   0 2363276   7432   6292 S   0.0  0.2 0:00.00 
postgres: postgres postgres [local] idle

-- Memory goes up, ok so far
26851 postgres  20   0 2365732 255152 253548 R  99.0  6.3 0:10.77 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90 
postgres: postgres 

Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

Hello Bill,

Thank you for your response, comments inline:

On 13.12.2015 16:05, Bill Moran wrote:

On Sun, 13 Dec 2015 09:57:21 +0100
Gerhard Wiesinger  wrote:

some further details from the original FreeBSD 10.1 machine:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME WCPU COMMAND
77941 pgsql 5  200  7925M  7296M usem2 352:34 6.98%
postgres: username dbnamee 127.0.0.1(43367)  (postgres)



I see no evidence of an actual leak here. Each process is basically using
the 7G of shared_buffers you have allocated in the config (which is only
7G _total_ for all processes, since it's shared memory)


OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?


Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

Main issue is IHMO (as far as I understood the FreeBSD Memory system)
that 20G are INACTIVE. When I subtract the shared memory, even ~13GB
should be available, but they are still allocated but inactive
(INACTIVE). INACTIVE memory might be clean or dirty. As we get into out
of memory situations it is likely that the memory is dirty (otherwise it
would have been reused).

Not quite correct. Inactive memory is _always_ available for re-use.


Are you sure that's true?

Monitoring inactive memory:
cat vm_stat.sh
#!/usr/bin/env bash

while [ 1 ]; do
  date +%Y.%m.%d.%H.%M.%S
  sysctl -a | grep vm.stats.vm.
  sleep 1
done

And even we get out of memory with swap_pager_getswapspace Inactive 
Memory (from the log file) is around 20GB (doesn't go down or up)

vm.stats.vm.v_inactive_count: 5193530 (*4k pages is around 20GB)

Then we have 20GB inactive memory, but we still get out of memory with 
kernel: swap_pager_getswapspace(4): failed. Any ideas why?





Config:
Memory: 32GB, Swap: 512MB

Probably not the cause of this problem, but this is a non-optimal layout.
If you're going to use swap at all, it should generally be 2x the size of
your RAM for optimal performance under memory pressure.


Yes, we will try to disable it totally. Nevertheless why do we get out 
of memory/Swap?





maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB

I expect that this value is the cause of the problem. The scenario you
describe below is sorting a large table on an unindexed column, meaning
it will have to use all that work_mem. I'd be interested to see the
output of:

EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 10;


That was only a test query, has nothing to do with production based 
query. They are mostly SELECT/INSERTS/UPDATES on primary keys.



But even without that information, I'd recommend you reduce work_mem
to about 16M or so.


Why so low? E.g. sorting on reporting or some long running queries are 
then done on disk and not in memory.



wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

[snip]


We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple
worker processes connected via persistent connections to PostgreSQL,
they perform just simple queries with SELECT on primary keys and
simple INSERTS/UPDATES.

That's not at all the test scenario you show below. The scenario below
is a large sort operation on a non-indexed column, which is vastly
different than a single-row fetch based on an index.


Yes, that non indexed select was just for testing purporeses.




Normally nearly all the workers are idle but
they still consume the maximum configured work mem on the PostgreSQL
server and the memory is also resident.

I see no evidence of that in your top output. Each process has a
reference to the 7G of shared_buffers you allocated, which is memory
shared by all processes, and is expected. I'm not as familiar with
Linux top, but the results appear to be the same.


Yes, might be the case, but if it is n times 7G shared memory then we 
have ~20GB Inactive Memory available, so plenty of memory. And why are 
we getting: kernel: swap_pager_getswapspace(4): failed?


Thnx.

Ciao,
Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Tom Lane
Gerhard Wiesinger  writes:
>> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
>> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Just judging from the name of the function, I would bet this is a direct
result of having only 512M of swap configured.  As Bill already pointed
out, that's a pretty useless choice on a system with 32G of RAM.  As soon
as the kernel tries to push out any significant amount of idle processes,
it's gonna be out of swap space.  The numbers you show above prove that
it is almost out of free swap already.

Also, while that 20G of "inactive" pages may be candidates for reuse,
they probably can't actually be reused without swapping them out ...
and there's noplace for that data to go.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
On Sun, 13 Dec 2015 16:35:08 +0100
Gerhard Wiesinger  wrote:

> Hello Bill,
> 
> Thank you for your response, comments inline:
> 
> On 13.12.2015 16:05, Bill Moran wrote:
> > On Sun, 13 Dec 2015 09:57:21 +0100
> > Gerhard Wiesinger  wrote:
> >> some further details from the original FreeBSD 10.1 machine:
> >>
> >> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> >> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
> >>
> >> PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME WCPU 
> >> COMMAND
> >> 77941 pgsql 5  200  7925M  7296M usem2 352:34 6.98%
> >> postgres: username dbnamee 127.0.0.1(43367)  (postgres)
> > 
> >
> > I see no evidence of an actual leak here. Each process is basically using
> > the 7G of shared_buffers you have allocated in the config (which is only
> > 7G _total_ for all processes, since it's shared memory)
> 
> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

You haven't provided enough information to isolate that cause yet. What's
in the Postgres log? Surely it will have logged something when its request
for RAM was denied, and it should be more informational than the OS'
generic message.

> >> Out of memory:
> >> kernel: swap_pager_getswapspace(4): failed
> >> kernel: swap_pager_getswapspace(8): failed
> >> kernel: swap_pager_getswapspace(3): failed
> >>
> >> Main issue is IHMO (as far as I understood the FreeBSD Memory system)
> >> that 20G are INACTIVE. When I subtract the shared memory, even ~13GB
> >> should be available, but they are still allocated but inactive
> >> (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out
> >> of memory situations it is likely that the memory is dirty (otherwise it
> >> would have been reused).
> > Not quite correct. Inactive memory is _always_ available for re-use.
> 
> Are you sure that's true?

Yes. Read The Design and Implementation of FreeBSD for the details.

> Monitoring inactive memory:
> cat vm_stat.sh
> #!/usr/bin/env bash
> 
> while [ 1 ]; do
>date +%Y.%m.%d.%H.%M.%S
>sysctl -a | grep vm.stats.vm.
>sleep 1
> done
> 
> And even we get out of memory with swap_pager_getswapspace Inactive 
> Memory (from the log file) is around 20GB (doesn't go down or up)
> vm.stats.vm.v_inactive_count: 5193530 (*4k pages is around 20GB)
> 
> Then we have 20GB inactive memory, but we still get out of memory with 
> kernel: swap_pager_getswapspace(4): failed. Any ideas why?

Theory: If the planner decides it needs to do 30 sort operations for a
query, it will try to allocate 27G of RAM, which exceeds what's available,
and therefore never gets allocated. So you get the "out of space" message,
but the actual memory usage doesn't change.

> >> maintenance_work_mem = 512MB
> >> effective_cache_size = 10GB
> >> work_mem = 892MB
> > I expect that this value is the cause of the problem. The scenario you
> > describe below is sorting a large table on an unindexed column, meaning
> > it will have to use all that work_mem. I'd be interested to see the
> > output of:
> >
> > EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 10;
> 
> That was only a test query, has nothing to do with production based 
> query. They are mostly SELECT/INSERTS/UPDATES on primary keys.

Then provide the _actual_ queries and the EXPLAIN ANALYZE and the table
layouts and basic data distribution of the actual cause. If your test
case is completely non-representative of what's happening, then you're
not going to get useful answers.

> > But even without that information, I'd recommend you reduce work_mem
> > to about 16M or so.
> 
> Why so low? E.g. sorting on reporting or some long running queries are 
> then done on disk and not in memory.

Even a simple query could involve multiple sorts, and you're allowing
each sort to use up to 890M of RAM (which is _not_ shared). As noted
earlier, even a moderately complex query could exceed the available
RAM on the system. But since you don't provide the actual queries and
tables causing problems, I can only guess. And since you appear to
have already decided what the cause of the problem is, then crafted
completely non-relevent queries that you think prove your point, I'm
not sure there's anything I can do to help you.

> >> wal_buffers = 8MB
> >> checkpoint_segments = 16
> >> shared_buffers = 7080MB
> >> max_connections = 80
> >> autovacuum_max_workers = 3
> > [snip]
> >
> >>> We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple
> >>> worker processes connected via persistent connections to PostgreSQL,
> >>> they perform just simple queries with SELECT on primary keys and
> >>> simple INSERTS/UPDATES.
> > That's not at all the test scenario you show below. The scenario below
> > is a large sort operation on a non-indexed column, which is vastly
> > different than a single-row fetch based on an index.
> 
> Yes, that non indexed select was just for testing purporeses.


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
On Sun, 13 Dec 2015 09:57:21 +0100
Gerhard Wiesinger  wrote:
> 
> some further details from the original FreeBSD 10.1 machine:
> 
> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
> 
>PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME WCPU COMMAND
> 77941 pgsql 5  200  7925M  7296M usem2 352:34 6.98% 
> postgres: username dbnamee 127.0.0.1(43367)  (postgres)



I see no evidence of an actual leak here. Each process is basically using
the 7G of shared_buffers you have allocated in the config (which is only
7G _total_ for all processes, since it's shared memory)

> Out of memory:
> kernel: swap_pager_getswapspace(4): failed
> kernel: swap_pager_getswapspace(8): failed
> kernel: swap_pager_getswapspace(3): failed
> 
> Main issue is IHMO (as far as I understood the FreeBSD Memory system) 
> that 20G are INACTIVE. When I subtract the shared memory, even ~13GB 
> should be available, but they are still allocated but inactive 
> (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out 
> of memory situations it is likely that the memory is dirty (otherwise it 
> would have been reused).

Not quite correct. Inactive memory is _always_ available for re-use.

> Config:
> Memory: 32GB, Swap: 512MB

Probably not the cause of this problem, but this is a non-optimal layout.
If you're going to use swap at all, it should generally be 2x the size of
your RAM for optimal performance under memory pressure.

> maintenance_work_mem = 512MB
> effective_cache_size = 10GB
> work_mem = 892MB

I expect that this value is the cause of the problem. The scenario you
describe below is sorting a large table on an unindexed column, meaning
it will have to use all that work_mem. I'd be interested to see the
output of:

EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 10;

But even without that information, I'd recommend you reduce work_mem
to about 16M or so.

> wal_buffers = 8MB
> checkpoint_segments = 16
> shared_buffers = 7080MB
> max_connections = 80
> autovacuum_max_workers = 3

[snip]

> > We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple 
> > worker processes connected via persistent connections to PostgreSQL, 
> > they perform just simple queries with SELECT on primary keys and 
> > simple INSERTS/UPDATES.

That's not at all the test scenario you show below. The scenario below
is a large sort operation on a non-indexed column, which is vastly
different than a single-row fetch based on an index.

> > Normally nearly all the workers are idle but 
> > they still consume the maximum configured work mem on the PostgreSQL 
> > server and the memory is also resident.

I see no evidence of that in your top output. Each process has a
reference to the 7G of shared_buffers you allocated, which is memory
shared by all processes, and is expected. I'm not as familiar with
Linux top, but the results appear to be the same.

> > If some other queries get in 
> > we get into out of memory situations. So it looks like PostgreSQL has 
> > memory leaks.
> >
> > I found a test scenario to reproduce it also on a newer FreeBSD 10.2 
> > VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):
> >
> > Executions in psql with one persisent connection:
> > -- Create the table
> > CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM 
> > generate_Series(1,1) s;
> > -- Create the index
> > CREATE INDEX ON t_random(s);
> >
> > -- Restart psql with a new connection:
> >
> > -- Memory goes slighty up after each execution even after canceling:
> > -- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on 
> > cancel the query or multiple execution
> > SELECT * FROM t_random ORDER BY md5 LIMIT 10;
> >
> > -- Therefore I created a function:
> > CREATE OR REPLACE FUNCTION execmultiplei(IN num int8)
> > RETURNS void AS $$
> > BEGIN
> >   -- RAISE NOTICE 'num=%', num;
> >   FOR i IN 1..num LOOP
> > PERFORM * FROM t_random WHERE s = i;
> >   END LOOP;
> > END;
> > $$  LANGUAGE plpgsql;
> >
> > -- Test it several times
> > SELECT execmultiplei(1000);
> >
> > -- Linux testing (FreeBSD is similar), relevant part is RES (resident 
> > memory):
> >   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> > -- after startup of psql
> > 26851 postgres  20   0 2363276   7432   6292 S   0.0  0.2 0:00.00 
> > postgres: postgres postgres [local] idle
> > -- Memory goes up, ok so far
> > 26851 postgres  20   0 2365732 255152 253548 R  99.0  6.3 0:10.77 
> > postgres: postgres postgres [local] SELECT
> > 26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81 
> > postgres: postgres postgres [local] SELECT
> > 26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90 
> > postgres: postgres postgres [local] SELECT
> > -- Function execmultiplei and transaction terminated, but memory still 
> > allocated!!!
> > 26851 postgres  20   0 

Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

On 13.12.2015 18:17, Tom Lane wrote:

Gerhard Wiesinger  writes:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Just judging from the name of the function, I would bet this is a direct
result of having only 512M of swap configured.  As Bill already pointed
out, that's a pretty useless choice on a system with 32G of RAM.  As soon
as the kernel tries to push out any significant amount of idle processes,
it's gonna be out of swap space.  The numbers you show above prove that
it is almost out of free swap already.


The system wasn't designed by me, I wouldn't do it either that way. Does 
swapoff help?




Also, while that 20G of "inactive" pages may be candidates for reuse,
they probably can't actually be reused without swapping them out ...
and there's noplace for that data to go.


There is no log in syslog (where postgres log) when 
swap_pager_getswapspace is logged.


But why do we have 20G of Inactive pages? They are still allocated by 
kernel or user space. As you can see below (top output) NON Postgres 
processes are around 9G in virtual size, resident even lower. The system 
is nearly idle, and the queries typically aren't active after one second 
agin. Therefore where does the rest of the 11G of Inactive pages come 
from (if it isn't a Postgres/FreeBSD memory leak)?

I read that Postgres has it's own memory allocator:
https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
Might that be an issue with double allocation/freeing and the "cheese 
hole" topic with memory fragmentation?


https://www.opennet.ru/base/dev/fbsdvm.txt.html
inactivepages not actively used by programs which are
dirty and (at some point) need to be written
to their backing store (typically disk).
These pages are still associated with objects and
can be reclaimed if a program references them.
Pages can be moved from the active to the inactive
queue at any time with little adverse effect.
Moving pages to the cache queue has bigger
consequences (note 1)

https://unix.stackexchange.com/questions/134862/what-do-the-different-memory-counters-in-freebsd-mean
Active: Memory currently being used by a process
Inactive: Memory that has been freed but is still cached since it 
may be used again. If more Free memory is required, this memory can be 
cleared and become free. This memory is not cleared before it is needed, 
because "free memory is wasted memory", it doesn't cost anything to keep 
the old data around in case it is needed again.
Wired: Memory in use by the Kernel. This memory cannot be swapped 
out (GW: including ZFS cache!!!)
Cache: Memory being used to cache data, can be freed immediately if 
required

Buffers: Disk cache
Free: Memory that is completely free and ready to use. Inactive, 
Cache and Buffers can become free if they are cleaned up.


Thnx.

Ciao,
Gerhard


last pid:  7277;  load averages:  0.91,  0.96,  1.02  up 18+06:22:31
18:57:54

135 processes: 2 running, 132 sleeping, 1 waiting

Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
Swap: 512M Total, 501M Used, 12M Free, 97% Inuse

  PID USERNAMETHR PRI NICE   SIZERES STATE   C TIMEWCPU COMMAND
77941 pgsql 5  200  7921M  7295M usem7 404:32  10.25% 
postgres

79570 pgsql 1  200  7367M  6968M sbwait  6 4:24   0.59% postgres
73449 pgsql 1  270  7367M  6908M sbwait  4 8:08   5.08% postgres
74209 pgsql 1  270  7367M  6803M sbwait  0 6:37   1.46% postgres
74207 pgsql 1  250  7367M  6748M sbwait  6 6:34   1.76% postgres
74206 pgsql 1  220  7367M  6548M sbwait  5 6:44   1.86% postgres
73380 pgsql 1  230  7367M  7265M sbwait  4 8:15   1.17% postgres
74208 pgsql 1  240  7367M  7223M sbwait  1 6:30   4.59% postgres
79569 pgsql 1  240  7367M  7105M sbwait  3 4:36   1.17% postgres
74210 pgsql 1  290  7363M  7182M sbwait  5 6:41   5.47% postgres
73479 pgsql 1  220  7363M  6560M sbwait  6 7:14   3.56% postgres
83030 pgsql 1  200  7329M   193M sbwait  5 0:00   0.00% postgres
76178 pgsql 1  200  7323M  7245M sbwait  2 3:44   0.00% postgres
75867 pgsql 1  200  7323M  7245M sbwait  2 3:45   0.00% postgres
75869 pgsql 1  200  7323M  7245M sbwait  2 3:46   0.00% postgres
75883 pgsql 1  200  7323M  7245M sbwait  7 3:46   0.00% postgres
76180 pgsql   

Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

On 13.12.2015 21:14, Bill Moran wrote:

Wait ... this is a combined HTTP/Postgres server? You didn't mention that
earlier, and it's kind of important.

What evidence do you have that Postgres is actually the part of
this system running out of memory?


For me the complete picture doesn't look consistent.


I don't see any such evidence in any of
your emails, and (based on experience) I find it pretty likely that whatever
is running under node is doing something in a horrifically memory-inefficient
manner. Since you mention that you see nothing in the PG logs, that makes it
even more likely (to me) that you're looking entirely in the wrong place.

I'd be willing to bet a steak dinner that if you put the web server on a
different server than the DB, that the memory problems would follow the
web server and not the DB server.


Changes in config:
track_activity_query_size = 102400
work_mem = 100MB

Ok, we restarted PostgreSQL and had it stopped for seconds, and logged 
top every second:


When PostgreSQL was down nearly all memory was freed, looks good to me. 
So it is likely that node and other processes are not the cause.

Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
Swap: 512M Total, 477M Used, 35M Free, 93% Inuse

When PostgreSQL restarted, Inactive was growing fast (~1min):
Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M 
Free

Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

After some few minutes we are back again at the same situation:
Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

The steak dinner is mine :-) Donating to the PostgreSQL community :-)

Any further ideas, I don't think this is normal system behaviour.

Ciao,
Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
On Sun, 13 Dec 2015 20:09:04 +0100
Gerhard Wiesinger  wrote:

> On 13.12.2015 18:17, Tom Lane wrote:
> > Gerhard Wiesinger  writes:
> >>> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> >>> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
> >> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?
> > Just judging from the name of the function, I would bet this is a direct
> > result of having only 512M of swap configured.  As Bill already pointed
> > out, that's a pretty useless choice on a system with 32G of RAM.  As soon
> > as the kernel tries to push out any significant amount of idle processes,
> > it's gonna be out of swap space.  The numbers you show above prove that
> > it is almost out of free swap already.
> 
> The system wasn't designed by me, I wouldn't do it either that way. Does 
> swapoff help?

FreeBSD and Linux (and most modern OS) are designed to have swap,
and usually more swap than RAM. I have never heard a good reason for
not using swap, and the reasons I _have_ heard have always been by
people misinformed about how the OS works.

If someone has a _good_ explanation for why you wouldn't want any
swap on a DB server, I'd love to hear it; but everything I've heard
up till now has been speculation based on misinformation.

IOW: no, you should not turn swap off, you should instead allocate
the appropriate amount of swap space.

> > Also, while that 20G of "inactive" pages may be candidates for reuse,
> > they probably can't actually be reused without swapping them out ...
> > and there's noplace for that data to go.
> 
> There is no log in syslog (where postgres log) when 
> swap_pager_getswapspace is logged.
> 
> But why do we have 20G of Inactive pages? They are still allocated by 
> kernel or user space. As you can see below (top output) NON Postgres 
> processes are around 9G in virtual size, resident even lower. The system 
> is nearly idle, and the queries typically aren't active after one second 
> agin. Therefore where does the rest of the 11G of Inactive pages come 
> from (if it isn't a Postgres/FreeBSD memory leak)?
> I read that Postgres has it's own memory allocator:
> https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
> Might that be an issue with double allocation/freeing and the "cheese 
> hole" topic with memory fragmentation?

If there were a memory leak in either FreeBSD or Postgres of the
seriousness you're describing that were as easy to trigger as you
claim, I would expect the mailing lists and other support forums
to be exploding in panic. Notice that they are not. Also, I still
don't see _ANY_ evidence of a leak. I see evidence that something
is happening that is trying to allocate a LOT of RAM, that isn't
available on your system; but that's not the same as a leak.

> https://www.opennet.ru/base/dev/fbsdvm.txt.html
>  inactivepages not actively used by programs which are
>  dirty and (at some point) need to be written
>  to their backing store (typically disk).
>  These pages are still associated with objects and
>  can be reclaimed if a program references them.
>  Pages can be moved from the active to the inactive
>  queue at any time with little adverse effect.
>  Moving pages to the cache queue has bigger
>  consequences (note 1)

Correct, but, when under pressure, the system _will_ recycle those
pages to be available.

Tom might be correct in that the system thinks they are inactive
because it could easily push them out to swap, but then it can't
_actually_ do that because you haven't allocated enough swap, but
that doesn't match my understanding of how inactive is used. A
question of that detail would be better asked on a FreeBSD forum,
as the differences between different VM implementations can be
pretty specific and technical.

[snip]

> Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
> Swap: 512M Total, 501M Used, 12M Free, 97% Inuse
> 
>PID USERNAMETHR PRI NICE   SIZERES STATE   C TIMEWCPU COMMAND
> 77941 pgsql 5  200  7921M  7295M usem7 404:32  10.25% 
> postgres
> 79570 pgsql 1  200  7367M  6968M sbwait  6 4:24   0.59% postgres

[snip about 30 identical PG processes]

> 32387 myusername9  200   980M   375M uwait   5 69:03   1.27% node

[snip similar processes]

>622 myusername1  200   261M  3388K kqread  3 41:01   0.00% nginx

[snip similar processes]

Wait ... this is a combined HTTP/Postgres server? You didn't mention that
earlier, and it's kind of important.

What evidence do you have that Postgres is actually the part of
this system running out of memory? I don't see any such evidence in any of
your emails, and 

Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
On Sun, 13 Dec 2015 22:23:19 +0100
Gerhard Wiesinger  wrote:

> On 13.12.2015 21:14, Bill Moran wrote:
> > Wait ... this is a combined HTTP/Postgres server? You didn't mention that
> > earlier, and it's kind of important.
> >
> > What evidence do you have that Postgres is actually the part of
> > this system running out of memory?
> 
> For me the complete picture doesn't look consistent.

That's because you haven't gathered enough of the right type of information.

> > I don't see any such evidence in any of
> > your emails, and (based on experience) I find it pretty likely that whatever
> > is running under node is doing something in a horrifically 
> > memory-inefficient
> > manner. Since you mention that you see nothing in the PG logs, that makes it
> > even more likely (to me) that you're looking entirely in the wrong place.
> >
> > I'd be willing to bet a steak dinner that if you put the web server on a
> > different server than the DB, that the memory problems would follow the
> > web server and not the DB server.
> 
> Changes in config:
> track_activity_query_size = 102400
> work_mem = 100MB
> 
> Ok, we restarted PostgreSQL and had it stopped for seconds, and logged 
> top every second:
> 
> When PostgreSQL was down nearly all memory was freed, looks good to me. 
> So it is likely that node and other processes are not the cause.
> Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
> Swap: 512M Total, 477M Used, 35M Free, 93% Inuse
> 
> When PostgreSQL restarted, Inactive was growing fast (~1min):
> Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M 
> Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
> 
> After some few minutes we are back again at the same situation:
> Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
> 
> The steak dinner is mine :-) Donating to the PostgreSQL community :-)

Based on the fact that the inactive memory increased? Your understanding
of inactive memory in FreeBSD is incorrect. Those pages are probably DB
pages that the OS is keeping in inactive memory because Postgres requests
them over and over, which is what the OS is supposed to do to ensure the
best performance. Are you seeing any out of swap space errors? Even if
you are, you still haven't determined if the problem is the result of
Postgres or the node.js stuff you have running. I don't know what node.js
might be caching on the client side ... do you?

No. Until you can actually report back something other than wild
speculation, I'll keep that steak dinner for myself. Besides, that bet
was based on you putting the PG server on seperate hardware from the
web server, which you didn't do.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Memory Leak executing small queries without closing the connection

2015-12-12 Thread Gerhard Wiesinger

Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker 
processes connected via persistent connections to PostgreSQL, they 
perform just simple queries with SELECT on primary keys and simple 
INSERTS/UPDATES. Normally nearly all the workers are idle but they still 
consume the maximum configured work mem on the PostgreSQL server and the 
memory is also resident. If some other queries get in we get into out of 
memory situations. So it looks like PostgreSQL has memory leaks.


I found a test scenario to reproduce it also on a newer FreeBSD 10.2 VM 
as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):


Executions in psql with one persisent connection:
-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM 
generate_Series(1,1) s;

-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:

-- Memory goes slighty up after each execution even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on cancel 
the query or multiple execution

SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8)
RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$  LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident 
memory):

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
-- after startup of psql
26851 postgres  20   0 2363276   7432   6292 S   0.0  0.2 0:00.00 
postgres: postgres postgres [local] idle

-- Memory goes up, ok so far
26851 postgres  20   0 2365732 255152 253548 R  99.0  6.3 0:10.77 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90 
postgres: postgres postgres [local] SELECT
-- Function execmultiplei and transaction terminated, but memory still 
allocated!!!
26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 0:41.40 
postgres: postgres postgres [local] idle

-- Calling it again
26851 postgres  20   0 2365732 920668 918748 R  99.0 22.7 0:46.51 
postgres: postgres postgres [local] SELECT

-- idle again, memory still allocated
26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 1:22.54 
postgres: postgres postgres [local] idle


Memory will only be released if psql is exited. According to the 
PostgreSQL design memory should be freed when the transaction completed.


top commands on FreeBSD: top -SaPz -o res -s 1
top commands on Linux: top -o RES d1

Config: VMs with 4GB of RAM, 2 vCPUs
shared_buffers = 2048MB # min 128kB
effective_cache_size = 2GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16

Any ideas?

Thank you.

Ciao,
Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general