Re: [GENERAL] vacuumdb --analyze-only scans all pages?
On 29.12.2016 16:10, Tom Lane wrote: Adrian Klaver writes: On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843 estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215 estimated total rows

Any ideas why?

I would say because the '3000000 rows in sample' were spread out over all 2133350 pages.

Worth pointing out here is that you must have a custom statistics target set on log_details to make it want a sample so much larger than the default. If you feel ANALYZE is taking too long, you should reconsider whether you need such a large target.

Thanks, Tom and Adrian. Yes, there is a custom statistics target of 10000 set; I guess it was set some time ago to overcome a performance problem after the upgrade from 8.3 to 8.4. Thanks, Tom, for pointing that out. A good query to find such columns:

SELECT
  n.nspname AS schemaname,
  CASE
    WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
    WHEN cl.relkind = 'i' THEN CAST('INDEX' AS VARCHAR(20))
    WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
    WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
    WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
    WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS VARCHAR(18))
    ELSE null
  END AS object_type,
  cl.relname,
  attname,
  attstattarget
FROM pg_attribute a
LEFT OUTER JOIN pg_class cl ON a.attrelid = cl.oid
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
WHERE (cl.relkind = 'r' OR cl.relkind = 'i' OR cl.relkind = 't')
AND attnum > 0 -- only regular columns
AND n.nspname = 'public' -- public schema only
AND NOT(relname ILIKE 'pgstatspack_%')
AND cl.relkind = 'r' -- TABLE
AND attstattarget <> -1 -- non-default values only
ORDER BY n.nspname, cl.relname, attnum;

BTW: it looks like the statistics target is multiplied by 300 to get the number of sample rows. Is that true (I didn't find any documentation about that)?
https://www.postgresql.org/docs/current/static/planner-stats.html

-- ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- Default is 100, means 300*100=30000 rows (30k)
-- Max is 10000, means 300*10000=3000000 rows (3 Mio)
ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS -1;
ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS -1;

https://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET

Thanks.

Ciao,
Gerhard
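As a quick sanity check (a sketch; it assumes the two fk_* columns were the only ones with a custom target), re-running ANALYZE after the reset should report a sample of roughly 30000 rows (300 times the default_statistics_target of 100) instead of a scan of every page:

ANALYZE VERBOSE public.log_details;
-- expected output of the form:
-- INFO: "log_details": scanned 30000 of 2133350 pages, ... 30000 rows in sample, ...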
[GENERAL] vacuumdb --analyze-only scans all pages?
Hello,

PostgreSQL 9.6.1: after a pg_dump/restore procedure, it scans all pages for at least some of the tables (the --analyze-only switch is specified). I would expect that only the sample rows are scanned.

"log_details": scanned 2133350 of 2133350 pages

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843 estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215 estimated total rows

Any ideas why?

Thanks.

Ciao,
Gerhard
[GENERAL] CTE and function
Hello,

I'm trying to convert a select after a CTE into a function for generic use. The CTE is normally a complex query, but I want to encapsulate the calculation of the Gini coefficient into a function.

Based on: http://www.heckler.com.br/blog/2010/06/15/gini-coeficient-having-fun-in-both-sql-and-python/
Details at: https://en.wikipedia.org/wiki/Gini_coefficient

= OK:
WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT ((N+1.0)/(N-1.0)) - ((PiXi_sum/(N*(N-1)*u))*2.0) AS gini_coefficient
FROM (
  SELECT SUM(PiXi) AS PiXi_sum, COUNT(*) AS N, (SELECT AVG(col) FROM tab) AS u
  FROM (
    SELECT row_number() OVER() * col AS PiXi
    FROM (SELECT col FROM tab ORDER BY col DESC) t1
  ) t2
) t3;

= OK: Create function
CREATE OR REPLACE FUNCTION gini_coefficient(IN table_name text, IN column_name text, OUT gini_coefficient DOUBLE PRECISION) AS $$
BEGIN
  EXECUTE format('
    SELECT ((N+1.0)/(N-1.0)) - ((PiXi_sum/(N*(N-1)*u))*2.0) AS gini_coefficient
    FROM (
      SELECT SUM(PiXi) AS PiXi_sum, COUNT(*) AS N, (SELECT AVG(%s) FROM %s) AS u
      FROM (
        SELECT row_number() OVER() * %s AS PiXi -- also parameterized, so column names other than col work
        FROM (SELECT %s FROM %s ORDER BY %s DESC) t1
      ) t2
    ) t3;
  ', column_name, table_name, column_name, column_name, table_name, column_name) INTO gini_coefficient;
END
$$ LANGUAGE plpgsql;

= NOT OK:
WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT gini_coefficient('tab', 'col');
ERROR: relation "tab" does not exist
LINE 13: (SELECT col FROM tab ORDER BY col DESC) t1

= NOT OK:
WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT * FROM gini_coefficient('tab', 'col');
ERROR: relation "tab" does not exist
LINE 13: (SELECT col FROM tab ORDER BY col DESC) t1

So it looks like the table tab from the CTE is not available inside the function. Any ideas how to solve this? An explanation would also be appreciated.

Thank you.

Ciao,
Gerhard
--
https://www.wiesinger.com/
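For what it's worth, this behaviour is expected: a CTE exists only for the duration of the single statement that defines it, while the EXECUTE inside the function runs as a separate statement, so the dynamic SQL cannot see tab. One possible workaround (a sketch, not from the thread) is a temporary table, which is visible for the whole session, including inside the function's dynamic SQL:

CREATE TEMP TABLE tab AS SELECT unnest(ARRAY[1,2,3,4]) AS col;
SELECT gini_coefficient('tab', 'col');
DROP TABLE tab;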
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
On 13.12.2015 21:14, Bill Moran wrote:

Wait ... this is a combined HTTP/Postgres server? You didn't mention that earlier, and it's kind of important. What evidence do you have that Postgres is actually the part of this system running out of memory?

For me the complete picture doesn't look consistent.

I don't see any such evidence in any of your emails, and (based on experience) I find it pretty likely that whatever is running under node is doing something in a horrifically memory-inefficient manner. Since you mention that you see nothing in the PG logs, that makes it even more likely (to me) that you're looking entirely in the wrong place. I'd be willing to bet a steak dinner that if you put the web server on a different server than the DB, the memory problems would follow the web server and not the DB server.

Changes in config:
track_activity_query_size = 102400
work_mem = 100MB

OK, we restarted PostgreSQL and had it stopped for some seconds, and logged top output every second. When PostgreSQL was down, nearly all memory was freed; that looks good to me. So it is likely that node and the other processes are not the cause.

Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
Swap: 512M Total, 477M Used, 35M Free, 93% Inuse

When PostgreSQL restarted, Inactive was growing fast (~1min):
Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

After a few minutes we are back at the same situation:
Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

The steak dinner is mine :-) Donating it to the PostgreSQL community :-)

Any further ideas? I don't think this is normal system behaviour.

Ciao,
Gerhard
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
On 13.12.2015 18:17, Tom Lane wrote:

Gerhard Wiesinger writes:
Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Just judging from the name of the function, I would bet this is a direct result of having only 512M of swap configured. As Bill already pointed out, that's a pretty useless choice on a system with 32G of RAM. As soon as the kernel tries to push out any significant amount of idle processes, it's gonna be out of swap space. The numbers you show above prove that it is almost out of free swap already.

The system wasn't designed by me; I wouldn't have done it that way either. Does swapoff help?

Also, while that 20G of "inactive" pages may be candidates for reuse, they probably can't actually be reused without swapping them out ... and there's noplace for that data to go.

There is no entry in syslog (where postgres logs) when swap_pager_getswapspace is logged. But why do we have 20G of inactive pages? They are still allocated by kernel or user space. As you can see below (top output), the non-Postgres processes are around 9G in virtual size, resident even lower. The system is nearly idle, and the queries typically aren't active after one second again. Therefore, where does the rest of the 11G of inactive pages come from (if it isn't a Postgres/FreeBSD memory leak)?

I read that Postgres has its own memory allocator:
https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
Might that be an issue with double allocation/freeing and the "cheese hole" topic with memory fragmentation?

https://www.opennet.ru/base/dev/fbsdvm.txt.html
inactive pages: not actively used by programs, which are dirty and (at some point) need to be written to their backing store (typically disk). These pages are still associated with objects and can be reclaimed if a program references them. Pages can be moved from the active to the inactive queue at any time with little adverse effect. Moving pages to the cache queue has bigger consequences (note 1).

https://unix.stackexchange.com/questions/134862/what-do-the-different-memory-counters-in-freebsd-mean
Active: Memory currently being used by a process
Inactive: Memory that has been freed but is still cached since it may be used again. If more Free memory is required, this memory can be cleared and become free. This memory is not cleared before it is needed, because "free memory is wasted memory"; it doesn't cost anything to keep the old data around in case it is needed again.
Wired: Memory in use by the Kernel. This memory cannot be swapped out (GW: including ZFS cache!!!)
Cache: Memory being used to cache data, can be freed immediately if required
Buffers: Disk cache
Free: Memory that is completely free and ready to use. Inactive, Cache and Buffers can become free if they are cleaned up.

Thanks.

Ciao,
Gerhard

last pid: 7277; load averages: 0.91, 0.96, 1.02  up 18+06:22:31 18:57:54
135 processes: 2 running, 132 sleeping, 1 waiting
Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
Swap: 512M Total, 501M Used, 12M Free, 97% Inuse

  PID USERNAME  THR PRI NICE  SIZE   RES  STATE  C   TIME   WCPU COMMAND
77941 pgsql       5  20    0 7921M 7295M usem   7 404:32 10.25% postgres
79570 pgsql       1  20    0 7367M 6968M sbwait 6   4:24  0.59% postgres
73449 pgsql       1  27    0 7367M 6908M sbwait 4   8:08  5.08% postgres
74209 pgsql       1  27    0 7367M 6803M sbwait 0   6:37  1.46% postgres
74207 pgsql       1  25    0 7367M 6748M sbwait 6   6:34  1.76% postgres
74206 pgsql       1  22    0 7367M 6548M sbwait 5   6:44  1.86% postgres
73380 pgsql       1  23    0 7367M 7265M sbwait 4   8:15  1.17% postgres
74208 pgsql       1  24    0 7367M 7223M sbwait 1   6:30  4.59% postgres
79569 pgsql       1  24    0 7367M 7105M sbwait 3   4:36  1.17% postgres
74210 pgsql       1  29    0 7363M 7182M sbwait 5   6:41  5.47% postgres
73479 pgsql       1  22    0 7363M 6560M sbwait 6   7:14  3.56% postgres
83030 pgsql       1  20    0 7329M  193M sbwait 5   0:00  0.00% postgres
76178 pgsql       1  20    0 7323M 7245M sbwait 2   3:44  0.00% postgres
75867 pgsql       1  20    0 7323M 7245M sbwait 2   3:45  0.00% postgres
75869 pgsql       1  20    0 7323M 7245M sbwait 2   3:46  0.00% postgres
75883 pgsql       1  20    0 73
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
Hello Bill,

Thank you for your response; comments inline.

On 13.12.2015 16:05, Bill Moran wrote: On Sun, 13 Dec 2015 09:57:21 +0100 Gerhard Wiesinger wrote:

some further details from the original FreeBSD 10.1 machine:
Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
  PID USERNAME  THR PRI NICE  SIZE   RES  STATE  C   TIME   WCPU COMMAND
77941 pgsql       5  20    0 7925M 7296M usem   2 352:34  6.98% postgres: username dbnamee 127.0.0.1(43367) (postgres)

I see no evidence of an actual leak here. Each process is basically using the 7G of shared_buffers you have allocated in the config (which is only 7G _total_ for all processes, since it's shared memory).

OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

The main issue is IMHO (as far as I understood the FreeBSD memory system) that 20G are INACTIVE. When I subtract the shared memory, even ~13GB should be available, but they are still allocated, just inactive (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out-of-memory situations, it is likely that the memory is dirty (otherwise it would have been reused).

Not quite correct. Inactive memory is _always_ available for re-use.

Are you sure that's true? Monitoring inactive memory:

cat vm_stat.sh
#!/usr/bin/env bash
while [ 1 ]; do
  date +%Y.%m.%d.%H.%M.%S
  sysctl -a | grep vm.stats.vm.
  sleep 1
done

Even when we get out of memory with swap_pager_getswapspace, inactive memory (from the log file) is around 20GB (it doesn't go down or up):
vm.stats.vm.v_inactive_count: 5193530 (times 4k pages is around 20GB)

So we have 20GB of inactive memory, but we still get out of memory with kernel: swap_pager_getswapspace(4): failed. Any ideas why?

Config: Memory: 32GB, Swap: 512MB

Probably not the cause of this problem, but this is a non-optimal layout. If you're going to use swap at all, it should generally be 2x the size of your RAM for optimal performance under memory pressure.

Yes, we will try to disable it totally. Nevertheless, why do we run out of memory/swap?

maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB

I expect that this value is the cause of the problem. The scenario you describe below is sorting a large table on an unindexed column, meaning it will have to use all that work_mem. I'd be interested to see the output of:
EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 10;

That was only a test query; it has nothing to do with the production queries. Those are mostly SELECTs/INSERTs/UPDATEs on primary keys.

But even without that information, I'd recommend you reduce work_mem to about 16M or so.

Why so low? Sorting for reporting or other long-running queries would then be done on disk and not in memory.

wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

[snip]

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker processes connected via persistent connections to PostgreSQL; they perform just simple queries with SELECT on primary keys and simple INSERTs/UPDATEs.

That's not at all the test scenario you show below. The scenario below is a large sort operation on a non-indexed column, which is vastly different than a single-row fetch based on an index.

Yes, that non-indexed select was just for testing purposes.

Normally nearly all the workers are idle but they still consume the maximum configured work mem on the PostgreSQL server and the memory is also resident.

I see no evidence of that in your top output. Each process has a reference to the 7G of shared_buffers you allocated, which is memory shared by all processes, and is expected. I'm not as familiar with Linux top, but the results appear to be the same.

Yes, that might be the case, but if it is n times 7G of shared memory, then we have ~20GB of inactive memory available, so plenty of memory. And why are we getting: kernel: swap_pager_getswapspace(4): failed?

Thanks.

Ciao,
Gerhard
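A hedged middle ground for the work_mem trade-off discussed above (a sketch, not from the thread): keep the server-wide value small, as Bill suggests, and raise it per session only for the reporting queries that actually need large in-memory sorts:

-- postgresql.conf: work_mem = 16MB (server-wide)
SET work_mem = '512MB'; -- this session only, for the big reporting query
SELECT * FROM t_random ORDER BY md5 LIMIT 10;
RESET work_mem;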
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
Hello,

some further details from the original FreeBSD 10.1 machine:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

  PID USERNAME  THR PRI NICE  SIZE   RES  STATE  C   TIME   WCPU COMMAND
77941 pgsql       5  20    0 7925M 7296M usem   2 352:34  6.98% postgres: username dbnamee 127.0.0.1(43367) (postgres)
75862 pgsql       1  20    0 7321M 7266M select 4   7:04  0.00% postgres: checkpointer process (postgres)
75863 pgsql       1  20    0 7321M 7260M select 4   3:34  0.00% postgres: writer process (postgres)
75867 pgsql       1  20    0 7323M 7237M sbwait 0   3:19  0.00% postgres: username dbnamee ipaddress(58977) (postgres)
76178 pgsql       1  20    0 7323M 7237M sbwait 2   3:18  0.00% postgres: username dbnamee ipaddress(35807) (postgres)
75868 pgsql       1  20    0 7323M 7237M sbwait 0   3:20  0.00% postgres: username dbnamee ipaddress(58978) (postgres)
75884 pgsql       1  20    0 7323M 7236M sbwait 3   3:20  0.00% postgres: username dbnamee ipaddress(59908) (postgres)
75869 pgsql       1  20    0 7323M 7236M sbwait 3   3:20  0.00% postgres: username dbnamee ipaddress(58982) (postgres)
76181 pgsql       1  20    0 7323M 7236M sbwait 2   3:18  0.00% postgres: username dbnamee ipaddress(35813) (postgres)
75883 pgsql       1  20    0 7323M 7236M sbwait 0   3:20  0.00% postgres: username dbnamee ipaddress(59907) (postgres)
76180 pgsql       1  20    0 7323M 7236M sbwait 1   3:19  0.00% postgres: username dbnamee ipaddress(35811) (postgres)
76177 pgsql       1  20    0 7323M 7236M sbwait 1   3:18  0.00% postgres: username dbnamee ipaddress(35712) (postgres)
76179 pgsql       1  20    0 7323M 7236M sbwait 5   3:18  0.00% postgres: username dbnamee ipaddress(35810) (postgres)
64951 pgsql       1  75    0 7375M  662M CPU1   1   0:01 11.47% postgres: username dbnamee 127.0.0.1(32073) (postgres)
64950 pgsql       1  77    0 7325M  598M CPU6   6   0:02 16.55% postgres: username dbnamee 127.0.0.1(44251) (postgres)

Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

The main issue is IMHO (as far as I understood the FreeBSD memory system) that 20G are INACTIVE. When I subtract the shared memory, even ~13GB should be available, but they are still allocated, just inactive (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out-of-memory situations, it is likely that the memory is dirty (otherwise it would have been reused).

Config: Memory: 32GB, Swap: 512MB
maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

Thanks.

Ciao,
Gerhard

On 13.12.2015 08:49, Gerhard Wiesinger wrote:

Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker processes connected via persistent connections to PostgreSQL; they perform just simple queries with SELECT on primary keys and simple INSERTs/UPDATEs. Normally nearly all the workers are idle, but they still consume the maximum configured work mem on the PostgreSQL server, and the memory is also resident. If some other queries come in, we get into out-of-memory situations. So it looks like PostgreSQL has memory leaks.

I found a test scenario to reproduce it, also on a newer FreeBSD 10.2 VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):

Executions in psql with one persistent connection:

-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM generate_Series(1,1) s;
-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:
-- Memory goes slightly up after each execution, even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on canceling the query or on multiple executions
SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8) RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
    PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident memory):
  PID USER     PR NI    VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
-- after startup of psql
26851 postgres 20  0 2363276   7432   6292 S   0.0  0.2 0:00.00 postgres: postgres postgres [local] idle
-- Memory goes up, ok so far
26851 postgres 20  0 2365732 255152 253548 R  99.0  6.3 0:10.77 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 408464 406788 R 100.0 10.1 0:17.81 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 864472 862576 R 100.0 21.4 0:38.90 postgres
[GENERAL] Memory Leak executing small queries without closing the connection
Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker processes connected via persistent connections to PostgreSQL; they perform just simple queries with SELECT on primary keys and simple INSERTs/UPDATEs. Normally nearly all the workers are idle, but they still consume the maximum configured work mem on the PostgreSQL server, and the memory is also resident. If some other queries come in, we get into out-of-memory situations. So it looks like PostgreSQL has memory leaks.

I found a test scenario to reproduce it, also on a newer FreeBSD 10.2 VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):

Executions in psql with one persistent connection:

-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM generate_Series(1,1) s;
-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:
-- Memory goes slightly up after each execution, even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on canceling the query or on multiple executions
SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8) RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
    PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident memory):
  PID USER     PR NI    VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
-- after startup of psql
26851 postgres 20  0 2363276   7432   6292 S   0.0  0.2 0:00.00 postgres: postgres postgres [local] idle
-- Memory goes up, ok so far
26851 postgres 20  0 2365732 255152 253548 R  99.0  6.3 0:10.77 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 408464 406788 R 100.0 10.1 0:17.81 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 864472 862576 R 100.0 21.4 0:38.90 postgres: postgres postgres [local] SELECT
-- Function execmultiplei and transaction terminated, but memory still allocated!!!
26851 postgres 20  0 2365732 920668 918748 S   0.0 22.7 0:41.40 postgres: postgres postgres [local] idle
-- Calling it again
26851 postgres 20  0 2365732 920668 918748 R  99.0 22.7 0:46.51 postgres: postgres postgres [local] SELECT
-- idle again, memory still allocated
26851 postgres 20  0 2365732 920668 918748 S   0.0 22.7 1:22.54 postgres: postgres postgres [local] idle

Memory is only released when psql exits. According to the PostgreSQL design, memory should be freed when the transaction has completed.

top command on FreeBSD: top -SaPz -o res -s 1
top command on Linux: top -o RES d1

Config: VMs with 4GB of RAM, 2 vCPUs
shared_buffers = 2048MB # min 128kB
effective_cache_size = 2GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16

Any ideas? Thank you.

Ciao,
Gerhard
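One way to test the shared-buffers explanation given in the replies above (a sketch; it assumes the pg_buffercache extension is available): the RES growth of a backend largely tracks the shared pages it has touched rather than private leaked memory, and pg_buffercache shows how much of shared_buffers is actually populated:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT count(*) AS populated_buffers,
       pg_size_pretty(count(*) * 8192) AS populated_size
FROM pg_buffercache
WHERE relfilenode IS NOT NULL;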
Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
On Fri, 23 Mar 2012, Tom Lane wrote:

Merlin Moncure writes: On Fri, Mar 23, 2012 at 11:35 AM, Gerhard Wiesinger wrote: So I think XID overflow should be planned for one of the next PostgreSQL releases. Two mitigating factors: 1. read-only transactions do not increment the xid counter

Yes. Ask your admin what his throughput of *data modifying* transactions is. I'll bet that number is a few orders of magnitude smaller again.

Those were all small writing transactions benchmarking the database (with the corrected values discussed).

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
On Fri, 23 Mar 2012, John R Pierce wrote:

On 03/23/12 9:17 AM, Gerhard Wiesinger wrote: So in that time autovacuum is triggered.

autovacuum runs pretty much continuously in the background, it's not an on/off thing.

Yes, I know. I meant that it runs at least once in 1.5 days.

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
On 23.03.2012 11:16, Jan Kesten wrote:

On 23.03.2012 06:45, Gerhard Wiesinger wrote: With a database admin of a commercial database system I've discussed that they have to provide, and they also achieve, 2^31 transactions per SECOND! As PostgreSQL uses transaction IDs (XIDs) in the range of 2^31, they would wrap around in about one second.

Wow. What application issues that many transactions? And what is the database system that can handle that? I can't think of a single machine capable of this, and I hardly believe PostgreSQL can come close. 2^31 transactions mean that a single one lasts 0.5ns. Even the fastest DDR3-2133 has cycle times of 4ns. I have seen a database monster in action - 43 trillion (academic) transactions per day, but that's only 5*10^8 transactions per second, under a quarter of 2^31 per second. So, I can't answer your question - but you triggered my curiosity :-)

I'm just answering in one of the posts ... OK, I talked again to the admin, and he was wrong by 3 zeros, and it is per minute :-) So the corrected data are: 1 Mio transactions per minute.

1 Mio/60s ≈ 16667 transactions/s
2^32*60/1E6 = 257698s to get to the XID overflow, which is 2.98 days (~1.5 days when 2^31 with signed int is the border)

So within that time autovacuum is triggered. Nevertheless, we are getting into the area where XID overflow is an issue in the near future. In your example with 5E8 transactions per second, overflow would happen in 4s (2^31) or 8s (2^32) ...

So I think XID overflow should be planned for one of the next PostgreSQL releases.

Ciao,
Gerhard
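For watching how close a cluster actually gets to wraparound (a hedged example, not from the thread), autovacuum must freeze old tuples before age(datfrozenxid) reaches about 2^31:

SELECT datname,
       age(datfrozenxid) AS xid_age,
       ROUND(100.0 * age(datfrozenxid) / 2147483648, 2) AS pct_towards_wraparound -- 2147483648 = 2^31
FROM pg_database
ORDER BY xid_age DESC;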
[GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
Hello,

With a database admin of a commercial database system I've discussed that they have to provide, and they also achieve, 2^31 transactions per SECOND! As PostgreSQL uses transaction IDs (XIDs) in the range of 2^31, they would wrap around in about one second. How can one achieve this with PostgreSQL? What is the status and plan for 64-bit XIDs? I saw that 64-bit XIDs were already discussed in 2005 but never found their way into the code:
http://postgresql.1045698.n5.nabble.com/A-couple-of-patches-for-PostgreSQL-64bit-support-td2214264.html

Thanks.

Ciao,
Gerhard
[GENERAL] Double iteration in the buffer cache code
Hello,

Did someone fix the double iteration in the buffer cache code in the meantime, as discussed at:
http://www.mail-archive.com/pgsql-general@postgresql.org/msg137230.html

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] OT: Re: [sfpug] 10/13 SFPUG meeting, "The Mighty GUCS," video now available
On Tue, 20 Oct 2009, Christophe Pettus wrote:

On Oct 20, 2009, at 8:32 AM, Viktor Rosenfeld wrote: @Christophe, I enjoyed your talk very much, particularly because I learned about pgfouine, which from the looks of it, will make my current project vastly simpler. So, thank you.

You should really thank Josh Berkus; he's the one who gave the talk. I just run the camera. :) Josh has the slides and sample files, and I'm sure he'll post them shortly. I know he's been busy with a client emergency and the PostgreSQL conference this weekend.

They are already available from:
http://www.pgexperts.com/presentations.html
http://www.pgexperts.com/document.html?id=34
http://www.pgexperts.com/document.html?id=36

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] Function returning 2 columns evaluated twice when both columns are needed
On Mon, 19 Oct 2009, Tom Lane wrote:

Gerhard Wiesinger writes: On Sun, 18 Oct 2009, Tom Lane wrote: The OFFSET bit is a kluge, but is needed to keep the planner from flattening the subquery and undoing your work.

Thanks, Tom. It also works without the OFFSET kludge. Any ideas why?

Probably because you have the function declared VOLATILE.

None of the functions is declared VOLATILE. Any other ideas?

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
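One detail that may explain this (my assumption, not stated in the thread): a function carrying no volatility label at all defaults to VOLATILE, so "not declared VOLATILE" and "declared STABLE" are not the same thing. A minimal sketch with a hypothetical function:

CREATE OR REPLACE FUNCTION getsums_demo(x int, OUT sum_m1 int, OUT sum_m2 int)
AS $$ SELECT x + 1, x + 2 $$
LANGUAGE sql STABLE; -- without STABLE/IMMUTABLE this would default to VOLATILE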
Re: [GENERAL] Function returning 2 columns evaluated twice when both columns are needed
On Sun, 18 Oct 2009, Tom Lane wrote:

Gerhard Wiesinger writes: Since getSums() is a cursor and is complex and takes a long time, getSums should only be evaluated once. Is there a better solution available to get both columns from the function in the select?

You need a sub-select, along the lines of

SELECT cur_date, (gs).sum_m1, (gs).sum_m2
FROM (
  SELECT cur_date, getSums(start_ts, stop_ts) AS gs
  FROM getDatesTimestamps($1, $2)
  OFFSET 0
) AS ss;

The OFFSET bit is a kluge, but is needed to keep the planner from flattening the subquery and undoing your work.

Thanks, Tom. It also works without the OFFSET kludge. Any ideas why? It also looks reasonable to me because there is no SETOF returned.

BTW: Why is the function in the original statement evaluated twice? With "SELECT table.*" I guess the query is also executed once and not n times (once for each column). PG is version 8.3.8.

Thanks.

Ciao,
Gerhard
[GENERAL] Function returning 2 columns evaluated twice when both columns are needed
Hello,

I'm having a problem with the following:

CREATE TYPE Sums AS (sum_m1 double precision, sum_m2 double precision);
CREATE TYPE date_m1_m2 AS (cur_date date, sum_m1 double precision, sum_m2 double precision);

CREATE OR REPLACE FUNCTION getSums(IN start_ts timestamp with time zone, IN stop_ts timestamp with time zone) RETURNS Sums AS $$
...

CREATE OR REPLACE FUNCTION getsumInterval(date, date) RETURNS SETOF date_m1_m2 AS $$
SELECT
  cur_date,
  (getSums(start_ts, stop_ts)).* -- Not optimal since the function is evaluated 2 times => 24s
  -- getSums(start_ts, stop_ts) -- in one column and not usable as I need 2 columns, but takes only 12s
FROM getDatesTimestamps($1, $2);
$$ LANGUAGE SQL;

Since getSums() is a cursor and is complex and takes a long time, getSums should only be evaluated once. Is there a better solution available to get both columns from the function in the select?

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] 10/13 SFPUG meeting, "The Mighty GUCS," video now available
On Fri, 16 Oct 2009, Christophe Pettus wrote:

The video archive for the 10/13 SFPUG meeting, "The Mighty GUCS: A guide to the essential PostgreSQL settings you need to know," is now available:
http://thebuild.com/blog/2009/10/16/the-mighty-gucs/
It's also available on Vimeo: http://vimeo.com/7109722

Hello,

Can you also upload the sample config files and the presentation?

Thanks.

Ciao,
Gerhard
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 9 Oct 2009, Greg Smith wrote:

On Sat, 3 Oct 2009, Gerhard Wiesinger wrote: I wouldn't read 128k blocks all the time. I would do the following: When e.g. B0, B127, B256 should be read, I would read them with 8k random block I/O. When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed, I would make 2 requests with the largest possible blocksize: 1.) B1-B5: 5*8k=40k, 2.) B7-B10: 4*8k=32k.

I see what you mean now. This is impossible in the current buffer manager implementation because blocks are requested one at a time, and there are few situations where you can predict which are going to be wanted next. The hash index and sequential scan are two that were possible to predict in that way. The fadvise patches already committed didn't change the way blocks were read in; they just used knowledge about what was coming next to advise the OS. That's quite a bit different from actually asking for things in larger chunks and only using what you need.

Implementing larger chunked reads or similar asynchronous batch I/O is a big project, because you'd have to rearchitect the whole way buffers are managed in the database to do it right. Greg Stark's earliest proof-of-concept prototype for async I/O included a Solaris implementation that used the AIO library. It wasn't feasible to actually use that underlying implementation in the database in the end though, because the model AIO uses expects you'll fire off a bunch of I/O and then retrieve blocks as they come in. That's really not easy to align with the model for how blocks are read into shared_buffers right now. He had some ideas for that and I've thought briefly about the problem, but it would be a major overhaul of some scary-to-touch database internals to pull off.

Given that the OS and/or RAID implementations tend to do what we want in a lot of these cases, where smarter/chunkier read-ahead is what you need, the payback on accelerating those cases hasn't been perceived as that great. There is a major win for the hash index reads, which Solaris systems can't take advantage of, so somebody who uses those heavily on that OS might be motivated enough to produce improvements for that use case. Once the buffer cache at large understood how to handle batched async reads, Solaris AIO would be possible, fancier stuff with Linux AIO would be possible, and the type of chunked reads you're suggesting would be too. But none of that is happening without some major rearchitecting first. Unfortunately there aren't that many people with the right knowledge and motivation to start tinkering around with the buffer cache internals to the extent that would be required to do better here, and pretty much all of them I'm aware of are hacking on projects with a much clearer payback instead.

I've got one idea which is not ideal, but may work and shouldn't take much effort to implement: as in the example above, we read B1-B5 and B7-B10 at a higher level, outside of normal buffer management, with large request sizes (e.g. where hash index scans and sequential scans are done). As the blocks are then in the OS cache, normal buffer management is very fast afterwards:

1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

So for 1.) we read B1-B5 in one 40k request (typically from disk); afterwards we read B1, B2, B3, B4, B5 in 8k chunks from the cache again. The disadvantage is of course that we issue more read requests, but maybe performance is even better because the normal buffer requests are served from the cache. A second disadvantage is the "bad" design. But I think performance could still be better.

And a configuration option to enable this might also be interesting. Maybe I will try that with pgiosim to see whether performance is better or not. What do you think about it? Is the idea clear?

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
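For reference, the committed fadvise work mentioned above surfaced as the effective_io_concurrency setting, which prefetches blocks for bitmap heap scans; a hedged example of enabling it per session:

SET effective_io_concurrency = 8; -- number of concurrent prefetch requests; 0 disables prefetching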
Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
On Mon, 5 Oct 2009, Greg Smith wrote:

On Sun, 4 Oct 2009, Gerhard Wiesinger wrote: On Fri, 2 Oct 2009, Scott Marlowe wrote: I found that lowering checkpoint completion target was what helped. Does that seem counter-intuitive to you? I set it to 0.0 now.

If you set that to 0.0, the whole checkpoint spreading logic doesn't apply like it's supposed to. I'm not sure what the results you posted mean now. If you had it set to 0 and saw a bad spike (which is how I read your message), I'd say "yes, that's what happens when you reduce that parameter, so don't do that". If you meant something else, please clarify.

I think the problem is that the work is done at checkpoint time (whether spread or not). It should already have been done by the bgwriter.

Thanks for the dtrace example; I suggested we add those checkpoint probes in there and someone did, but I hadn't seen anybody use them for anything yet.

I think more probes (e.g. on different writing conditions, like writing from the bgwriter or at a checkpoint) would be interesting here.

Bug1: usage_count is IMHO not consistent

It's a bit hack-ish, but the changes made to support multiple buffer use strategies introduced by the "Make large sequential scans and VACUUMs work in a limited-size ring" commit are reasonable even if they're not as consistent as we'd like. Those changes were supported by benchmarks proving their utility, which always trump theoretical "that shouldn't work better!" claims when profiling performance. Also, they make sense to me, but I've spent a lot of time staring at pg_buffercache output to get a feel for what shows up in there under various circumstances. That's where I'd suggest you go if this doesn't seem right to you; run some real database tests and use pg_buffercache to see what's inside the cache when you're done. What's in there and what I expected to be in there weren't always the same thing, and it's interesting to note how that changes as shared_buffers increases. I consider some time studying that a pre-requisite to analyzing performance of this code.

I have analyzed pg_buffercache (querying every second, see below) in parallel to see what happens, but I didn't see the expected results in some ways with the usage_counts. Therefore I analyzed the code and found, IMHO, the problem with the usage_count and buffer reallocation. Since the code change is also new (I think it was 05/2009), it might be that you tested before it ...

BTW: Is it possible to get everything in pg_class over all databases as admin?

Bug2: Double iteration of buffers. As you can see in the calling tree below, there is a double iteration with buffers involved. This might be a major performance bottleneck.

Hmmm, this might be a real bug causing scans through the buffer cache to go twice as fast as intended.

That's not twice, O(2*n)=O(n); that's a factor of n*n (outer and inner loop iteration), which means O(n^2) overall, which is IMHO too much.

Since the part you suggest is doubled isn't very intensive or called all that often, there's no way it can be a major issue though.

It is a major issue since it is O(n^2) and not O(n). E.g. with 2GB of shared_buffers we have 262144 blocks and 68719476736 calls, which is far too much.

That's based on knowing what the code does and how much it was called, as well as some confidence that if it were really a *major* problem, it would have shown up on the extensive benchmarks done on all the code paths you're investigating.

The problem might be hidden for the following reasons:
1.) Buffer counts are so low that even n^2 is small for today's machines
2.) The code is not often called in that way
3.) Backends write out the pages themselves, so that the code is never executed
4.) ...

BTW: Are there some tests available of how fast a buffer cache hit is versus a disk cache hit (not in the buffer cache, but in the disk cache)? I'm asking because a lot of locking is involved in the code.

I did some once but didn't find anything particularly interesting about the results. Since you seem to be on a research tear here, it would be helpful to have a script to test that out available. I wasn't able to release mine, and something dtrace-based would probably be better than the approach I used (I threw a bunch of gettimeofday calls into the logs and post-processed them with a script).

Do you have an idea where one should set tracepoints inside and outside PostgreSQL?

BTW2: Oracle's buffer cache and background writer strategy is also interesting.

As a rule, we don't post links to other database implementation details here, as those can have patented design details we'd prefer not to intentionally re-implement. Much of Oracle's design here doesn't apply here anyway, as it was done in the
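As a starting point for the pg_buffercache study recommended above, a small sketch (assuming the pg_buffercache extension is installed) showing the usage_count distribution across the buffer cache:

SELECT usagecount,
       count(*) AS buffers,
       SUM(CASE WHEN isdirty THEN 1 ELSE 0 END) AS dirty_buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;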
Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
On Fri, 2 Oct 2009, Greg Smith wrote:

On Fri, 2 Oct 2009, Scott Marlowe wrote: I found that lowering checkpoint completion target was what helped. Does that seem counter-intuitive to you? I set it to 0.0 now.

Generally, but there are plenty of ways you can get into a state where a short but not immediate checkpoint is better. For example, consider a case where your buffer cache is filled with really random stuff. There's a sorting horizon in effect, where your OS and/or controller makes decisions about what order to write things based on the data it already has around, not really knowing what's coming in the near future.

OK, if the checkpoint doesn't block anything during normal operation, time doesn't really matter.

Let's say you've got 256MB of cache in the disk controller, you have 1GB of buffer cache to write out, and there's 8GB of RAM in the server so it can cache the whole write. If you wrote it out in a big burst, the OS would elevator-sort things and feed them to the controller in disk order. Very efficient; one pass over the disk to write everything out. But if you broke that up into 256MB write pieces instead on the database side, pausing after each chunk was written, the OS would only be sorting across 256MB at a time, and would basically fill the controller cache up with that before it saw the larger picture. The disk controller can end up making seek decisions with that small of a planning window now that are not really optimal, making more passes over the disk to write the same data out. If the timing between the DB write cache and the OS is pathologically out of sync here, the result can end up being slower than had you just written out in bigger chunks instead. This is one reason I'd like to see fsync calls happen earlier and more evenly than they do now, to reduce these edge cases.

The usual approach I take in this situation is to reduce the amount of write caching the OS does, so at least things get more predictable. A giant write cache always gives the best average performance, but the worst-case behavior increases at the same time. There was a patch floating around at one point that sorted all the checkpoint writes by block order, which would reduce how likely it is you'll end up in one of these odd cases. That turned out to be hard to nail down the benefit of though, because in a typical case the OS caching here trumps any I/O scheduling you try to do in user land, and it's hard to repeatably generate scattered data in a benchmark situation.

OK, with a basic insert test and a systemtap script (http://www.wiesinger.com/opensource/systemtap/postgresql-checkpoint.stp), the checkpoint is still a major I/O spike:

Buffers between : Sun Oct 4 18:29:50 2009, synced 55855 buffer(s), flushed 744 buffer(s) between checkpoint
Checkpoint start: Sun Oct 4 18:29:50 2009
Checkpoint end  : Sun Oct 4 18:29:56 2009, synced 12031 buffer(s), flushed 12031 buffer(s)
Buffers between : Sun Oct 4 18:30:20 2009, synced 79000 buffer(s), flushed 0 buffer(s) between checkpoint
Checkpoint start: Sun Oct 4 18:30:20 2009
Checkpoint end  : Sun Oct 4 18:30:26 2009, synced 10753 buffer(s), flushed 10753 buffer(s)
Buffers between : Sun Oct 4 18:30:50 2009, synced 51120 buffer(s), flushed 1007 buffer(s) between checkpoint
Checkpoint start: Sun Oct 4 18:30:50 2009
Checkpoint end  : Sun Oct 4 18:30:56 2009, synced 11899 buffer(s), flushed 11912 buffer(s)

I further had a look at the code to understand the behavior of the buffer cache and the background writer, since the behavior didn't seem logical to me.

So as far as I saw, the basic algorithm is:
1.) Normally (outside checkpoints), only dirty and non-recently-used pages (usage_count == 0) are flushed to disk. I think that's basically fine as a strategy, as indexes might update blocks more than once. It's also OK that blocks are written and not flushed (that will be done at checkpoint time).
2.) At checkpoints, write out all dirty buffers and flush all previously written and newly written ones. Spreading the I/O also seems OK to me now.

BUT: I think I've found 2 major bugs in the implementation (or I didn't understand something correctly). The codebase analyzed is 8.3.8, since I currently use it.

## Bug1: usage_count is IMHO not consistent ##
I think this has been introduced with:
http://git.postgresql.org/gitweb?p=postgresql.git;a=blobdiff;f=src/backend/storage/buffer/bufmgr.c;h=6e6b862273afea40241e410e18fd5d740c2b1643;hp=97f7822077de683989a064cdc624a025f85e54ab;hb=ebf3d5b66360823edbdf5ac4f9a119506fccd4c0;hpb=98ffa4e9bd75c81243
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Simon Riggs wrote:

On Sun, 2009-09-27 at 18:05 +0200, Gerhard Wiesinger wrote: So I saw that even on sequential reads (and also on bitmap heap scan access) PostgreSQL uses only 8k blocks. I think that's a major I/O bottleneck. A commercial database vendor solved the problem by reading multiple continuous blocks in multiples of 8k up to a maximum threshold. Output per 5 seconds on an equivalent "sequence scan":

Is systemtap counting actual I/Os or just requests to access 8192-byte blocks once in the OS cache? Postgres doesn't read more than one block at a time into its buffer pool, so those numbers of requests look about right.

As far as I know these are VFS reads. Some reads might come from the cache, but since I did the following before each run, all requests should come from disk:

echo 3 > /proc/sys/vm/drop_caches; service postgresql restart
# then run the benchmark

This is the same for all benchmarks, because I don't want to measure the cache performance of the OS or of the DB. Therefore all requests (except for blocks read twice or more, which shouldn't be the case and would also be fine as cache hits) are from disk and not from the cache.

There is belief here that multi-block I/O was introduced prior to OSes doing this as a standard mechanism. Linux expands its read-ahead window in response to sequential scans, and so this seems like something we don't want to do in the database.

I even played with large values for the block device read-ahead on /dev/md*, /dev/sd* and /dev/dm-* as well as stripe_cache_size on /dev/md*, but without any performance improvements in the benchmark scenarios. So all read-aheads/caches don't seem to help, at least in the HEAP BITMAP SCAN scenarios, on nearly-latest Linux kernels.

But I think such block issues (reading in the largest blocks possible) have to be optimized at the application level (in our case the DB level), because:
1.) We can't assume that the OS and even the storage work well in such scenarios.
2.) We can't assume that the OS/storage is intelligent enough to reduce the number of IOPS when 2 random blocks happen to be 2 sequential blocks, and that the number of IOPS is therefore reduced.
3.) I think such logic should be very easy to integrate, and it even has been done with some patches.

It's possible this is wrong. Is the table being scanned fairly sizable, and was it allocated contiguously? i.e. was it a large table loaded via COPY? I also wonder if more L2 cache effects exist.

What do you mean by "table being scanned fairly sizable"? I don't get it. The table was filled with a lot of inserts, but all at one point in time.

Ciao,
Gerhard
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Greg Smith wrote:

On Fri, 2 Oct 2009, Gerhard Wiesinger wrote: Larger blocksizes also reduce IOPS (I/Os per second), which might be a critical threshold on storage systems (e.g. Fibre Channel systems).

True to some extent, but don't forget that IOPS is always relative to a block size in the first place. If you're getting 200 IOPS with 8K blocks, increasing your block size to 128K will not result in your getting 200 IOPS at that larger size; the IOPS number at the larger block size is going to drop too. And you'll pay the penalty for that IOPS number dropping every time you're accessing something that would have only been an 8K bit of I/O before.

Yes, there will be some (very small) drop in IOPS when the blocksize is higher, but today's disks have plenty of throughput when IOPS*128k is compared to e.g. 100MB/s. I've done some Excel calculations which support this.

The trade-off is very application dependent. The position you're advocating, preferring larger blocks, only makes sense if your workload consists mainly of larger scans. Someone who is pulling scattered records from throughout a larger table will suffer with that same change, because they'll be reading a minimum of 128K even if all they really needed were a few bytes. That penalty ripples all the way from the disk I/O upwards through the buffer cache.

I wouldn't read 128k blocks all the time. I would do the following: when e.g. B0, B127, B256 should be read, I would read them with 8k random block I/O. When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed, I would make 2 requests with the largest possible blocksize:
1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

In this case, when B5 and B7 are only one block apart, we could also discuss whether we should read B1-B10=10*8k=80k in one read request and simply not use B6. That would reduce the IOPS by a factor of 4-5 in this scenario, and therefore throughput would go up.

It's easy to generate a synthetic benchmark workload that models some real-world applications and see performance plunge with a larger block size. There certainly are others where a larger block would work better. Testing either way is complicated by the way RAID devices usually have their own stripe sizes to consider on top of the database block size.

Yes, there are block device read-ahead buffers and also RAID stripe caches. But neither seems to work well in the tested HEAP BITMAP SCAN scenario, nor in practical PostgreSQL performance measurement scenarios. And the modelled pgiosim isn't a synthetic benchmark; it is the same as a real-world HEAP BITMAP SCAN scenario in PostgreSQL, where some blocks are read directly consecutively, at least logically in the filesystem (and with some probability also physically on disk), but currently each read is an 8k block read, even when 2 or more blocks could be fetched with one request.

BTW: I would also limit the blocksize to some upper limit for such requests (e.g. 1MB).

Ciao,
Gerhard
Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
On Fri, 2 Oct 2009, Greg Smith wrote:

On Sun, 27 Sep 2009, Gerhard Wiesinger wrote: Lowering bgwriter_delay is possible, but I think the overhead is too high, and there is still a limit of 800MB/s involved:

Stuff written by the background writer turns into largely random I/O. 800MB/s of random writes is so large a number it's only recently become remotely possible; a RAID0 of SSD devices might manage it. No hardware available until very recently had any hope of getting that sort of performance. In any case, I would wager you'll run into one of many other bottlenecks in PostgreSQL and/or currently available system/disk hardware long before the background writer limit gets important.

Of course, 800MB/s is a theoretical maximum limit I could think of. But with SSDs this might be possible. So in fact I think bgwriter_lru_maxpages should be limited to 100000, if limited at all.

The current limit is based on the assumption that people will set it to values way too high if allowed, to the point where it's counterproductive. That's exactly what people used to do with early background writer designs. I think you're wandering down the same road, where what it actually does and what you think it does are not the same thing at all. Much of the important disk I/O coming out of the database should be related to checkpoints, not the background writer, and there is no limit on that I/O.

In my experience, flushing I/O as soon as possible is the best solution. Think of the following scenario: you currently limit the bgwriter to 4MB/s, but you have about 10MB/s of random I/O capacity (a normal low-cost system). So utilization would be only 40%, and you could write even more. At checkpoint time you would get a spike which the I/O system couldn't handle at all, and performance goes down to nearly zero because of the I/O spike (e.g. 500% of available I/O capacity needed). IMHO such scenarios should be avoided.

If you think you've got a situation where the current limits are not sufficient, the path to argue that would start with showing what you're seeing in pg_stat_bgwriter. I can imagine some results from there on a system with a very high rate of I/O available that would suggest the current limits are too small. I've never come close to actually seeing such results in the real world though, and if you're not already monitoring those numbers on a real system, I'd suggest you start there rather than presuming there's a design limitation here.

On a nearly idle database with sometimes some performance tests:

SELECT
  buffers_checkpoint/buffers_clean AS checkpoint_spike,
  ROUND(100.0*buffers_checkpoint/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS checkpoint_percentage,
  ROUND(100.0*buffers_clean/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS pg_writer_percentage,
  ROUND(100.0*buffers_backend/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS backend_percentage
FROM pg_stat_bgwriter;

 checkpoint_spike | checkpoint_percentage | pg_writer_percentage | backend_percentage
------------------+-----------------------+----------------------+--------------------
               31 |                 90.58 |                 2.92 |               6.50

So flushing happens mostly at checkpoint time. 6.5% of all blocks were written out by the backends, which says IMHO: the bgwriter is too slow, so the backends have to do the work themselves.

So I'd like to do some tests with fresh statistics. Is there any fast way to reset the pg_stat_bgwriter statistics for all databases?

Thanks.

Ciao,
Gerhard
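On the reset question: the cluster-wide bgwriter counters can be cleared with pg_stat_reset_shared (hedged: that function appeared in PostgreSQL 9.0, later than the 8.3/8.4 series discussed here):

SELECT pg_stat_reset_shared('bgwriter');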
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Greg Smith wrote: On Sun, 27 Sep 2009, Gerhard Wiesinger wrote: I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. I wish. This is an easy problem compared to the really important ones that need to be resolved. Situations where the OS is capable of faster sequential I/O performance than PostgreSQL appears to deliver doing reads are often caused by something other than what the person doing said benchmarking believes they are. For example, the last time I thought I had a smoking-gun situation just like the one you're describing, it turned out the background operation I didn't know was going on that slowed things down was hint bit updates: http://wiki.postgresql.org/wiki/Hint_Bits Background checkpoints can also cause this; typically if you set checkpoint_segments really high and watch when they're happening, you can avoid that interfering with results too. It's hard to isolate the cause of issues like this. Since most people seem to get something close to real disk speed from sequential scans when measured properly, I would suggest starting with the assumption there's something wrong with your test case rather than PostgreSQL. The best way to do that is to construct a test case others can run that shows the same problem on other systems using the database itself. The easiest way to build one of those is using generate_series to create some bogus test data, SELECT everything in there with \timing on, and then use the size of the relation on disk to estimate MB/s (see the sketch after this mail). Regardless, it's easy enough to build PostgreSQL with larger block sizes if you think that really matters for your situation. You're never going to see that in the mainstream version though, because there are plenty of downsides to using larger blocks. And since the database doesn't actually know where on disk things are at, it's not really in a good position to make decisions about I/O scheduling anyway. More on that below. What's the current status of the patch of Gregory Stark? Any timeframes to integrate? There needs to be a fairly major rearchitecting of how PostgreSQL handles incoming disk I/O for that to go anywhere else, and I don't believe that's expected to be ready in the near future. Does it also work for sequence scans? Any plans for a generic "multi block read count" solution? There was a similar patch for sequential scans submitted by someone else based on that work. It was claimed to help performance on a Linux system with a rather poor disk I/O setup. No one else was able to replicate any performance improvement using the patch though. As far as I've been able to tell, the read-ahead logic being done by the Linux kernel and in some hardware is already doing this sort of optimization for you on that OS, whether or not your app knows enough to recognize it's sequentially scanning the disk it's working against. I forgot to mention: Larger block sizes also reduce IOPS (I/Os per second), which might be a critical threshold on storage systems (e.g. Fibre Channel systems). You would get the full throughput from the storage with large block sizes (fewer IOPS), but with small block sizes the IOPS limit is reached and throughput goes down. Example: With 100MB/s and 8k blocks you need 12500 IOPS, which is a lot (e.g. at least 90 disks at 140 IOPS each)! When the same blocks can be read with e.g. a 128k block size, 781 IOPS are sufficient (6 disks are sufficient)! So this makes a major difference.
Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
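The generate_series test case Greg describes is quick to set up. A minimal sketch (table name and row count are made up for illustration; roughly 700 MB of data):

-- Create some bogus test data
CREATE TABLE seqscan_test AS
    SELECT i, md5(i::text) AS padding
    FROM generate_series(1, 10000000) AS s(i);

-- With \timing on in psql, force a full sequential scan
SELECT count(*) FROM seqscan_test;

-- Relation size divided by the elapsed time gives the effective MB/s
SELECT pg_relation_size('seqscan_test') / (1024 * 1024) AS size_mb;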
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Greg Smith wrote: On Sun, 27 Sep 2009, Gerhard Wiesinger wrote: I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. I wish. This is an easy problem compared to the really important ones that need to be resolved. Situations where the OS is capable of faster sequential I/O performance than PostgreSQL appears to deliver doing reads are often caused by something other than what the person doing said benchmarking believes they are. For example, the last time I thought I had a smoking-gun situation just like the one you're describing, it turned out the background operation I didn't know was going on that slowed things down was hint bit updates: http://wiki.postgresql.org/wiki/Hint_Bits Background checkpoints can also cause this; typically if you set checkpoint_segments really high and watch when they're happening, you can avoid that interfering with results too. It's hard to isolate the cause of issues like this. Since most people seem to get something close to real disk speed from sequential scans when measured properly, I would suggest starting with the assumption there's something wrong with your test case rather than PostgreSQL. The best way to do that is to construct a test case others can run that shows the same problem on other systems using the database itself. The easiest way to build one of those is using generate_series to create some bogus test data, SELECT everything in there with \timing on, and then use the size of the relation on disk to estimate MB/s. Regardless, it's easy enough to build PostgreSQL with larger block sizes if you think that really matters for your situation. You're never going to see that in the mainstream version though, because there are plenty of downsides to using larger blocks. And since the database doesn't actually know where on disk things are at, it's not really in a good position to make decisions about I/O scheduling anyway. More on that below. What's the current status of the patch of Gregory Stark? Any timeframes to integrate? There needs to be a fairly major rearchitecting of how PostgreSQL handles incoming disk I/O for that to go anywhere else, and I don't believe that's expected to be ready in the near future. Does it also work for sequence scans? Any plans for a generic "multi block read count" solution? There was a similar patch for sequential scans submitted by someone else based on that work. It was claimed to help performance on a Linux system with a rather poor disk I/O setup. No one else was able to replicate any performance improvement using the patch though. As far as I've been able to tell, the read-ahead logic being done by the Linux kernel and in some hardware is already doing this sort of optimization for you on that OS, whether or not your app knows enough to recognize it's sequentially scanning the disk it's working against. I've enhanced the pgiosim project http://pgfoundry.org/projects/pgiosim/ with a patch for larger block sizes, independent of PostgreSQL: http://www.wiesinger.com/opensource/pgiosim/pgiosim-0.2-blocksizes.diff You'll find some detailed results below; maybe someone can verify this on different HW/SW platforms with the patch above. If you have any questions regarding pgiosim and the patch, just feel free to ask. Summary: RANDOM I/O with block sizes of e.g. 128k (e.g. BITMAP HEAP SCAN) has better performance than reading the same blocks with 8k block sizes (factor 1.5).
Conclusion: In the test scenario the proposed solution would give a performance gain of a factor of 1.5 for typical BITMAP HEAP SCANs. For other scenarios no performance gain with larger block sizes of contiguous blocks could be measured. Therefore I'm assuming that prefetching works well on Linux with sequential I/O but not with random I/O. I hope I can convince someone that such optimizations make sense, as commercial database vendors have implemented such features for performance reasons. BTW: Prefetch is enabled on the RAID and block devices. Ciao, Gerhard -- http://www.wiesinger.com/

# RANDOM I/O 8k blocksize
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 test.txt
Arg: 1
Added test.txt
blocksize=8192, reading block as a whole
Elapsed: 135.92
Read 10000 blocks Wrote 0 blocks
73.57 op/sec, 588.60kB/sec

# RANDOM I/O 8k blocksize (for verification only), in fact same test as below
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -r test.txt
Arg: 1
Added test.txt
blocksize=8192, doing single read requests with chunk size of 8192 bytes
Elapsed: 136.30
Read 10000 blocks Wrote 0 blocks
73.37 op/sec, 586.94kB/sec

# RANDOM I/O 128k blocksize, read as one 128k block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -o 131072 test.txt
Arg: 1
Added test
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Sun, 27 Sep 2009, Sam Mason wrote: On Sun, Sep 27, 2009 at 06:05:51PM +0200, Gerhard Wiesinger wrote: Google research showed that Gregory Stark already worked on that issue (see references below), but as far as I saw only on bitmap heap scans. Greg Stark's patches are about giving the IO subsystem enough information about where the random accesses will be ending up next. This is important, but almost completely independent from the case where you know you're doing sequential IO, which is what you seem to be talking about. I'm talking about 2 cases: 1.) sequential scans and 2.) bitmap index scans, which both hopefully end up in blocks that physically follow each other and where larger block sizes can benefit. I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. PG's been able to handle data as fast as it can come back from the disk in my tests. When you start doing calculations then it will obviously slow down, but what you were talking about wouldn't help here either. Then again, I don't have a particularly amazing IO subsystem. What sort of performance do your disks give you and at what rate is PG doing sequential scans for you? Hello Sam, Detailed benchmarks are below; the original ones from PostgreSQL have already been posted. So I would expect at least about 60-80MB/s reading performance from PostgreSQL in practice on sequence scans (when larger block sizes are read), but it is at about 30MB/s. See also pgiosim below. Setup is: Disk Setup: SW RAID 5 with 3x1TB SATA 7200 RPM disks, Linux Kernel: 2.6.30.5-43.fc11.x86_64, CPU: Quad Core: AMD Phenom(tm) II X4 940 Processor, 3GHz, RAM: 8GB. Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ http://pgfoundry.org/projects/pgiosim/

#
# Performance benchmarks:
#
dd if=/dev/zero of=test.txt bs=8192 count=1310720 conv=fdatasync
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 169.482 s, 63.4 MB/s

dd if=test.txt of=/dev/null bs=8192
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 86.4457 s, 124 MB/s

#
# RANDOM
#
# Random 8k block reads
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 test.txt
Arg: 1
Added test.txt
Elapsed: 148.22
Read 10000 blocks Wrote 0 blocks
67.47 op/sec, 539.75kB/sec

# Random 8k block reads & writes
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -w 100 test.txt
Write Mode: 100%
Arg: 1
Added test.txt
Elapsed: 201.44
Read 10000 blocks Wrote 10000 blocks
49.64 op/sec, 397.14kB/sec

# Random 8k block reads & writes, sync after each block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -w 100 -y test.txt
Write Mode: 100% fsync after each write
Arg: 1
Added test.txt
Elapsed: 282.30
Read 10000 blocks Wrote 10000 blocks
35.42 op/sec, 283.39kB/sec

#
# SEQUENTIAL
#
# Sequential 8k block reads
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 1000000 test.txt
Seq Scan
Arg: 1
Added test.txt
Elapsed: 71.88
Read 1000000 blocks Wrote 0 blocks
13911.40 op/sec, 111291.17kB/sec

# Sequential 8k block reads & writes
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 1000000 -w 100 test.txt
Seq Scan
Write Mode: 100%
Arg: 1
Added test.txt
Elapsed: 261.24
Read 1000000 blocks Wrote 1000000 blocks
3827.90 op/sec, 30623.18kB/sec

# Sequential 8k block reads & writes, sync after each block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 10000 -w 100 -y test.txt
Seq Scan
Write Mode: 100% fsync after each write
Arg: 1
Added test.txt
Elapsed: 27.03
Read 10000 blocks Wrote 10000 blocks
369.96 op/sec, 2959.68kB/sec
#

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Hello, As block sizes, random I/O and linear I/O are critical I/O performance parameters, I had a look at PostgreSQL and a commercial software vendor. Therefore I enhanced the SystemTap script: http://www.wiesinger.com/opensource/systemtap/disktop_gw.stp

Output per 5 seconds on a sequence scan:

UID   PID    PPID  CMD         DEVICE  T  BYTES      REQUESTS  BYTES/REQ
26    4263   4166  postmaster  dm-1    R  168542208  20574     8192

=> 32MB/s

So I saw that even on sequential reads (and also on bitmap heap scan access) PostgreSQL uses only 8k blocks. I think that's a major I/O bottleneck. A commercial database vendor solved the problem by merging reads of multiple contiguous 8k blocks up to a maximum threshold. Output per 5 seconds on an equivalent "sequence scan":

UID   PID    PPID  CMD      DEVICE  T  BYTES      REQUESTS  BYTES/REQ
1001  53811        process  dm-1    R  277754638  2338      118800

=> 53 MB/s

Google research showed that Gregory Stark already worked on that issue (see the references below), but as far as I saw only on bitmap heap scans. I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. What's the current status of the patch of Gregory Stark? Any timeframes to integrate? Does it also work for sequence scans? Any plans for a generic "multi block read count" solution? Any comments? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/

http://wiki.postgresql.org/wiki/Todo#Concurrent_Use_of_Resources
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00027.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00395.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00092.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00098.php
http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php
http://markmail.org/message/a5osy4qptxk2jgu3#query:+page:1+mid:hz7uzhwxtkbzncy2+state:results
http://markmail.org/message/a5osy4qptxk2jgu3#query:+page:1+mid:a5osy4qptxk2jgu3+state:results

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
Hello, I think the limit of bgwriter_lru_maxpages of max. 1000 doesn't make any sense, because in fact it limits the performance of the database (version 8.3). On heavy write operations buffer cache pages must be freed. With the default config this is practically limited to:

bgwriter_delay=200ms
bgwriter_lru_maxpages=100
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*100*1000/200 = 4000k = 4MB/s

Isn't that a major performance bottleneck in the default config?

bgwriter_delay=200ms
bgwriter_lru_maxpages=1000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*1000*1000/200 = 40000k = 40MB/s

Still not a very high number for current I/O loads. Lowering bgwriter_delay is possible, but I think the overhead is too much and still there is a limit of 800MB/s involved:

bgwriter_delay=10ms
bgwriter_lru_maxpages=1000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*1000*1000/10 = 800000k = 800MB/s

So I think it would be better to have such a configuration:

bgwriter_delay=50ms
bgwriter_lru_maxpages=100000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*100000*1000/50 = 16000000k = 16000MB/s

So in fact I think bgwriter_lru_maxpages should be limited to 100000 if limited at all. Are my argumentations correct? Any comments? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
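The ceiling is just the page size times the pages per round times the rounds per second, so the limits above can be checked with plain SQL arithmetic (values taken from the examples in the mail):

-- 8 kB * bgwriter_lru_maxpages * (1000 / bgwriter_delay in ms) = kB/s
SELECT 8 * 100  * (1000 / 200) AS default_kb_per_sec,   -- 4000 kB/s   =   4 MB/s
       8 * 1000 * (1000 / 200) AS maxpages_kb_per_sec,  -- 40000 kB/s  =  40 MB/s
       8 * 1000 * (1000 / 10)  AS delay10ms_kb_per_sec; -- 800000 kB/s = 800 MB/s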
Re: [GENERAL] Problem with data corruption and psql memory usage
Hello Tom, A late answer, but an answer :-): Finally, it was a very strange hardware problem, where a very small part of the RAM was defective but the kernel never crashed. I also saw very strange behavior when verifying rpm packages with rpm -V. At first I suspected the hard disk. But then I flushed the OS caches (echo 3 > /proc/sys/vm/drop_caches) and rpm -V was correct. => RAM issue. memtest86+ very quickly found the defective RAM. So PostgreSQL didn't have any issue :-) Ciao, Gerhard -- http://www.wiesinger.com/ On Wed, 9 May 2007, Gerhard Wiesinger wrote: Hello Tom! I don't think this is a hardware problem. The machine has run 24/7 for around 4 years without any problems: daily backups with GBs of data to it, uptimes up to the next kernel security patch, etc. The only problem I could believe in is: I'm running the FC7 test packages of postgresql on FC6 and maybe there is a slight glibc library conflict or some other incompatibility. Ciao, Gerhard -- http://www.wiesinger.com/ On Wed, 9 May 2007, Tom Lane wrote: Gerhard Wiesinger writes: LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output error [ raised eyebrow... ] I think your machine is flakier than you believe. This error is particularly damning, but the general pattern of weird failures all over the place seems to me to fit the idea of hardware problems much better than any other explanation. FC6 and PG 8.2.3 are both pretty darn stable for most people, so there's *something* wrong with your installation, and unstable hardware is the first thing that comes to mind. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sum of multiplied deltas
Hello, Finally I used the function below, which works well. Only one problem is left: it pollutes the buffer cache because of the cursor. Any idea how to get rid of this behavior? BTW: the WINDOWING FUNCTIONs of 8.4 should help, but no one could provide an example of how this could work (see the sketch after this mail). Any further comments on how to implement it? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/

CREATE TYPE PS AS (
  sum_m1 double precision,
  sum_m2 double precision
);

DROP FUNCTION getSum();
CREATE OR REPLACE FUNCTION getSum(IN start_ts timestamp with time zone, IN stop_ts timestamp with time zone)
RETURNS PS AS $$
DECLARE
  curs CURSOR FOR
    SELECT * FROM log_entries
    WHERE datetime >= start_ts AND datetime <= stop_ts
    ORDER BY datetime;
  row log_entries%ROWTYPE;
  i bigint = 0;
  datetime_old timestamp with time zone;
  old double precision;
  sum_m1 double precision = 0;
  sum_m2 double precision = 0;
  psum PS;
BEGIN
  OPEN curs;
  LOOP
    FETCH curs INTO row;
    EXIT WHEN NOT FOUND;
    IF row.col IS NOT NULL THEN
      IF i > 0 THEN
        sum_m1 = sum_m1 + (row.col - old) * 0.01 * row.col2;
        sum_m2 = sum_m2 + EXTRACT('epoch' FROM row.datetime - datetime_old) * row.col3;
      END IF;
      i = i + 1;
      old = row.col;
      datetime_old = row.datetime;
    END IF;
  END LOOP;
  CLOSE curs;
  psum.sum_m1 = sum_m1;
  psum.sum_m2 = sum_m2;
  RETURN psum;
END;
$$ LANGUAGE plpgsql;

On Mon, 8 Jun 2009, Gerhard Wiesinger wrote: Hello! I've the following data:

datetime | val1 | val2
time1    |    4 | 40%
time2    |    7 | 30%
time3    |   12 | 20%
...

I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... datetime is ordered (and unique and has also an id). Rows are in the area of millions. How is it done best? 1.) Self join with one row shift? 2.) A function? Any hint? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
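For reference, on 8.4 the cursor loop above collapses into a single query using window functions. A minimal sketch, assuming the same log_entries columns the function uses (col, col2, col3, datetime) and illustrative date bounds:

SELECT SUM((col - prev_col) * 0.01 * col2) AS sum_m1,
       SUM(EXTRACT('epoch' FROM datetime - prev_datetime) * col3) AS sum_m2
FROM (
    SELECT col, col2, col3, datetime,
           lag(col)      OVER (ORDER BY datetime) AS prev_col,
           lag(datetime) OVER (ORDER BY datetime) AS prev_datetime
    FROM log_entries
    WHERE datetime >= '2009-01-01' AND datetime < '2010-01-01'
      AND col IS NOT NULL
) AS deltas
WHERE prev_col IS NOT NULL;

The inner WHERE reproduces the loop's IF row.col IS NOT NULL filter before lag() pairs the rows, and the outer filter drops the first row, which has no predecessor, just like the i > 0 check.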
Re: [GENERAL] pg_buffercache - A lot of "unnamed" relfilenodes?
On Sat, 26 Sep 2009, Tom Lane wrote: Gerhard Wiesinger writes: Research on pg_buffercache showed that a lot of pages don't have any assigned relfilenodes in pg_class, even when they are dirty (in this case inserts are done). There should never be any buffered pages for nonexistent tables. The method you are using will only show relnames for pages belonging to tables of the current database ... maybe there are a lot of dirty pages for other databases in the cluster? Or maybe the tables are newly created (not committed) and so you can't see their pg_class rows yet? OK, this is a visibility problem across databases. Is it possible to GRANT visibility for e.g. one user to pg_class in all databases? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_buffercache - A lot of "unnamed" relfilenodes?
Hello, Research on pg_buffercache showed that a lot of pages don't have any assigned relfilenodes in pg_class, even when they are dirty (in this case inserts are done).

SELECT relname IS NULL AS relname_is_null, isdirty, COUNT(*) AS count
FROM pg_buffercache b
LEFT OUTER JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY relname_is_null, isdirty;

relname_is_null | isdirty | count
----------------+---------+-------
false           | false   |  8914
true            | false   |  7347
true            | true    |   123

Any ideas about this behavior? Inserts should have assigned relnames, shouldn't they? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
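Following Tom's explanation, the NULL rows can be narrowed down by restricting the join to buffers belonging to the current database (plus the shared catalogs, which carry reldatabase = 0). A minimal sketch, assuming the standard pg_buffercache view:

SELECT c.relname IS NULL AS relname_is_null, b.isdirty, COUNT(*) AS count
FROM pg_buffercache b
LEFT OUTER JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY 1, 2;

Buffers of other databases in the cluster are filtered out up front, so any remaining NULL relnames really belong to uncommitted or otherwise invisible relations of the current database.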
Re: [GENERAL] Problems with postgres online backup - restore
Hello Richard, OK, understood. It looks to me like a good system: whole-block overwrites first and then the deltas. Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 26 Jun 2009, Richard Huxton wrote: Gerhard Wiesinger wrote: Hello, OK, what is then the difference between doing a pg_start_backup() and just doing the backup? pg_start_backup() forces a checkpoint (and logs a label for your backup too). Isn't it a problem that the data blocks are very inconsistent, even so inconsistent that they are corrupt: e.g. only part of a data block has been written when the file is tarred. => The data block in the backup is corrupt => and then even the WAL can't be applied. Why does it work correctly? Or is there some design problem? It works because the WAL doesn't hold a list of row updates ("update row 12345 set field 4 = true") but block updates. Any update to a disk block is recorded - table or index. The WAL doesn't really know anything about tables, columns, primary keys etc - just disk blocks. One small optimisation is that the first time a block is touched after a checkpoint the value of the whole block is written to WAL and after that only updates. So - if you have a checkpointed system (all updates guaranteed written to disk) and a complete set of WAL files from that point on, you can always recreate the writes to any point in time after that. http://www.postgresql.org/docs/8.3/static/continuous-archiving.html -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Problems with postgres online backup - restore
Hello, OK, what is then the difference between doing a pg_start_backup() and just doing the backup? Isn't it a problem that the data blocks are very inconsistent, even so inconsistent that they are corrupt: e.g. only part of a data block has been written when the file is tarred. => The data block in the backup is corrupt => and then even the WAL can't be applied. Why does it work correctly? Or is there some design problem? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Jun 2009, Richard Huxton wrote: Gerhard Wiesinger wrote: Hello, I'd like to understand the PostgreSQL internals in "backup mode". If I understood it correctly, pg_start_backup() makes a checkpoint and stops writing to the data/ directory except for the WAL. All new transactions go into the WAL, which is also logical. But how is data consistency maintained when the written/changed blocks don't fit into the buffer cache? The data isn't kept consistent. Which is why you need the WAL. Restoring from a PITR backup is basically the same idea as recovering from a crash. Any blocks that might have been updated since you called pg_start_backup() will be rewritten from the WAL. -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Problems with postgres online backup - restore
Hello, I'd like to understand the PostgreSQL internals in "backup mode". If I understood it correctly, pg_start_backup() makes a checkpoint and stops writing to the data/ directory except for the WAL. All new transactions go into the WAL, which is also logical. But how is data consistency maintained when the written/changed blocks don't fit into the buffer cache? E.g. a lot of updates which exceed the buffer cache (where is the data written, other than the WAL, if the data files should be kept constant?). A SELECT on the updated data: where is the new data fetched from? Thnx for any explanations. Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Jun 2009, Chris Barnes wrote:

SELECT pg_switch_xlog();
SELECT pg_start_backup('postres_full_backup_June222009');
tar -czvf pgprd01_June22_2009_production.dmp.tar.gz data/
SELECT pg_stop_backup();

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Implicit conversion/comparision of timestamp with and without timezone
On Sun, 21 Jun 2009, Richard Huxton wrote: Gerhard Wiesinger wrote: Hello, It is unclear to me how implicit conversion/comparison of timestamps with and without timezone works. It's not entirely clear where the problem is. You don't say the results you're getting or what you thought they should be. Ok, the "problem" is:

-- datetime >= '2009-03-09 00:00:00.0' AND datetime < '2009-03-10 00:00:00.0'
-- Index Cond: ((datetime >= '2009-03-09 00:00:00+01'::timestamp with time zone) AND (datetime < '2009-03-10 00:00:00+01'::timestamp with time zone))
-- datetime >= '2009-04-01 00:00:00.0' AND datetime < '2009-04-02 00:00:00.0'
-- Index Cond: ((datetime >= '2009-04-01 00:00:00+02'::timestamp with time zone) AND (datetime < '2009-04-02 00:00:00+02'::timestamp with time zone))

I would have expected that the 1st conversion would be done with my current timezone at query time (CEST=+02). I don't see any problem with this, since all the ugly timezone calculations are already done by PostgreSQL, but it is nowhere (?) documented and was quite confusing to me. So to me the documentation should read: A timestamp without any timezone information is converted into a timestamp with time zone (explicitly or implicitly) using the time zone offset valid at the timestamp which has to be converted (and not the offset valid at now()). The date and time information is not touched. And: A timestamp with timezone information is converted into a timestamp without timezone information by removing just the timezone information. So the date and time information is not touched. Correct? BTW: I'm in the CET (now CEST) timezone, which switches on the last Sunday in March at 2:00 CET (and the last Sunday in October at 3:00 CEST).

show timezone;
 TimeZone
---------------
 Europe/Vienna

-- datetime TIMESTAMP WITH TIME ZONE
-- datetime entries are done with UTC+01 and UTC+02
-- 2009-03-09: UTC+01
-- 2009-06-12: UTC+02
-- current timezone: UTC+02

OK, so I've got this:

=> show timezone;
 TimeZone
----------
 UTC+02

SELECT * FROM tstest;
           d
------------------------
 2009-03-08 23:00:00-02
 2009-06-12 00:00:00-02
(2 rows)

So - it's showing the timestamps with the timezones you mentioned, but in the current timezone. The UTC+02 actually being a -02 offset is just part of the standards afaik. Note that midnight 2009-03-09 is 2009-03-08 in the current timezone.

SELECT datetime FROM table ORDER BY datetime DESC LIMIT 1;
        datetime
------------------------
 2009-06-21 14:54:00+02

So it should also be +02 in your test. Why is it -02 in your test? BTW: 2009-03-09 and 2009-03-08 are both in the CET timezone in Europe (see above). So - if we run EXPLAIN on your queries that should show us how the values are getting typecast.

=> EXPLAIN SELECT * FROM tstest WHERE d >= '2009-03-09 00:00:00.0' AND d < '2009-03-10 00:00:00.0';
QUERY PLAN
Seq Scan on tstest (cost=0.00..42.10 rows=11 width=8)
  Filter: ((d >= '2009-03-09 00:00:00-02'::timestamp with time zone) AND (d < '2009-03-10 00:00:00-02'::timestamp with time zone))

This comes back with zero rows because without an explicit timezone it assumes our current one.

=> EXPLAIN SELECT * FROM tstest WHERE d >= '2009-06-12 00:00:00.0' AND d < '2009-06-13 00:00:00.0';
QUERY PLAN
Seq Scan on tstest (cost=0.00..42.10 rows=11 width=8)
  Filter: ((d >= '2009-06-12 00:00:00-02'::timestamp with time zone) AND (d < '2009-06-13 00:00:00-02'::timestamp with time zone))

This will match one row. Repeating the EXPLAIN on your other queries should show you what's happening. If I've missed the point of your question, can you say what results you get and what you think they should be. See above.
For completeness:

-- datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-03-09 00:00:00.0' AND datetime < TIMESTAMP WITHOUT TIME ZONE '2009-03-10 00:00:00.0'
-- Index Cond: ((datetime >= '2009-03-09 00:00:00'::timestamp without time zone) AND (datetime < '2009-03-10 00:00:00'::timestamp without time zone))

=> Afterwards an implicit conversion to timestamp with tim
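The two conversion rules discussed above can be verified directly from psql; a minimal sketch (session timezone assumed to be Europe/Vienna, as in the thread, and no table needed):

SET timezone = 'Europe/Vienna';

-- timestamp -> timestamptz: the session zone's offset valid at that date is applied
SELECT TIMESTAMP '2009-03-09 00:00:00'::timestamptz;     -- 2009-03-09 00:00:00+01 (winter, CET)
SELECT TIMESTAMP '2009-06-12 00:00:00'::timestamptz;     -- 2009-06-12 00:00:00+02 (summer, CEST)

-- timestamptz -> timestamp: the value is rendered in the session timezone,
-- then the offset is dropped
SELECT TIMESTAMPTZ '2009-06-12 00:00:00+02'::timestamp;  -- 2009-06-12 00:00:00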
[GENERAL] Implicit conversion/comparision of timestamp with and without timezone
Hello, It is unclear to me how implicit conversion/comparison of timestamps with and without timezone works.

-- datetime TIMESTAMP WITH TIME ZONE
-- datetime entries are done with UTC+01 and UTC+02
-- 2009-03-09: UTC+01
-- 2009-06-12: UTC+02
-- current timezone: UTC+02

SELECT * FROM table WHERE
-- currently in UTC+02 timezone, entries in UTC+01 timezone
-- => works well with UTC+01 entry timezone, but why?
datetime >= '2009-03-09 00:00:00.0' AND datetime < '2009-03-10 00:00:00.0';

-- currently in UTC+02 timezone, entries in UTC+02 timezone
-- => works well with UTC+02 entry timezone, but why?
datetime >= '2009-06-12 00:00:00.0' AND datetime < '2009-06-13 00:00:00.0';

-- Same result, unclear why
datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-03-09 00:00:00.0' AND datetime < TIMESTAMP WITHOUT TIME ZONE '2009-03-10 00:00:00.0'
-- Same result, unclear why
datetime >= TIMESTAMP WITH TIME ZONE '2009-03-09 00:00:00.0' AND datetime < TIMESTAMP WITH TIME ZONE '2009-03-10 00:00:00.0'
-- Same result2, unclear why
datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-06-12 00:00:00.0' AND datetime < TIMESTAMP WITHOUT TIME ZONE '2009-06-13 00:00:00.0'
-- Same result2, unclear why
datetime >= TIMESTAMP WITH TIME ZONE '2009-06-12 00:00:00.0' AND datetime < TIMESTAMP WITH TIME ZONE '2009-06-13 00:00:00.0'

How is the implicit conversion done? With the timezone of datetime, the timezone of now(), or the timezone setting? Would it make a difference if datetime were declared as TIMESTAMP WITHOUT TIME ZONE? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sum of multiplied deltas
Any hints for an 8.3 environment (currently)? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Mon, 8 Jun 2009, David Fetter wrote: On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote: Hello! I've the following data:

datetime | val1 | val2
time1    |    4 | 40%
time2    |    7 | 30%
time3    |   12 | 20%
...

I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only feature, and dodgy hacks are the rule until you can use them. Cheers, David. datetime is ordered (and unique and has also an id). Rows are in the area of millions. How is it done best? 1.) Self join with one row shift? 2.) function? Any hint? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Sum of multiplied deltas
Hello! I've the following data:

datetime | val1 | val2
time1    |    4 | 40%
time2    |    7 | 30%
time3    |   12 | 20%
...

I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... datetime is ordered (and unique and has also an id). Rows are in the area of millions. How is it done best? 1.) Self join with one row shift (see the sketch after this mail)? 2.) function? Any hint? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
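Option 1.), the self join with one row shift, works on 8.3 without window functions when the id column is gapless and ascending in datetime order. A minimal sketch, assuming a table t(id, datetime, val1, val2) with val2 stored as a numeric fraction (0.30 for 30%):

SELECT SUM((nxt.val1 - cur.val1) * nxt.val2) AS total
FROM t AS cur
JOIN t AS nxt ON nxt.id = cur.id + 1;

Each pair contributes (next val1 - current val1) times the next row's val2, i.e. (7-4)*30% + (12-7)*20% + ... as in the example above.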
Re: [GENERAL] HOT question - insert/delete
The data isn't really historical; some of it just has to be kept for a period of time, e.g. for regulatory reasons. But all of the available data should remain accessible, e.g. for reporting. So partitioning doesn't make any sense in this case, right? Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 21 May 2009, Simon Riggs wrote: On Wed, 2009-05-20 at 16:01 -0400, Merlin Moncure wrote: On Wed, May 20, 2009 at 3:11 PM, Gerhard Wiesinger wrote: Hello! Is HOT in 8.3 used in the following scenario: INSERT ... DELETE ... at the same ratio. So for example current data is added every minute and all data older than 2 years is deleted. Would the heap be bloated, or is this avoided by the HOT feature of 8.3 so that it is kept nearly constant? HOT doesn't help here...it only helps with updates and then only if you are updating fields that are not indexed. Partial vacuum, in 8.4, will deal with this situation, though partitioning does sound best for such clearly historical data. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] HOT question - insert/delete
Are there any plans to support this kind of scenario? Ciao, Gerhard -- http://www.wiesinger.com/ On Wed, 20 May 2009, Merlin Moncure wrote: On Wed, May 20, 2009 at 3:11 PM, Gerhard Wiesinger wrote: Hello! Is HOT in 8.3 used in the following scenario: INSERT ... DELETE ... at the same ratio. So for example current data is added every minute and all old data older than 2 years are deleted. Would the heap be bloated or is this avoided by the HOT feature of 8.3 and nearly kept constant? HOT doesn't help here...it only helps with updates and then only if you are updating fields that are not indexed. if your table has a rolling set of data, for example a log file...you probably want to look at table partitioning (either manual or built in). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
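For reference, the manual partitioning Merlin mentions is done on 8.3 with inheritance plus CHECK constraints. A rough sketch (table and column names are made up for illustration):

CREATE TABLE log_data (
    id       bigint,
    datetime timestamp with time zone,
    payload  text
);

-- One child per year; the CHECK constraint lets the planner prune children
CREATE TABLE log_data_2009 (
    CHECK (datetime >= '2009-01-01' AND datetime < '2010-01-01')
) INHERITS (log_data);

SET constraint_exclusion = on;

-- Scans only log_data_2009, not the other children
SELECT count(*) FROM log_data
WHERE datetime >= '2009-06-01' AND datetime < '2009-07-01';

-- Expiring data older than the retention period is then just a cheap:
-- DROP TABLE log_data_2007;

Inserts have to be routed to the right child, either by the application or with a trigger or rule on the parent table.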
[GENERAL] HOT question - insert/delete
Hello! Is HOT in 8.3 used in the following scenario: INSERT ... DELETE ... at the same ratio. So for example current data is added every minute and all old data older than 2 years are deleted. Would the heap be bloated or is this avoided by the HOT feature of 8.3 and nearly kept constant? Or is vacuum still needed for this scenario? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full backup - pg_dumpall sufficient?
Hello Ray, Yes, that's clear. But as far as I read, there is even some stuff which isn't dumped with pg_dumpall, so that an additional pg_dump was needed, i.e. running two commands like: 1.) run pg_dumpall 2.) run pg_dump additionally ... Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 29 Jan 2009, Raymond O'Donnell wrote: On 29/01/2009 16:31, Gerhard Wiesinger wrote: I recently read some mail on the mailing list saying that some parts of PostgreSQL are not dumped with pg_dumpall and additionally some pg_dump was necessary (it was something like internals, catalog, etc.) It's the other way around - pg_dump dumps just the specified database, but not cluster-wide stuff like login roles; you need to do a pg_dumpall to get those as well. Ray. -- Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland r...@iol.ie Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals -- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Full backup - pg_dumpall sufficient?
Hello! I recently read some mail on the mailing list saying that some parts of PostgreSQL are not dumped with pg_dumpall and additionally some pg_dump was necessary (it was something like internals, catalog, etc.). Any ideas what additionally has to be dumped besides pg_dumpall for a full backup? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Description of transaction model for indexes
Hello! The transaction model is discussed in several places:

http://www.packtpub.com/article/transaction-model-of-postgresql
The POSTGRES Data Model (1987) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.8578
The Design of POSTGRES (1986) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1422

However: Is there a paper available describing how indexes are handled inside transactions (e.g. visibility (not handled), concurrency, etc.)? E.g. what happens on SELECT/INSERT/UPDATE/DELETE? Some aspects can be found in: http://www.pgcon.org/2008/schedule/attachments/54_PGCon2008-HOT.ppt Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Conditional commit inside functions
Hello, Aren't there any drawbacks in PostgreSQL with such large transactions (like in Oracle), e.g. if I used 500,000,000 rows or even more? Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 26 Dec 2008, Pavel Stehule wrote: Hello why do you need commit? pavel 2008/12/26 Gerhard Wiesinger : Hello! I tried the following, but it is still one transaction: SELECT insert_1Mio(); (in parallel, select count(id) from employee; is done)

CREATE OR REPLACE FUNCTION insert_some(start_i INTEGER, end_i INTEGER) RETURNS void AS $func$
DECLARE
BEGIN
  FOR i IN start_i..end_i LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || i, 'Smith' || i);
  END LOOP;
END;
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_1Mio() RETURNS void AS $func$
DECLARE
  maxcommit INTEGER;
  start_i INTEGER;
  end_i INTEGER;
  now_i INTEGER;
BEGIN
  maxcommit := 1;
  start_i := 1;
  end_i := 1000000;
  now_i := start_i;
  FOR i IN start_i..end_i LOOP
    IF MOD(i, maxcommit) = 0 THEN
      PERFORM insert_some(now_i, i);
      now_i := i + 1;
    END IF;
  END LOOP;
  PERFORM insert_some(now_i, end_i);
END;
$func$ LANGUAGE plpgsql;

Any ideas? Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Dec 2008, Gerhard Wiesinger wrote: Hello! I want to translate the following Oracle PL/SQL script into plpgsql. Especially I'm having problems with the transaction thing. I tried START TRANSACTION and COMMIT without success. Any ideas? Thanx. Ciao, Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio IS
  maxcommit NUMBER;
BEGIN
  maxcommit := 1;
  FOR i IN 1..1000000 LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || to_char(i), 'Smith' || to_char(i));
    IF MOD(i, maxcommit) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;

-- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
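The attempt above cannot work because a plpgsql function always runs inside the caller's transaction and cannot COMMIT on its own. The usual 8.x workaround, when intermediate commits are really wanted, is to push each batch through a second connection with contrib/dblink, since every dblink_exec() call commits independently there. A minimal sketch (connection name and batch bounds are illustrative):

-- Requires contrib/dblink to be installed in the database
SELECT dblink_connect('loader', 'dbname=' || current_database());

-- Each dblink_exec runs and commits on its own over the 'loader' connection
SELECT dblink_exec('loader',
    'INSERT INTO employee (id, department, firstname, lastname)
     SELECT i, i, ''John'' || i, ''Smith'' || i
     FROM generate_series(1, 500000) AS i');
SELECT dblink_exec('loader',
    'INSERT INTO employee (id, department, firstname, lastname)
     SELECT i, i, ''John'' || i, ''Smith'' || i
     FROM generate_series(500001, 1000000) AS i');

SELECT dblink_disconnect('loader');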
Re: [GENERAL] Information about Pages, row versions of tables, indices
WHEN cl.relkind = 'r' THEN relname
  WHEN cl.relkind = 'i' THEN (SELECT relname FROM pg_index pi, pg_class cx WHERE cl.oid = pi.indexrelid AND pi.indrelid = cx.oid)
  WHEN cl.relkind = 't' THEN relname
  ELSE null
END AS tablename,
reltoastrelid AS reltoastrelid,
reltoastidxid AS reltoastidxid,
reltype AS reltype,
reltablespace AS reltablespace,
CASE WHEN cl.relkind = 'i' THEN 0.0 ELSE pg_relation_size(cl.oid) END AS tablesize,
pg_relation_size(cl.oid),
-- pg_relation_size(cl.relname) AS tablesize,
CASE
  WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
  WHEN cl.relkind = 'i' THEN
    CASE WHEN cl.oid IN (SELECT pi.indexrelid FROM pg_index pi, pg_class pc, pg_class pt WHERE pi.indexrelid = pt.oid AND pi.indexrelid = pc.reltoastidxid ORDER BY pi.indexrelid)
      THEN CAST('INDEX OF TOAST TABLE' AS VARCHAR(20))
      ELSE CAST('INDEX' AS VARCHAR(20))
    END
  WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
  WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
  WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
  WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS VARCHAR(18))
  ELSE null
END AS object_type,
CASE WHEN cl.relkind = 'r' THEN COALESCE((SELECT SUM(pg_relation_size(indexrelid))::bigint FROM pg_index WHERE cl.oid = indrelid), 0) ELSE pg_relation_size(cl.oid) END AS indexsize,
CASE WHEN reltoastrelid = 0 THEN 0 ELSE pg_relation_size(reltoastrelid) END AS toastsize,
CASE WHEN reltoastrelid = 0 THEN 0 ELSE pg_relation_size((SELECT reltoastidxid FROM pg_class ct WHERE cl.reltoastrelid = ct.oid)) END AS toastindexsize
FROM pg_class cl
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
) ss
WHERE schemaname = 'public' AND object_type = 'INDEX'
ORDER BY schemaname, tablename, object_type DESC, relname;

On Thu, 25 Dec 2008, Pavel Stehule wrote: Hello look at the contrib module pgstattuple http://www.postgresql.org/docs/8.3/interactive/pgstattuple.html regards Pavel Stehule 2008/12/25 Gerhard Wiesinger : Hello! Is there some information available in the meta tables about the number of currently unused pages, and about unused row versions of tables and indexes? I'm asking because I want to measure how efficiently HOT is working and whether vacuum should be run or not to save disk space (I know this is done automatically). Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
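The pgstattuple module Pavel points to answers the dead-space question directly. A minimal sketch (the table name is illustrative; on 8.3 the module is installed from the contrib SQL script rather than CREATE EXTENSION):

-- Install once per database, e.g.:
-- psql -d mydb -f /usr/share/pgsql/contrib/pgstattuple.sql

SELECT table_len, tuple_count, dead_tuple_count,
       dead_tuple_percent, free_space, free_percent
FROM pgstattuple('employee');

dead_tuple_percent and free_percent together show how much of the heap is reclaimable, which is exactly the "is HOT keeping the table compact" measurement asked about below.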
Re: [GENERAL] Conditional commit inside functions
Hello! I tried the following, but it is still one transaction: SELECT insert_1Mio(); (in parallel, select count(id) from employee; is done)

CREATE OR REPLACE FUNCTION insert_some(start_i INTEGER, end_i INTEGER) RETURNS void AS $func$
DECLARE
BEGIN
  FOR i IN start_i..end_i LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || i, 'Smith' || i);
  END LOOP;
END;
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_1Mio() RETURNS void AS $func$
DECLARE
  maxcommit INTEGER;
  start_i INTEGER;
  end_i INTEGER;
  now_i INTEGER;
BEGIN
  maxcommit := 1;
  start_i := 1;
  end_i := 1000000;
  now_i := start_i;
  FOR i IN start_i..end_i LOOP
    IF MOD(i, maxcommit) = 0 THEN
      PERFORM insert_some(now_i, i);
      now_i := i + 1;
    END IF;
  END LOOP;
  PERFORM insert_some(now_i, end_i);
END;
$func$ LANGUAGE plpgsql;

Any ideas? Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Dec 2008, Gerhard Wiesinger wrote: Hello! I want to translate the following Oracle PL/SQL script into plpgsql. Especially I'm having problems with the transaction thing. I tried START TRANSACTION and COMMIT without success. Any ideas? Thanx. Ciao, Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio IS
  maxcommit NUMBER;
BEGIN
  maxcommit := 1;
  FOR i IN 1..1000000 LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || to_char(i), 'Smith' || to_char(i));
    IF MOD(i, maxcommit) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;

-- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Conditional commit inside functions
Hello! I want to translate the following Oracle PL/SQL script into plpgsql. Especially I'm having problems with the transaction thing. I tried START TRANSACTION and COMMIT without success. Any ideas? Thanx. Ciao, Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio IS
  maxcommit NUMBER;
BEGIN
  maxcommit := 1;
  FOR i IN 1..1000000 LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || to_char(i), 'Smith' || to_char(i));
    IF MOD(i, maxcommit) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;

-- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Information about Pages, row versions of tables, indices
Hello! Is there some information available in the meta tables about the number of currently unused pages, and about unused row versions of tables and indexes? I'm asking because I want to measure how efficiently HOT is working and whether vacuum should be run or not to save disk space (I know this is done automatically). Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Tsearch2 Upgrade from 8.2 to 8.3.1 with mediawiki
Hello! I found the solution: a normal export and the normal upgrade procedure.

su - postgres
# Upgrade tsearch2
# http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
createdb wikidb
psql wikidb < /usr/share/pgsql/contrib/tsearch2.sql
psql < pg_dumpall.sql postgres

Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 15 May 2008, Gerhard Wiesinger wrote: Hello! I want to upgrade from 8.2 to 8.3.1 but I have problems: I did a pg_dumpall but this doesn't work. I found the migration guide with a trick to load the new contrib/tsearch2 module. But how is this done exactly? - http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html The old contrib/tsearch2 functions and other objects must be suppressed when loading pg_dump output from a pre-8.3 database. While many of them won't load anyway, a few will and then cause problems. One simple way to deal with this is to load the new contrib/tsearch2 module before restoring the dump; then it will block the old objects from being loaded. - Some other links I've found: http://sql-info.de/postgresql/notes/converting-tsearch2-to-8.3.html http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes http://archives.postgresql.org/pgsql-hackers/2007-10/msg00509.php http://translate.google.com/translate?u=http%3A%2F%2Fwww.nabble.com%2FAtualiza%25C3%25A7%25C3%25A3o-de-Mediawiki-para-8.3-td15722654.html&hl=en&ie=UTF8&sl=es&tl=en http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html http://www.postgresql.org/docs/current/static/tsearch2.html#AEN102824 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ http://people.planetpostgresql.org/greg/index.php?/archives/123-MediaWiki-is-Postgres-8.3-compatible.html Thank you. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Tsearch2 Upgrade from 8.2 to 8.3.1
Hello! I want to upgrade from 8.2 to 8.3.1 but I have problems: I did a pg_dumpall but this doesn't work. I found the migration guide with a trick to load the new contrib/tsearch2 module. But how is this done exactly? - http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html The old contrib/tsearch2 functions and other objects must be suppressed when loading pg_dump output from a pre-8.3 database. While many of them won't load anyway, a few will and then cause problems. One simple way to deal with this is to load the new contrib/tsearch2 module before restoring the dump; then it will block the old objects from being loaded. - Some other links I've found: http://sql-info.de/postgresql/notes/converting-tsearch2-to-8.3.html http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes http://archives.postgresql.org/pgsql-hackers/2007-10/msg00509.php http://translate.google.com/translate?u=http%3A%2F%2Fwww.nabble.com%2FAtualiza%25C3%25A7%25C3%25A3o-de-Mediawiki-para-8.3-td15722654.html&hl=en&ie=UTF8&sl=es&tl=en http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html http://www.postgresql.org/docs/current/static/tsearch2.html#AEN102824 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ http://people.planetpostgresql.org/greg/index.php?/archives/123-MediaWiki-is-Postgres-8.3-compatible.html Thank you. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Tools for dumping pg_xlog, pg_clog, etc?
Hello! Since the pg_filedump utility only handles database heap/index files, I'm currently writing a pg_xlog/pg_clog viewer. I'm looking at the documentation in:

// See src/include/access/xlog.h
// See src/include/access/xlog_internal.h
// See src/include/access/xlogdefs.h
// See src/backend/access/transam/xlog.c
// See src/include/access/clog.h
// See src/backend/access/transam/clog.c
// See src/include/access/rmgr.h
// See src/backend/access/transam/rmgr.c

But I'm still having problems getting suitable output:

1.) pg_xlog:
a.) Reading XLogPageHeaderData or XLogLongPageHeaderData works well.
b.) Afterwards an XLogRecord should be expected, but the output doesn't look right (e.g. the CRC contains 00 bytes in different files, which is not plausible).

In b.) I expect:

* The overall layout of an XLOG record is:
* Fixed-size header (XLogRecord struct)
* rmgr-specific data
* BkpBlock
* backup block data
* BkpBlock
* backup block data
* ...

Any ideas on the correct structure?

2.) pg_clog: I'm not sure what the correct structures of pg_clog are. Any ideas?

Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Sat, 26 May 2007, Gerhard Wiesinger wrote: Hello! I think I found a proper utility for that: pg_filedump http://sources.redhat.com/rhdb/utilities.html Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 18 May 2007, Gerhard Wiesinger wrote: Hello! Are there any tools available to dump the files of the pg_xlog, pg_clog, ... directories in human-readable format to understand how transaction handling is done? Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Tools for dumping pg_xlog, pg_clog, etc?
Hello! I think I found a proper utility for that: pg_filedump http://sources.redhat.com/rhdb/utilities.html Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 18 May 2007, Gerhard Wiesinger wrote: Hello! Are there any tools available to dump the files of the pg_xlog, pg_clog, ... directories in human readable format to understand how transaction handling is done? Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Tools for dumping pg_xlog, pg_clog, etc?
Hello! Are there any tools available to dump the files of the pg_xlog, pg_clog, ... directories in human readable format to understand how transaction handling is done? Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] Internals of PostgreSQL - Documentation or presentations
Hello!

Are there any presentations or documents available on the internals of PostgreSQL? In particular, I'm looking for the concepts and detailed internals of general transaction handling, the commit log, the transaction logs, pg_multixact, pg_subtrans, pg_tblspc and pg_twophase. Some commentary on concurrent access with multiple processes, locking, and the shared memory concepts (or whatever other IPC mechanisms are used) would also be nice.

I already found http://www.postgresql.org/files/developer/internalpics.pdf but it lacks the commentary needed to understand it well.

Thanx.

Ciao,
Gerhard
--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] Problem with data corruption and psql memory usage
Hello Tom!

I don't think this is a hardware problem. The machine has been running 24/7 for around 4 years without any problems: daily backups of GBs of data to it, uptimes limited only by the next kernel security patch, etc. The only cause I could believe: I'm running the FC7 test packages of PostgreSQL on FC6, and maybe there is a slight glibc library conflict or some other incompatibility.

Ciao,
Gerhard
--
http://www.wiesinger.com/

On Wed, 9 May 2007, Tom Lane wrote:
Gerhard Wiesinger <[EMAIL PROTECTED]> writes:
LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output error

[ raised eyebrow... ] I think your machine is flakier than you believe. This error is particularly damning, but the general pattern of weird failures all over the place seems to me to fit the idea of hardware problems much better than any other explanation. FC6 and PG 8.2.3 are both pretty darn stable for most people, so there's *something* wrong with your installation, and unstable hardware is the first thing that comes to mind.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
[GENERAL] Problem with data corruption and psql memory usage
Hello!

I'm new to PostgreSQL, and I imported about 2.8 million rows with plain INSERT commands. The configuration (differences from the default config) was:

listen_addresses = '*'
temp_buffers = 20MB            # min 800kB
work_mem = 20MB                # min 64kB
maintenance_work_mem = 32MB    # min 1MB
fsync = off                    # turns forced synchronization on or off
full_page_writes = off
wal_buffers = 20MB

It crashed with a core dump (ulimit -c 0):

LOG: server process (PID 12720) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Afterwards I got the following error messages:

WARNING: index "table_pkey" contains 2572948 row versions, but table contains 2572949 row versions
HINT: Rebuild the index with REINDEX.
LOG: server process (PID 13794) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output error
ERROR: storage sync failed on magnetic disk: Input/output error
ERROR: could not access status of transaction 808464434
DETAIL: Could not open file "pg_clog/0303": No such file or directory.

Afterwards I got:

ERROR: could not access status of transaction 5526085

There were also some core dumps afterwards for which I have a stack trace:

#0 0x0807d241 in heap_deform_tuple ()
#1 0x08095b8c in toast_delete ()
#2 0x0809432e in heap_delete ()
#3 0x0814bfa4 in ExecutorRun ()
#4 0x081d7ece in FreeQueryDesc ()
#5 0x081d80c1 in FreeQueryDesc ()
#6 0x081d8979 in PortalRun ()
#7 0x081d4480 in pg_parse_query ()
#8 0x081d5a57 in PostgresMain ()
#9 0x081ad4fe in ClosePostmasterPorts ()
#10 0x081ae307 in PostmasterMain ()
#11 0x0816dec0 in main ()

So my questions are:
1.) Are my settings too aggressive (fsync=off, full_page_writes=off)?
2.) Should PostgreSQL also recover from a crash with these two options enabled, or is data corruption to be expected with these settings?
3.) Any ideas about the reason for the core dumps? Write access came from only one session at a time; I only did SELECT count(*) from the table in other sessions.

Afterwards I cleaned up the tables, did a pg_dumpall/restore cycle, ran initdb, disabled those two settings, and everything went fine.

I also had a problem with psql: running psql < file.sql, psql took around 2 GB of virtual memory with heavy swapping. After Ctrl-C and restarting, it worked well. Any ideas?

The machine is stable, so I would say a hardware failure is not the problem. The PostgreSQL version is 8.2.3 on FC6.

Thank you for the answer.
Ciao,
Gerhard
--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
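A quick cross-check on the pg_clog errors quoted above, assuming the stock clog layout (2 status bits per transaction, so 4 transactions per byte, 8 kB pages, 32 pages per 256 kB segment file, file names in hex): transaction 808464434 maps to segment 808464434 / (4 * 8192 * 32) = 808464434 / 1048576 = 771 = 0x303, which is exactly the missing file pg_clog/0303, so the error message is at least self-consistent. More telling, 808464434 is 0x30303032, i.e. the ASCII digits "0002": the "transaction ID" PostgreSQL tried to look up is almost certainly text garbage read from an overwritten tuple header, which points to on-disk page corruption rather than a problem in pg_clog itself.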
Re: [GENERAL] Fw: PostgreSQL Performance Tuning
Hello!

I would do the following (in that order):

1.) Check for performant application logic and application design (e.g. the granularity of the Java Hibernate mapping: are there object iterators looping over hundreds of objects, etc.).
2.) Check the Hibernate-generated queries and whether each query is suitable or not. Also run EXPLAIN (or EXPLAIN ANALYZE) to see the query plan. Sometimes a manually written query is much more efficient than the Hibernate-generated one.
3.) Optimize the database itself, e.g. the PostgreSQL configuration.

Ciao,
Gerhard
--
http://www.wiesinger.com/

On Thu, 26 Apr 2007, Shohab Abdullah wrote:
Dear,
We are facing a performance tuning problem while using the PostgreSQL database over the network on a Linux OS. Our database consists of more than 500 tables, with an average of 10K records per table, and an average of 20 users accessing the database simultaneously over the network. Each table has indexes, and we are querying the database using Hibernate. The biggest problem is with insertion, updating and fetching of records, i.e. the database performance is very slow. It takes a long time to respond in the above scenario. Please provide me with tuning advice for the database. I am attaching my postgresql.conf file for reference to our current configuration. Please reply ASAP.

Regards,
Shohab Abdullah
Software Engineer, Manufacturing SBU-POWAI
Larsen and Toubro Infotech Ltd. | 4th floor, L&T Technology Centre, Saki Vihar Road, Powai, Mumbai-400072
Tel: +91-22-67767366 | +91-9870247322
Visit us at: http://www.lntinfotech.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq