Re: [GENERAL] vacuumdb --analyze-only scans all pages?

2016-12-29 Thread Gerhard Wiesinger

On 29.12.2016 16:10, Tom Lane wrote:

Adrian Klaver  writes:

On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:

vacuumdb --analyze-only --all --verbose
INFO:  analyzing "public.log"
INFO:  "log": scanned 3 of 30851 pages, containing 3599899 live rows
and 0 dead rows; 3 rows in sample, 3702016 estimated total rows
INFO:  analyzing "public.log_details"
INFO:  "log_details": scanned 2133350 of 2133350 pages, containing
334935843 live rows and 0 dead rows; 300 rows in sample, 334935843
estimated total rows
INFO:  analyzing "public.log_details_str"
INFO:  "log_details_str": scanned 3 of 521126 pages, containing
3601451 live rows and 0 dead rows; 3 rows in sample, 62560215
estimated total rows

Any ideas why?

I would say because the '3000000 rows in sample' were spread out over
all 2133350 pages.

Worth pointing out here is that you must have a custom statistics target
set on log_details to make it want a sample so much larger than the
default.  If you feel ANALYZE is taking too long, you should reconsider
whether you need such a large target.


Thanx Tom and Adrian

Yes, there is a custom statistics target of 10000 set; I guess it was set 
for some reason some time ago to overcome a performance problem after the 
upgrade from 8.3 to 8.4.

Thanx Tom for pointing that out.

A good query to find such columns:
SELECT
  n.nspname AS schemaname,
  CASE
   WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
   WHEN cl.relkind = 'i' THEN CAST('INDEX' AS VARCHAR(20))
   WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
   WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
   WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
   WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS VARCHAR(18))
   ELSE null
  END AS object_type,
  cl.relname,
  attname,
  attstattarget
FROM
  pg_attribute a
LEFT OUTER JOIN pg_class cl ON a.attrelid = cl.oid
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
WHERE
  (cl.relkind = 'r' OR cl.relkind = 'i' OR cl.relkind = 't')
  AND attnum > 0 -- only regular columns
  AND n.nspname = 'public'  -- public schema only
  AND NOT(relname ILIKE 'pgstatspack_%')
  AND cl.relkind = 'r' -- TABLE
  AND attstattarget <> -1 -- non default values only
ORDER BY
  n.nspname,
  cl.relname,
  attnum
;

BTW: It looks like the statistics target is multiplied by 300 to get the 
number of sample rows; is that true (I didn't find any documentation 
about that)?

https://www.postgresql.org/docs/current/static/planner-stats.html
-- ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;

-- Default is 100, means 300*100=30000 rows (30k)
-- Max is 10000, means 300*10000=3000000 rows (3 Mio)
ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS -1;
ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS -1;
https://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
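
For what it's worth, a quick arithmetic check of the 300-rows-per-target
observation above (my reading of ANALYZE's sampling behaviour, not a
documented formula):

-- expected sample size = 300 * statistics target
SHOW default_statistics_target;               -- 100 by default
SELECT 300 * 100   AS default_sample_rows,    -- 30000, matches "log" above
       300 * 10000 AS max_target_sample_rows; -- 3000000, matches "log_details" above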

Thnx.

Ciao,
Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] vacuumdb --analyze-only scans all pages?

2016-12-28 Thread Gerhard Wiesinger

Hello,

PostgreSQL 9.6.1: after a pg_dump/restore procedure it scans all pages 
(at least for some of the tables; the analyze-only switch is specified).


I would expect that only the sample rows are scanned.

"log_details": scanned 2133350 of 2133350 pages

vacuumdb --analyze-only --all --verbose
INFO:  analyzing "public.log"
INFO:  "log": scanned 3 of 30851 pages, containing 3599899 live rows 
and 0 dead rows; 3 rows in sample, 3702016 estimated total rows

INFO:  analyzing "public.log_details"
INFO:  "log_details": scanned 2133350 of 2133350 pages, containing 
334935843 live rows and 0 dead rows; 300 rows in sample, 334935843 
estimated total rows

INFO:  analyzing "public.log_details_str"
INFO:  "log_details_str": scanned 3 of 521126 pages, containing 
3601451 live rows and 0 dead rows; 3 rows in sample, 62560215 
estimated total rows


Any ideas why?

Thnx.

Ciao,

Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] CTE and function

2016-02-25 Thread Gerhard Wiesinger

Hello,

I'm trying to convert a select after a CTE into a function for generic 
use. The CTE is normally a complex query, but I want to encapsulate the 
calculation of the Gini coefficient into a function.
Based on: 
http://www.heckler.com.br/blog/2010/06/15/gini-coeficient-having-fun-in-both-sql-and-python/

Details at: https://en.wikipedia.org/wiki/Gini_coefficient

= OK

WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT
   ((N+1.0)/(N-1.0)) - ((PiXi_sum/(N*(N-1)*u))*2.0) AS gini_coefficient
FROM
  ( SELECT
  SUM(PiXi) AS PiXi_sum,
  COUNT(*) AS N,
  (SELECT AVG(col) FROM tab) AS u
FROM
  ( SELECT
  row_number() OVER() * col AS PiXi
FROM
  (SELECT col FROM tab ORDER BY col DESC) t1
  ) t2
  ) t3
;


= OK: Create function

CREATE OR REPLACE FUNCTION gini_coefficient(IN table_name text, IN 
column_name text, OUT gini_coefficient DOUBLE PRECISION) AS $$

BEGIN
EXECUTE format('
SELECT
   ((N+1.0)/(N-1.0)) - ((PiXi_sum/(N*(N-1)*u))*2.0) AS gini_coefficient
FROM
  ( SELECT
  SUM(PiXi) AS PiXi_sum,
  COUNT(*) AS N,
  (SELECT AVG(%s) FROM %s) AS u
FROM
  ( SELECT
  row_number() OVER() * col AS PiXi
FROM
  (SELECT %s FROM %s ORDER BY %s DESC) t1
  ) t2
  ) t3
;
', column_name, table_name, column_name, table_name, column_name)
INTO gini_coefficient;
END
$$ LANGUAGE plpgsql;


= NOT OK:

WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT gini_coefficient('tab', 'col');

ERROR:  relation "tab" does not exist
LINE 13:   (SELECT col FROM tab ORDER BY col DESC) t1


= NOT OK:

WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT * FROM gini_coefficient('tab', 'col');

ERROR:  relation "tab" does not exist
LINE 13:   (SELECT col FROM tab ORDER BY col DESC) t1

So it looks like the table tab from the CTE is not available in the 
function.
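
For illustration, a minimal sketch of one possible workaround (my
assumption, not part of the original post): EXECUTE inside the function
runs the query as a new top-level statement, so the CTE of the calling
query is out of scope. A temporary table, however, is a real relation in
the session and is therefore visible to the dynamic SQL:

CREATE TEMPORARY TABLE tab AS
  SELECT unnest(ARRAY[1,2,3,4]) AS col;

SELECT gini_coefficient('tab', 'col');

DROP TABLE tab;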


Any ideas how to solve this? An explanation would also be appreciated.

Thank you.

Ciao,
Gerhard

--
https://www.wiesinger.com/



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

On 13.12.2015 21:14, Bill Moran wrote:

Wait ... this is a combined HTTP/Postgres server? You didn't mention that
earlier, and it's kind of important.

What evidence do you have that Postgres is actually the part of
this system running out of memory?


For me the complete picture doesn't look consistent.


I don't see any such evidence in any of
your emails, and (based on experience) I find it pretty likely that whatever
is running under node is doing something in a horrifically memory-inefficient
manner. Since you mention that you see nothing in the PG logs, that makes it
even more likely (to me) that you're looking entirely in the wrong place.

I'd be willing to bet a steak dinner that if you put the web server on a
different server than the DB, that the memory problems would follow the
web server and not the DB server.


Changes in config:
track_activity_query_size = 102400
work_mem = 100MB

OK, we restarted PostgreSQL, keeping it stopped for some seconds, and 
logged top every second:


When PostgreSQL was down nearly all memory was freed, looks good to me. 
So it is likely that node and other processes are not the cause.

Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
Swap: 512M Total, 477M Used, 35M Free, 93% Inuse

When PostgreSQL restarted, Inactive was growing fast (~1min):
Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M 
Free

Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

After some few minutes we are back again at the same situation:
Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

The steak dinner is mine :-) Donating to the PostgreSQL community :-)

Any further ideas? I don't think this is normal system behaviour.

Ciao,
Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

On 13.12.2015 18:17, Tom Lane wrote:

Gerhard Wiesinger  writes:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Just judging from the name of the function, I would bet this is a direct
result of having only 512M of swap configured.  As Bill already pointed
out, that's a pretty useless choice on a system with 32G of RAM.  As soon
as the kernel tries to push out any significant amount of idle processes,
it's gonna be out of swap space.  The numbers you show above prove that
it is almost out of free swap already.


The system wasn't designed by me; I wouldn't have done it that way either. 
Does swapoff help?




Also, while that 20G of "inactive" pages may be candidates for reuse,
they probably can't actually be reused without swapping them out ...
and there's noplace for that data to go.


There is no entry in syslog (where Postgres logs) at the time 
swap_pager_getswapspace is logged.


But why do we have 20G of Inactive pages? They are still allocated by 
kernel or user space. As you can see below (top output), non-Postgres 
processes are around 9G in virtual size, resident even lower. The system 
is nearly idle, and the queries typically aren't active after one second 
again. Therefore where does the rest of the 11G of Inactive pages come 
from (if it isn't a Postgres/FreeBSD memory leak)?
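
One way to narrow this down from the database side is a sketch like the
following, assuming the pg_buffercache contrib extension can be installed;
it shows how much of shared_buffers is actually populated, so the
PostgreSQL share of the resident memory can be accounted for:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- used vs. free buffers in shared_buffers (8kB each)
SELECT count(*) FILTER (WHERE relfilenode IS NOT NULL) AS used_buffers,
       count(*) FILTER (WHERE relfilenode IS NULL)     AS free_buffers,
       pg_size_pretty(count(*) * 8192)                 AS shared_buffers_total
FROM pg_buffercache;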

I read that Postgres has its own memory allocator:
https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
Might that be an issue with double allocation/freeing and the "cheese 
hole" topic with memory fragmentation?


https://www.opennet.ru/base/dev/fbsdvm.txt.html
inactive:   pages not actively used by programs which are
dirty and (at some point) need to be written
to their backing store (typically disk).
These pages are still associated with objects and
can be reclaimed if a program references them.
Pages can be moved from the active to the inactive
queue at any time with little adverse effect.
Moving pages to the cache queue has bigger
consequences (note 1)

https://unix.stackexchange.com/questions/134862/what-do-the-different-memory-counters-in-freebsd-mean
Active: Memory currently being used by a process
Inactive: Memory that has been freed but is still cached since it 
may be used again. If more Free memory is required, this memory can be 
cleared and become free. This memory is not cleared before it is needed, 
because "free memory is wasted memory", it doesn't cost anything to keep 
the old data around in case it is needed again.
Wired: Memory in use by the Kernel. This memory cannot be swapped 
out (GW: including ZFS cache!!!)
Cache: Memory being used to cache data, can be freed immediately if 
required

Buffers: Disk cache
Free: Memory that is completely free and ready to use. Inactive, 
Cache and Buffers can become free if they are cleaned up.


Thnx.

Ciao,
Gerhard


last pid:  7277;  load averages:  0.91,  0.96,  1.02  up 18+06:22:31
18:57:54

135 processes: 2 running, 132 sleeping, 1 waiting

Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
Swap: 512M Total, 501M Used, 12M Free, 97% Inuse

  PID USERNAMETHR PRI NICE   SIZERES STATE   C TIMEWCPU COMMAND
77941 pgsql 5  200  7921M  7295M usem7 404:32  10.25% 
postgres

79570 pgsql 1  200  7367M  6968M sbwait  6 4:24   0.59% postgres
73449 pgsql 1  270  7367M  6908M sbwait  4 8:08   5.08% postgres
74209 pgsql 1  270  7367M  6803M sbwait  0 6:37   1.46% postgres
74207 pgsql 1  250  7367M  6748M sbwait  6 6:34   1.76% postgres
74206 pgsql 1  220  7367M  6548M sbwait  5 6:44   1.86% postgres
73380 pgsql 1  230  7367M  7265M sbwait  4 8:15   1.17% postgres
74208 pgsql 1  240  7367M  7223M sbwait  1 6:30   4.59% postgres
79569 pgsql 1  240  7367M  7105M sbwait  3 4:36   1.17% postgres
74210 pgsql 1  290  7363M  7182M sbwait  5 6:41   5.47% postgres
73479 pgsql 1  220  7363M  6560M sbwait  6 7:14   3.56% postgres
83030 pgsql 1  200  7329M   193M sbwait  5 0:00   0.00% postgres
76178 pgsql 1  200  7323M  7245M sbwait  2 3:44   0.00% postgres
75867 pgsql 1  200  7323M  7245M sbwait  2 3:45   0.00% postgres
75869 pgsql 1  200  7323M  7245M sbwait  2 3:46   0.00% postgres
75883 pgsql 1  200  73

Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

Hello Bill,

Thank you for your response, comments inline:

On 13.12.2015 16:05, Bill Moran wrote:

On Sun, 13 Dec 2015 09:57:21 +0100
Gerhard Wiesinger  wrote:

some further details from the original FreeBSD 10.1 machine:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME WCPU COMMAND
77941 pgsql 5  200  7925M  7296M usem2 352:34 6.98%
postgres: username dbnamee 127.0.0.1(43367)  (postgres)



I see no evidence of an actual leak here. Each process is basically using
the 7G of shared_buffers you have allocated in the config (which is only
7G _total_ for all processes, since it's shared memory)


OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?


Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

Main issue is IMHO (as far as I understood the FreeBSD Memory system)
that 20G are INACTIVE. When I subtract the shared memory, even ~13GB
should be available, but they are still allocated but inactive
(INACTIVE). INACTIVE memory might be clean or dirty. As we get into out
of memory situations it is likely that the memory is dirty (otherwise it
would have been reused).

Not quite correct. Inactive memory is _always_ available for re-use.


Are you sure that's true?

Monitoring inactive memory:
cat vm_stat.sh
#!/usr/bin/env bash

while [ 1 ]; do
  date +%Y.%m.%d.%H.%M.%S
  sysctl -a | grep vm.stats.vm.
  sleep 1
done

And even when we get out of memory with swap_pager_getswapspace, Inactive 
memory (from the log file) is around 20GB (it doesn't go down or up):

vm.stats.vm.v_inactive_count: 5193530 (*4k pages is around 20GB)

Then we have 20GB inactive memory, but we still get out of memory with 
kernel: swap_pager_getswapspace(4): failed. Any ideas why?





Config:
Memory: 32GB, Swap: 512MB

Probably not the cause of this problem, but this is a non-optimal layout.
If you're going to use swap at all, it should generally be 2x the size of
your RAM for optimal performance under memory pressure.


Yes, we will try to disable it completely. Nevertheless, why do we run out 
of memory/swap?





maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB

I expect that this value is the cause of the problem. The scenario you
describe below is sorting a large table on an unindexed column, meaning
it will have to use all that work_mem. I'd be interested to see the
output of:

EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 10;


That was only a test query; it has nothing to do with the production 
queries. Those are mostly SELECTs/INSERTs/UPDATEs on primary keys.



But even without that information, I'd recommend you reduce work_mem
to about 16M or so.


Why so low? E.g. sorting in reporting or some long-running queries would 
then be done on disk and not in memory.
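
One possible compromise (an assumption on my side, not something Bill
suggested) is to keep the global work_mem small and raise it only per
transaction for the few reporting queries that need large in-memory
sorts, e.g. reusing the t_random test table from above:

BEGIN;
SET LOCAL work_mem = '512MB';   -- applies only inside this transaction
SELECT * FROM t_random ORDER BY md5 LIMIT 10;
COMMIT;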



wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

[snip]


We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple
worker processes connected via persistent connections to PostgreSQL,
they perform just simple queries with SELECT on primary keys and
simple INSERTS/UPDATES.

That's not at all the test scenario you show below. The scenario below
is a large sort operation on a non-indexed column, which is vastly
different than a single-row fetch based on an index.


Yes, that non-indexed select was just for testing purposes.




Normally nearly all the workers are idle but
they still consume the maximum configured work mem on the PostgreSQL
server and the memory is also resident.

I see no evidence of that in your top output. Each process has a
reference to the 7G of shared_buffers you allocated, which is memory
shared by all processes, and is expected. I'm not as familiar with
Linux top, but the results appear to be the same.


Yes, that might be the case, but if it is n times the 7G of shared memory 
then we have ~20GB of Inactive memory available, so plenty of memory. And 
why are we getting: kernel: swap_pager_getswapspace(4): failed?


Thnx.

Ciao,
Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Gerhard Wiesinger

Hello,

some further details from the original FreeBSD 10.1 machine:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

  PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME WCPU COMMAND
77941 pgsql 5  200  7925M  7296M usem2 352:34 6.98% 
postgres: username dbnamee 127.0.0.1(43367)  (postgres)
75862 pgsql 1  200  7321M  7266M select  4   7:04 0.00% 
postgres: checkpointer process(postgres)
75863 pgsql 1  200  7321M  7260M select  4   3:34 0.00% 
postgres: writer process(postgres)
75867 pgsql 1  200  7323M  7237M sbwait  0   3:19 0.00% 
postgres: username dbnamee ipaddress(58977)  (postgres)
76178 pgsql 1  200  7323M  7237M sbwait  2   3:18 0.00% 
postgres: username dbnamee ipaddress(35807)  (postgres)
75868 pgsql 1  200  7323M  7237M sbwait  0   3:20 0.00% 
postgres: username dbnamee ipaddress(58978)  (postgres)
75884 pgsql 1  200  7323M  7236M sbwait  3   3:20 0.00% 
postgres: username dbnamee ipaddress(59908)  (postgres)
75869 pgsql 1  200  7323M  7236M sbwait  3   3:20 0.00% 
postgres: username dbnamee ipaddress(58982)  (postgres)
76181 pgsql 1  200  7323M  7236M sbwait  2   3:18 0.00% 
postgres: username dbnamee ipaddress(35813)  (postgres)
75883 pgsql 1  200  7323M  7236M sbwait  0   3:20 0.00% 
postgres: username dbnamee ipaddress(59907)  (postgres)
76180 pgsql 1  200  7323M  7236M sbwait  1   3:19 0.00% 
postgres: username dbnamee ipaddress(35811)  (postgres)
76177 pgsql 1  200  7323M  7236M sbwait  1   3:18 0.00% 
postgres: username dbnamee ipaddress(35712)  (postgres)
76179 pgsql 1  200  7323M  7236M sbwait  5   3:18 0.00% 
postgres: username dbnamee ipaddress(35810)  (postgres)
64951 pgsql 1  750  7375M   662M CPU11   0:01 11.47% 
postgres: username dbnamee 127.0.0.1(32073)  (postgres)
64950 pgsql 1  770  7325M   598M CPU66   0:02 16.55% 
postgres: username dbnamee 127.0.0.1(44251)  (postgres)


Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

Main issue is IMHO (as far as I understood the FreeBSD Memory system) 
that 20G are INACTIVE. When I subtract the shared memory, even ~13GB 
should be available, but they are still allocated but inactive 
(INACTIVE). INACTIVE memory might be clean or dirty. As we get into out 
of memory situations it is likely that the memory is dirty (otherwise it 
would have been reused).


Config:
Memory: 32GB, Swap: 512MB

maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

Thnx.

Ciao,
Gerhard


On 13.12.2015 08:49, Gerhard Wiesinger wrote:

Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple 
worker processes connected via persistent connections to PostgreSQL, 
they perform just simple queries with SELECT on primary keys and 
simple INSERTS/UPDATES. Normally nearly all the workers are idle but 
they still consume the maximum configured work mem on the PostgreSQL 
server and the memory is also resident. If some other queries get in 
we get into out of memory situations. So it looks like PostgreSQL has 
memory leaks.


I found a test scenario to reproduce it also on a newer FreeBSD 10.2 
VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):


Executions in psql with one persistent connection:
-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM 
generate_Series(1,1) s;

-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:

-- Memory goes slightly up after each execution, even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on 
canceling the query or on multiple executions

SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8)
RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$  LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident 
memory):

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
-- after startup of psql
26851 postgres  20   0 2363276   7432   6292 S   0.0  0.2 0:00.00 
postgres: postgres postgres [local] idle

-- Memory goes up, ok so far
26851 postgres  20   0 2365732 255152 253548 R  99.0  6.3 0:10.77 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90 
postgres

[GENERAL] Memory Leak executing small queries without closing the connection

2015-12-12 Thread Gerhard Wiesinger

Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker 
processes connected via persistent connections to PostgreSQL, they 
perform just simple queries with SELECT on primary keys and simple 
INSERTS/UPDATES. Normally nearly all the workers are idle but they still 
consume the maximum configured work mem on the PostgreSQL server and the 
memory is also resident. If some other queries get in we get into out of 
memory situations. So it looks like PostgreSQL has memory leaks.


I found a test scenario to reproduce it also on a newer FreeBSD 10.2 VM 
as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):


Executions in psql with one persistent connection:
-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM 
generate_Series(1,1) s;

-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:

-- Memory goes slightly up after each execution, even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on canceling 
the query or on multiple executions

SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8)
RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$  LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident 
memory):

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
-- after startup of psql
26851 postgres  20   0 2363276   7432   6292 S   0.0  0.2 0:00.00 
postgres: postgres postgres [local] idle

-- Memory goes up, ok so far
26851 postgres  20   0 2365732 255152 253548 R  99.0  6.3 0:10.77 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81 
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90 
postgres: postgres postgres [local] SELECT
-- Function execmultiplei and transaction terminated, but memory still 
allocated!!!
26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 0:41.40 
postgres: postgres postgres [local] idle

-- Calling it again
26851 postgres  20   0 2365732 920668 918748 R  99.0 22.7 0:46.51 
postgres: postgres postgres [local] SELECT

-- idle again, memory still allocated
26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 1:22.54 
postgres: postgres postgres [local] idle


Memory will only be released when psql exits. According to the 
PostgreSQL design, memory should be freed when the transaction completes.
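
As a side note (PostgreSQL 14 and later only, so not applicable to the
9.4.5 used here): a backend's memory contexts can be inspected from SQL,
which helps distinguish memory merely mapped by the process (what top
shows) from memory the allocator still considers in use:

SELECT name, parent, total_bytes, used_bytes, free_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;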


top commands on FreeBSD: top -SaPz -o res -s 1
top commands on Linux: top -o RES d1

Config: VMs with 4GB of RAM, 2 vCPUs
shared_buffers = 2048MB # min 128kB
effective_cache_size = 2GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16

Any ideas?

Thank you.

Ciao,
Gerhard



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs

2012-03-23 Thread Gerhard Wiesinger

On Fri, 23 Mar 2012, Tom Lane wrote:


Merlin Moncure  writes:

On Fri, Mar 23, 2012 at 11:35 AM, Gerhard Wiesinger  wrote:

So I think XID overflow should be planned for one of the next PostgreSQL
releases.



two mitigating factors:
1. read only transactions do not increment xid counter


Yes.  Ask your admin what his throughput of *data modifying*
transactions is.  I'll bet that number is a few orders of magnitude
smaller again.


Those were all small writing transactions benchmarking the database (with 
the corrected values discussed).


Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs

2012-03-23 Thread Gerhard Wiesinger

On Fri, 23 Mar 2012, John R Pierce wrote:


On 03/23/12 9:17 AM, Gerhard Wiesinger wrote:

So in that time autovacuum is triggered.


autovacuum runs pretty much continuously in the background, its not an on/off 
thing.


Yes, I know. I meant that it runs at least once in 1.5 days.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs

2012-03-23 Thread Gerhard Wiesinger

On 23.03.2012 11:16, Jan Kesten wrote:

On 23.03.2012 06:45, Gerhard Wiesinger wrote:


With a database admin of a commercial database system I've discussed
that they have to provide and they also achieve 2^31 transactions per
SECOND!
As PostgreSQL uses transaction IDs (XIDs) in the range of 2^31 they
would turn around in about one second.

Wow. What application issues that many transactions? And what is the
database system that can handle that? I can't think of a single machine
capable of this - and hardly believe postgresql can come close. 2^31
transactions mean that a single one lasts 0.5ns. Even the fastest
DDR3-2133 has cycle times of 4ns.

I have seen a database monster in action - 43 trillion (academic)
transactions per day, but that's only 5*10^8 transactions per second,
under a quarter of 2^31 per second.

So, I can't answer your question - but you triggered my curiosity :-)


I'm just answering in one of the posts ...

OK, I talked again to the admin and he was wrong by 3 zeros, and it is per 
minute :-)
So the corrected data are: 1 Mio transactions per minute. 1 Mio/60s ≈ 16667 
transactions/s


2^32*60/1E6=257698s to get to the XID overflow, which is 2.98 days (~1.5 
days when 2^31 with signed int is the border)


So in that time autovacuum is triggered. Nevertheless we are getting 
into the area where XID overflow is an issue in the near future.
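
A standard catalog query (not from the original mail) to keep an eye on
how far each database is from the roughly 2^31 wraparound horizon, which
autovacuum's anti-wraparound freezing must stay well ahead of:

SELECT datname,
       age(datfrozenxid)              AS xid_age,
       2147483648 - age(datfrozenxid) AS xids_left_before_wraparound
FROM pg_database
ORDER BY xid_age DESC;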


In your example with 5E8 transactions per second overflow will be in 4s 
(2^31) or 8s (2^32) ...


So I think XID overflow should be planned for one of the next PostgreSQL 
releases.


Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs

2012-03-22 Thread Gerhard Wiesinger

Hello,

I've discussed with a database admin of a commercial database system that 
they have to provide, and actually achieve, 2^31 transactions per 
SECOND!
As PostgreSQL uses transaction IDs (XIDs) in the range of 2^31, they 
would wrap around in about one second.


How can one achieve this with PostgreSQL?
What is the status and plan of 64 Bits XIDs?

I saw that 64 Bit XIDs were already discussed in 2005 but never made it 
into the code:

http://postgresql.1045698.n5.nabble.com/A-couple-of-patches-for-PostgreSQL-64bit-support-td2214264.html

Thnx.

Ciao,
Gerhard


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Double iteration in the buffer cache code

2010-09-01 Thread Gerhard Wiesinger

Hello,

Did someone fix the double iteration in the buffer cache code, as discussed 
at the link below, in the meantime:

http://www.mail-archive.com/pgsql-general@postgresql.org/msg137230.html

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] OT: Re: [sfpug] 10/13 SFPUG meeting, "The Mighty GUCS," video now available

2009-10-20 Thread Gerhard Wiesinger

On Tue, 20 Oct 2009, Christophe Pettus wrote:



On Oct 20, 2009, at 8:32 AM, Viktor Rosenfeld wrote:

@Christophe, I enjoyed your talk very much, particularly because I
learned about pgfouine, which from the looks of it, will make my current
project vastly simpler.  So, thank you.


You should really thank Josh Berkus; he's the one who gave the talk.  I just 
run the camera. :)


Josh has the slides and sample files, and I'm sure he'll post them shortly. 
I know he's been busy with a client emergency and the PostgreSQL conference 
this weekend.




They are already available from:
http://www.pgexperts.com/presentations.html
http://www.pgexperts.com/document.html?id=34
http://www.pgexperts.com/document.html?id=36

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Function returning 2 columns evaluated twice when both columns are needed

2009-10-19 Thread Gerhard Wiesinger

On Mon, 19 Oct 2009, Tom Lane wrote:


Gerhard Wiesinger  writes:

On Sun, 18 Oct 2009, Tom Lane wrote:

The OFFSET bit is a kluge, but is needed to keep the planner from
flattening the subquery and undoing your work.



Thnx Tom. It also works without the OFFSET kludge. Any ideas why?


Probably because you have the function declared VOLATILE.



Neither of the functions is declared VOLATILE. Any other ideas?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Function returning 2 columns evaluated twice when both columns are needed

2009-10-18 Thread Gerhard Wiesinger

On Sun, 18 Oct 2009, Tom Lane wrote:


Gerhard Wiesinger  writes:

Since getSums() is a cursor, is complex and takes a long time, it should
only be evaluated once. Is there a better solution available to
get both columns from the function in the select?


You need a sub-select, along the lines of

SELECT
 cur_date,
 (gs).sum_m1,
 (gs).sum_m2
 FROM
 (
  SELECT
cur_date,
getSums(start_ts, stop_ts) AS gs
  FROM
getDatesTimestamps($1, $2)
  OFFSET 0
 ) AS ss
  ;

The OFFSET bit is a kluge, but is needed to keep the planner from
flattening the subquery and undoing your work.



Thnx Tom. It also works without the OFFSET kludge. Any ideas why?
That also looks reasonable to me because no SETOF is returned.

BTW: Why is the function in the original statement evaluated twice? With 
"SELECT table.*" I guess the query is also executed once and not n times 
(once per column).
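
A minimal way to observe this, using the Sums type from the original post
and a hypothetical noisy() function added only for illustration: before
PostgreSQL 10, (f(x)).* is expanded into one copy of the function call per
output column, so a two-column composite result means two evaluations:

CREATE OR REPLACE FUNCTION noisy() RETURNS Sums AS $$
DECLARE
  r Sums;
BEGIN
  RAISE NOTICE 'noisy() evaluated';
  r.sum_m1 := 1.0;
  r.sum_m2 := 2.0;
  RETURN r;
END;
$$ LANGUAGE plpgsql;

SELECT (noisy()).*;   -- on 8.3: two NOTICE lines, one per column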


PG is version 8.3.8.

Thnx.

Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Function returning 2 columns evaluated twice when both columns are needed

2009-10-18 Thread Gerhard Wiesinger

Hello,

I'm having a problem with the following:
CREATE TYPE Sums AS (sum_m1 double precision, sum_m2 double precision);
CREATE TYPE date_m1_m2 AS (cur_date date, sum_m1 double precision, sum_m2 
double precision);


CREATE OR REPLACE FUNCTION getSums(IN start_ts timestamp with time 
zone, IN stop_ts timestamp with time zone) RETURNS Sums AS $$

...

CREATE OR REPLACE FUNCTION getsumInterval(date, date) RETURNS SETOF date_m1_m2 
AS $$
  SELECT
cur_date,
(getSums(start_ts, stop_ts)).* -- Not optimal since the function is evaluated 2 
times => 24s
-- getSums(start_ts, stop_ts) -- in one column and not usable as I need 2 
columns, but takes only 12s
  FROM
getDatesTimestamps($1, $2)
  ;
$$ LANGUAGE SQL;

Since getSums() is a cursor, is complex and takes a long time, it should 
only be evaluated once. Is there a better solution available to 
get both columns from the function in the select?


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 10/13 SFPUG meeting, "The Mighty GUCS," video now available

2009-10-17 Thread Gerhard Wiesinger

On Fri, 16 Oct 2009, Christophe Pettus wrote:

The video archive for the 10/13 SFPUG meeting, "The Mighty GUCS: A guide to 
the essential PostgreSQL settings you need to know," is now available:


http://thebuild.com/blog/2009/10/16/the-mighty-gucs/

It's also available on Vimeo:

http://vimeo.com/7109722


Hello,

Can you also upload the sample config files and the presentation.

Thnx.

Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 10/13 SFPUG meeting, "The Mighty GUCS," video now available

2009-10-17 Thread Gerhard Wiesinger

On Fri, 16 Oct 2009, Christophe Pettus wrote:

The video archive for the 10/13 SFPUG meeting, "The Mighty GUCS: A guide to 
the essential PostgreSQL settings you need to know," is now available:


http://thebuild.com/blog/2009/10/16/the-mighty-gucs/

It's also available on Vimeo:

http://vimeo.com/7109722


Hello,

Can you also upload the sample config files and the presentation.

Thnx.

Ciao,
Gerhard


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-09 Thread Gerhard Wiesinger

On Fri, 9 Oct 2009, Greg Smith wrote:


On Sat, 3 Oct 2009, Gerhard Wiesinger wrote:


I wouldn't read 128k blocks all the time. I would do the following:
When e.g. B0, B127, B256 should be read I would read in 8k random block 
I/O.


When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed I would make 2 requests 
with the largest possible blocksize:

1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k


I see what you mean now.  This is impossible in the current buffer manager 
implementation because blocks are requested one at a time, and there are few 
situations where you can predict which are going to be wanted next. The hash 
index and sequential scan are two that were possible to predict in that way.


The fadvise patches already committed didn't change the way blocks were read 
in, they just used knowledge about what was coming next to advise the OS. 
That's quite a bit different from actually asking for things in larger chunks 
and only using what you need.


Implementing larger chunking reads or similar asynchronous batch I/O is a big 
project, because you'd have to rearchitect the whole way buffers are managed 
in the database to do it right.  Greg Stark's earliest proof of concept 
prototype for async I/O included a Solaris implementation that used the AIO 
library.  It wasn't feasible to actually use that underlying implemention in 
the database in the end though, because the model AIO uses expects you'll 
fire off a bunch of I/O and then retrieve blocks as they come in.  That's 
really not easy to align with the model for how blocks are read into 
shared_buffers right now.  He had some ideas for that and I've thought 
briefly about the problem, but it would be a major overhaul to some scary to 
touch database internals to pull off.


Given that the OS and/or RAID implementations tend to do what we want in a 
lot of these cases, where smarter/chunkier read-ahead is what we you need, 
the payback on accelerating those cases hasn't been perceived as that great. 
There is a major win for the hash index reads, which Solaris systems can't 
take advantage of, so somebody who uses those heavily on that OS might be 
motivated enough produce improvements for that use case. Once the buffer 
cache at large understood how to handle batching async reads, Solaris AIO 
would be possible, fancier stuff with Linux AIO would be possible, and the 
type of chunking reads you're suggesting would be too.  But none of that is 
happening without some major rearchitecting first.


Unfortunately there aren't that many people with the right knowledge and 
motivation to start tinkering around with the buffer cache internals to the 
extent that would be required to do better here, and pretty much of them I'm 
aware of are hacking on projects with a much clearer payback instead.




I've one idea, which is not ideal, but may work and shouldn't be much 
effort to implement:
As in the example above we read B1-B5 and B7-B10 on a higher level outside 
of normal buffer management with large request sizes (e.g. where hash 
index scans and sequential scans are done). As the blocks are now in cache 
normal buffer management is very fast:

1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

So we are reading for 1.):
B1-B5 in one 40k block (typically from disk), afterwards we read B1, B2, 
B3, B4, B5 in 8k chunks from cache again.


The disadvantage is of course that we have more read requests, but maybe 
performance is still better because the normal buffer requests then come 
from cache. A second disadvantage is the "bad" design.


But I think performance will be even better. And a configuration option to 
enable this might also be interesting.


Maybe I will try that with pgiosim whether performance is better or not.

What do you think about it?
Is the idea clear?

Thnx.

Ciao
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-05 Thread Gerhard Wiesinger

On Mon, 5 Oct 2009, Greg Smith wrote:


On Sun, 4 Oct 2009, Gerhard Wiesinger wrote:


On Fri, 2 Oct 2009, Scott Marlowe wrote:


I found that lowering checkpoint completion target was what helped.
Does that seem counter-intuitive to you?




I set it to 0.0 now.


If you set that to 0.0, the whole checkpoing spreading logic doesn't apply 
like it's supposed to.  I'm not sure what the results you posted mean now. If 
you had it set to 0 and saw a bad spike (which is how I read your message), 
I'd say "yes, that's what happens when you do reduce that parameter, so don't 
do that".  If you meant something else please clarify.


I think the problem is that it is done at checkpoint time (whether spread 
or not). It should already have been done by the bgwriter.




Thanks for the dtrace example, I suggested we add those checkpoint probes in 
there and someone did, but I hadn't seen anybody use them for anything yet.




I think more probes (e.g. on different writing conditions like writing 
from bgwriter or on a checkpoint) would be interesting here.



Bug1: usage_count is IMHO not consistent


It's a bit hack-ish, but the changes made to support multiple buffer use 
strategies introduced by the "Make large sequential scans and VACUUMs work in 
a limited-size ring" commit are reasonable even if they're not as consistent 
as we'd like.  Those changes were supported by benchmarks proving their 
utility, which always trump theoretical "that shouldn't work better!" claims 
when profiling performance.


Also, they make sense to me, but I've spent a lot of time staring at 
pg_buffercache output to get a feel for what shows up in there under various 
circumstances.  That's where I'd suggest you go if this doesn't seem right to 
you; run some real database tests and use pg_buffercache to see what's inside 
the cache when you're done.  What's in there and what I expected to be in 
there weren't always the same thing, and it's interesting to note how that 
changes as shared_buffers increases.  I consider some time studying that a 
pre-requisite to analyzing performance of this code.




I have analyzed pg_buffercache (query every second, see below) in parallel 
to see what happens, but I didn't see the expected results in some ways with 
the usage_counts. Therefore I analyzed the code and found IMHO the problem 
with the usage_count and buffer reallocation. Since the code change is 
also new (I think it was 05/2009) it might be that you tested before ...
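
The per-second query itself is not included in this excerpt; a
representative sketch (my reconstruction, assuming the pg_buffercache
contrib module) of that kind of snapshot, showing the usage_count
distribution of clean and dirty buffers:

SELECT usagecount, isdirty, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount, isdirty
ORDER BY usagecount, isdirty;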


BTW: Is it possible to get everything in pg_class over all databases as 
admin?



Bug2: Double iteration of buffers
As you can seen in the calling tree below there is double iteration with 
buffers involved. This might be a major performance bottleneck.


Hmmm, this might be a real bug causing scans through the buffer cache to go 
twice as fast as intended.


That's not twice, O(2*n)=O(n); that's a factor of n*n (outer and inner loop 
iteration), which means overall O(n^2), which is IMHO too much.


 Since the part you suggest is doubled isn't very 
intensive or called all that often, there's no way it can be a major issue 
though.


It is a major issue since it is O(n^2) and not O(n). E.g. with 2GB shared 
buffers we have 262144 blocks and 68719476736 calls, which is far too much.


 That's based on knowing what the code does and how much it was 
called, as well as some confidence that if it were really a *major* problem, 
it would have shown up on the extensive benchmarks done on all the code paths 
you're investigating.




The problem might be hidden for the following reasons:
1.) Buffer counts are so low that even n^2 is small for today's machines
2.) Code is not often called in that way
3.) backend writes out pages so that the code is never executed
4.) ...

BTW: Are there some tests available showing how fast a buffer cache hit is 
versus a disk cache hit (not in the buffer cache but in the disk cache)? I 
ask because a lot of locking is involved in the code.


I did some once but didn't find anything particularly interesting about the 
results.  Since you seem to be on a research tear here, it would be helpful 
to have a script to test that out available, I wasn't able to release mine 
and something dtrace based would probably be better than the approach I used 
(I threw a bunch of gettimeofdata calls into the logs and post-processed them 
with a script).




Do you have an idea where one should set tracepoints inside and outside 
PostgreSQL?


BTW2: Oracle buffercache and background writer strategy is also 
interesting.


As a rule, we don't post links to other database implementation details here, 
as those can have patented design details we'd prefer not to intentionally 
re-implement.  Much of Oracle's design here doesn't apply here anyway, as it 
was done in the 

Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-04 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Fri, 2 Oct 2009, Scott Marlowe wrote:


I found that lowering checkpoint completion target was what helped.
Does that seem counter-intuitive to you?




I set it to 0.0 now.

Generally, but there are plenty of ways you can get into a state where a 
short but not immediate checkpoint is better.  For example, consider a case 
where your buffer cache is filled with really random stuff.  There's a 
sorting horizon in effect, where your OS and/or controller makes decisions 
about what order to write things based on the data it already has around, not 
really knowing what's coming in the near future.




OK, if the checkpoint doesn't block anything during normal operation, the 
time doesn't really matter.


Let's say you've got 256MB of cache in the disk controller, you have 1GB of 
buffer cache to write out, and there's 8GB of RAM in the server so it can 
cache the whole write.  If you wrote it out in a big burst, the OS would 
elevator sort things and feed them to the controller in disk order. Very 
efficient, one pass over the disk to write everything out.


But if you broke that up into 256MB write pieces instead on the database 
side, pausing after each chunk was written, the OS would only be sorting 
across 256MB at a time, and would basically fill the controller cache up with 
that before it saw the larger picture.  The disk controller can end up making 
seek decisions with that small of a planning window now that are not really 
optimal, making more passes over the disk to write the same data out.  If the 
timing between the DB write cache and the OS is pathologically out of sync 
here, the result can end up being slower than had you just written out in 
bigger chunks instead.  This is one reason I'd like to see fsync calls happen 
earlier and more evenly than they do now, to reduce these edge cases.


The usual approach I take in this situation is to reduce the amount of write 
caching the OS does, so at least things get more predictable.  A giant write 
cache always gives the best average performance, but the worst-case behavior 
increases at the same time.


There was a patch floating around at one point that sorted all the checkpoint 
writes by block order, which would reduce how likely it is you'll end up in 
one of these odd cases.  That turned out to be hard to nail down the benefit 
of though, because in a typical case the OS caching here trumps any I/O 
scheduling you try to do in user land, and it's hard to repeatibly generate 
scattered data in a benchmark situation.




OK, with a basic insert test and a systemtap script 
(http://www.wiesinger.com/opensource/systemtap/postgresql-checkpoint.stp) 
the checkpoint still causes a major I/O spike.


Buffers between : Sun Oct  4 18:29:50 2009, synced 55855 buffer(s), flushed 744 
buffer(s) between checkpoint
Checkpoint start: Sun Oct  4 18:29:50 2009
Checkpoint end  : Sun Oct  4 18:29:56 2009, synced 12031 buffer(s), flushed 
12031 buffer(s)

Buffers between : Sun Oct  4 18:30:20 2009, synced 79000 buffer(s), flushed 0 
buffer(s) between checkpoint
Checkpoint start: Sun Oct  4 18:30:20 2009
Checkpoint end  : Sun Oct  4 18:30:26 2009, synced 10753 buffer(s), flushed 
10753 buffer(s)

Buffers between : Sun Oct  4 18:30:50 2009, synced 51120 buffer(s), flushed 
1007 buffer(s) between checkpoint
Checkpoint start: Sun Oct  4 18:30:50 2009
Checkpoint end  : Sun Oct  4 18:30:56 2009, synced 11899 buffer(s), flushed 
11912 buffer(s)


OK, I further had a look at the code to understand the behavior of the 
buffer cache and the background writer, since that wasn't logical to me.


So as far as I saw the basic algorithm is:
1.) Normally (non checkpoints) only dirty and non recently used pages 
(usage_count == 0) are flushed to disk. I think that's basically fine as a 
strategy as indexes might update blocks more than once. It's also ok that 
blocks are written and not flushed (that will be done at checkpoint time).
2.) At checkpoints write out all dirty buffers and flush all previously 
written and newly written. Also spreading I/O seems also ok to me now.


BUT: I think I've found 2 major bugs in the implementation (or I didn't 
understand something correctly). Codebase analyzed is 8.3.8 since I 
currently use it.


##
Bug1: usage_count is IMHO not consistent
##
I think this has been introduced with:
http://git.postgresql.org/gitweb?p=postgresql.git;a=blobdiff;f=src/backend/storage/buffer/bufmgr.c;h=6e6b862273afea40241e410e18fd5d740c2b1643;hp=97f7822077de683989a064cdc624a025f85e54ab;hb=ebf3d5b66360823edbdf5ac4f9a119506fccd4c0;hpb=98ffa4e9bd75c81243

Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-03 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Simon Riggs wrote:



On Sun, 2009-09-27 at 18:05 +0200, Gerhard Wiesinger wrote:


So I saw, that even on sequential reads (and also on bitmap heap scan acces)
PostgreSQL uses only 8k blocks. I think that's a major I/O bottleneck.

A commercial software database vendor solved the problem by reading multiple
continuous blocks by multiple 8k blocks up to a maximum threshold. Output per 5
seconds on an equivalent "sequence scan":


Is systemtap counting actual I/Os or just requests to access 8192 blocks
once in OS cache? Postgres doesn't read more than one block at a time
into its buffer pool, so those numbers of requests look about right.



As far as I know these are VFS reads. So some reads might be from cache, 
but since I did the following, all requests should be from disk:

echo 3 > /proc/sys/vm/drop_caches;service postgresql restart
do benchmark

Same for all benchmarks because I don't want to measure cache performance 
of OS or of the DB to benchmark.


Therefore all requests (except reading twice or more, but that shouldn't be 
the case and would also be fine as a cache hit) are from disk and not from 
the cache.



There is belief here that multi-block I/O was introduced prior to OS
doing this as a standard mechanism. Linux expands its read ahead window
in response to sequential scans and so this seems like something we
don't want to do in the database.


I even played with large values for the block device readaheads of /dev/md*, 
/dev/sd* and /dev/dm-*, as well as stripe_cache_size of /dev/md*, but 
without any performance improvements in the benchmark scenarios.


=> All readaheads/caches don't seem to work, at least in the HEAP 
BITMAP SCAN scenarios, on nearly the latest Linux kernels.


But I think such block issues (reading in the largest blocks possible) have 
to be optimized at the application level (in our case the DB level) because

1.) We can't assume that OS and even storage works well in such scenarios
2.) We can't assume that the OS/storage is intelligent enough to reduce the 
number of IOPS when 2 randomly requested blocks happen to be 2 sequential 
blocks, and that therefore the number of IOPS is reduced.
3.) I think such a logic should be very easy to integrate and even has 
been done with some patches.
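
For the bitmap heap scan case specifically, the prefetching PostgreSQL
already offers (effective_io_concurrency, available from 8.4 on and based
on posix_fadvise) hints the OS about upcoming 8k blocks but does not merge
them into larger reads. A rough sketch for comparing it, with a purely
hypothetical demo table (t_demo and the predicate are placeholders, not
from this thread):

-- hypothetical setup, only to have something concrete to scan
CREATE TABLE t_demo AS
  SELECT s, md5(random()::text) AS md5 FROM generate_series(1, 1000000) s;
CREATE INDEX t_demo_s_idx ON t_demo(s);
ANALYZE t_demo;

-- encourage a bitmap heap scan for the comparison
SET enable_indexscan = off;
SET enable_seqscan = off;

SET effective_io_concurrency = 0;   -- no prefetch hints
EXPLAIN ANALYZE SELECT count(*) FROM t_demo WHERE s BETWEEN 10000 AND 60000;

SET effective_io_concurrency = 32;  -- posix_fadvise prefetch (e.g. Linux)
EXPLAIN ANALYZE SELECT count(*) FROM t_demo WHERE s BETWEEN 10000 AND 60000;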




It's possible this is wrong. Is the table being scanned fairly sizable
and was it allocated contiguously? i.e. was it a large table loaded via
COPY?

I also wonder if more L2 cache effects exist.



What do you mean with "table being scanned fairly sizable"? I don't get 
it.


Table was filled with a lot of inserts but at one time point.

Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-03 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Fri, 2 Oct 2009, Gerhard Wiesinger wrote:

Larger blocksizes also reduce IOPS (I/Os per second) which might be a 
critial threshold on storage systems (e.g. Fibre Channel systems).


True to some extent, but don't forget that IOPS is always relative to a block 
size in the first place.  If you're getting 200 IOPS with 8K blocks, 
increasing your block size to 128K will not result in your getting 200 IOPS 
at that larger size; the IOPS number at the larger block size is going to 
drop too.  And you'll pay the penalty for that IOPS number dropping every 
time you're accessing something that would have only been an 8K bit of I/O 
before.




Yes, there will be some (very small) drop in IOPS when the blocksize is 
higher, but today's disks have a lot of throughput when IOPS*128k is 
compared to e.g. 100MB/s. I've done some Excel calculations which support 
this.


The trade-off is very application dependent.  The position you're advocating, 
preferring larger blocks, only makes sense if your workload consists mainly 
of larger scans.  Someone who is pulling scattered records from throughout a 
larger table will suffer with that same change, because they'll be reading a 
minimum of 128K even if all they really needed with a few bytes.  That 
penalty ripples all the way from the disk I/O upwards through the buffer 
cache.




I wouldn't read 128k blocks all the time. I would do the following:
When e.g. B0, B127, B256 should be read I would read in 8k random block 
I/O.


When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed I would make 2 
requests with the largest possible blocksize:

1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

In this case when B5 and B7 are only one block away we could also discuss 
whether we should read B1-B10=10*8k=80k in one read request and don't use 
B6.


That would reduce the IOPS by a factor of 4-5 in that scenario and 
therefore throughput would go up.


It's easy to generate a synthetic benchmark workload that models some 
real-world applications and see performance plunge with a larger block size. 
There certainly are others where a larger block would work better. Testing 
either way is complicated by the way RAID devices usually have their own 
stripe sizes to consider on top of the database block size.




Yes, there are block device read ahead buffers and also RAID stripe 
caches. But both don't seem to work well with the tested HEAP BITMAP SCAN 
scenario and also in practical PostgreSQL performance measurement 
scenarios.


But the modelled pgiosim isn't a synthetic benchmark; it is the same as a 
real-world HEAP BITMAP SCAN scenario in PostgreSQL where some blocks are 
read directly consecutively, at least logically in the filesystem (and with 
some probability also physically on disk), but currently each is read as an 
8k block even when 2 or more blocks could be read with one request.


BTW: I would also limit the blocksize to some upper limit on such requests 
(e.g. 1MB).


Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

Lowering bgwriter_delay is possible, but I think overhead is too much and 
still there is a limit of 800MB/s involved:


Stuff written by the background writer turns into largely random I/O. 800MB/s 
of random writes is so large of a number it's only recently become remotely 
possible; a RAID0 of SSD devices might manage it.  No hardware available 
until very recently had any hope of getting that sort of performance.


In any case, I would wager you'll run into one of many other bottlenecks in 
PostgreSQL and/or currently available system/disk hardware long before the 
background writer limit gets important.




Of course, 800MB/s is a theoretical maximum limit I could think of. But
with SSDs this might be possible.


So in fact I think bgwriter_lru_maxpages should be limited to 100000 if 
limited at all.


The current limit is based on the assumption that people will set it to 
values way too high if allowed, to the point where it's counterproductive. 
That's exactly what people used to do with early background writer designs. 
I think you're wandering down the same road, where what it actually does and 
what you think it does are not the same thing at all. Much of the important 
disk I/O coming out of the database should be related to checkpoints, not the 
background writer, and there is no limit on that I/O.




In my experience flushing I/O as soon as possible is the best solution.
Think of the following scenario: you currently limit the bgwriter to 4MB/s
but you have about 10MB/s of random I/O capacity (a normal low-cost system).
So utilization would be only 40% and you could write even more.
At checkpoint time you would get a spike which the I/O system couldn't
handle at all, and performance would go down to nearly zero because of the
I/O spike (e.g. 500% of the available I/O needed). IMHO such scenarios
should be avoided.


If you think you've got a situation where the current limits are not 
sufficient, the path to argue that would start with showing what you're 
seeing in pg_stat_bgwriter.  I can imagine some results from there on a 
system with a very high rate of I/O available that would suggest the current 
limits are too small.  I've never come close to actually seeing such results 
in the real world though, and if you're not already monitoring those numbers 
on a real system I'd suggest you start there rather than presuming there's a 
design limitation here.




On a nearly idle database, with occasional performance tests:

SELECT
  buffers_checkpoint/buffers_clean AS checkpoint_spike,
  ROUND(100.0*buffers_checkpoint/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS checkpoint_percentage,
  ROUND(100.0*buffers_clean/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS pg_writer_percentage,
  ROUND(100.0*buffers_backend/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS backend_percentage
FROM
  pg_stat_bgwriter
;

 checkpoint_spike | checkpoint_percentage | pg_writer_percentage | backend_percentage
------------------+-----------------------+----------------------+--------------------
               31 |                 90.58 |                 2.92 |               6.50

So flushing typically happens at checkpoint time. 6.5% of all blocks were
put on disk by the backends, which IMHO says: the bgwriter is too slow, so
the backends have to do the work themselves.


So I'd like to do some tests with fresh statistics. Any fast way to reset
the pg_stat_bgwriter statistics for the whole cluster?
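
(As far as I know there is no such reset in 8.3 itself; later releases, 9.0 and up, added pg_stat_reset_shared() for exactly these cluster-wide counters:)

-- Available in 9.0 and later, not in 8.3:
SELECT pg_stat_reset_shared('bgwriter');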


Thnx.

Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

I think this is one of the most critical performance showstopper of 
PostgreSQL on the I/O side.


I wish, this is an easy problem compared to the real important ones that need 
to be resolved.  Situations where the OS is capable of faster sequential I/O 
performance than PostgreSQL appears to deliver doing reads are often caused 
by something other than what the person doing said benchmarking believes they 
are.  For example, the last time I thought I had a smoking gun situation just 
like the one you're describing, it turns out the background operation I 
didn't know was going on that slowed things down were hint bit updates: 
http://wiki.postgresql.org/wiki/Hint_Bits


Background checkpoints can also cause this, typically if you set 
checkpoint_segments really high and watch when they're happening you can 
avoid that interfering with results too.


It's hard to isolate out the cause of issues like this.  Since most people 
seem to get something close to real disk speed from sequential scans when 
measured properly, I would suggest starting with the assumption there's 
something wrong with your test case rather than PostgreSQL.  The best way to 
do that is to construct a test case others can run that shows the same 
problem on other systems using the database itself.  The easiest way to build 
one of those is using generate_series to create some bogus test data, SELECT 
everything in there with \timing on, and then use the size of the relation on 
disk to estimate MB/s.


Regardless, it's easy enough to build PostgreSQL with larger block sizes if 
you think that really matters for your situation.  You're never going to see 
that in the mainstream version though, because there are plenty of downsides 
to using larger blocks.  And since the database doesn't actually know where 
on disk things are at, it's not really in a good position to make decisions 
about I/O scheduling anyway.  More on that below.


What's the current status of the patch of Gregory Stark? Any timeframes to 
integrate?


There needs to be a fairly major rearchitecting of how PostgreSQL handles 
incoming disk I/O for that to go anywhere else, and I don't believe that's 
expected to be ready in the near future.


Does it also work for sequence scans? Any plans for a generic "multi block 
read count" solution?


There was a similar patch for sequential scans submitted by someone else 
based on that work.  It was claimed to help performance on a Linux system 
with a rather poor disk I/O setup.  No one else was able to replicate any 
performance improvement using the patch though.  As far as I've been able to 
tell, the read-ahead logic being done by the Linux kernel and in some 
hardware is already doing this sort of optimization for you on that OS, 
whether or not your app knows enough to recognize it's sequentially scanning 
the disk it's working against.


I forgot to mention:
Larger blocksizes also reduce IOPS (I/Os per second), which might be a
critical threshold on storage systems (e.g. Fibre Channel systems). With
large block sizes you would get e.g. the full throughput from the storage
(fewer IOPS), but with small block sizes the IOPS limit is reached first
and throughput goes down.


Example:
With 100MB/s and 8k blocks you need 12500 IOPS, which is a lot (e.g. at
least 90 disks at 140 IOPS each)!
When the blocks can be read with e.g. a 128k block size, 781 IOPS are
sufficient (6 disks)!


So this makes a major difference.
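
The same arithmetic as a quick sanity check (assuming decimal megabytes and the ~140 IOPS per disk used above):

SELECT 100 * 1000 / 8                  AS iops_at_8k,     -- 12500
       ceil(100 * 1000 / 8.0 / 140)    AS disks_at_8k,    -- 90
       round(100 * 1000 / 128.0)       AS iops_at_128k,   -- 781
       ceil(100 * 1000 / 128.0 / 140)  AS disks_at_128k;  -- 6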

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

I think this is one of the most critical performance showstopper of 
PostgreSQL on the I/O side.


I wish, this is an easy problem compared to the real important ones that need 
to be resolved.  Situations where the OS is capable of faster sequential I/O 
performance than PostgreSQL appears to deliver doing reads are often caused 
by something other than what the person doing said benchmarking believes they 
are.  For example, the last time I thought I had a smoking gun situation just 
like the one you're describing, it turns out the background operation I 
didn't know was going on that slowed things down were hint bit updates: 
http://wiki.postgresql.org/wiki/Hint_Bits


Background checkpoints can also cause this, typically if you set 
checkpoint_segments really high and watch when they're happening you can 
avoid that interfering with results too.


It's hard to isolate out the cause of issues like this.  Since most people 
seem to get something close to real disk speed from sequential scans when 
measured properly, I would suggest starting with the assumption there's 
something wrong with your test case rather than PostgreSQL.  The best way to 
do that is to construct a test case others can run that shows the same 
problem on other systems using the database itself.  The easiest way to build 
one of those is using generate_series to create some bogus test data, SELECT 
everything in there with \timing on, and then use the size of the relation on 
disk to estimate MB/s.


Regardless, it's easy enough to build PostgreSQL with larger block sizes if 
you think that really matters for your situation.  You're never going to see 
that in the mainstream version though, because there are plenty of downsides 
to using larger blocks.  And since the database doesn't actually know where 
on disk things are at, it's not really in a good position to make decisions 
about I/O scheduling anyway.  More on that below.


What's the current status of the patch of Gregory Stark? Any timeframes to 
integrate?


There needs to be a fairly major rearchitecting of how PostgreSQL handles 
incoming disk I/O for that to go anywhere else, and I don't believe that's 
expected to be ready in the near future.


Does it also work for sequence scans? Any plans for a generic "multi block 
read count" solution?


There was a similar patch for sequential scans submitted by someone else 
based on that work.  It was claimed to help performance on a Linux system 
with a rather poor disk I/O setup.  No one else was able to replicate any 
performance improvement using the patch though.  As far as I've been able to 
tell, the read-ahead logic being done by the Linux kernel and in some 
hardware is already doing this sort of optimization for you on that OS, 
whether or not your app knows enough to recognize it's sequentially scanning 
the disk it's working against.




I've enhanced the pgiosim project http://pgfoundry.org/projects/pgiosim/ 
with a patch for larger blocksizes independent from PostgreSQL: http://www.wiesinger.com/opensource/pgiosim/pgiosim-0.2-blocksizes.diff


You'll find some detailed results below and can verify this on your 
platforms with the patch above. Maybe someone can verify this on different 
HW/SW plattforms. If you have any questions regarding the pgiosim and the 
patch just feel free to ask.


Summary:
Random I/O with block sizes of e.g. 128k (as in a BITMAP HEAP SCAN) performs
better than reading the same blocks as individual 8k blocks (factor 1.5).


Conclusion:
In the test scenario the proposed solution would give a performance gain of
a factor of 1.5 for typical BITMAP HEAP SCANs. For other scenarios no
performance gain from larger reads of contiguous blocks could be measured.
Therefore I'm assuming that prefetching works well on Linux for sequential
I/O but not for random I/O.


I hope I can convince someone that such optimizations make sense, as
commercial database vendors have implemented such features for performance
reasons.


BTW: Prefetch/read-ahead is enabled on the RAID and on the block devices.

Ciao,
Gerhard

--
http://www.wiesinger.com/

# RANDOM I/O 8k blocksize
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 test.txt
Arg: 1
Added test.txt
blocksize=8192, reading block as a whole
Elapsed: 135.92
Read 1 blocks Wrote 0 blocks
73.57 op/sec, 588.60kB/sec

# RANDOM I/O 8k blocksize (for verification only), in fact same test as below
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 -r test.txt
Arg: 1
Added test.txt
blocksize=8192, doing single read requests with chunk size of 8192 bytes
Elapsed: 136.30
Read 1 blocks Wrote 0 blocks
73.37 op/sec, 586.94kB/sec

# RANDOM I/O 128k blocksize, read as one 128k block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 -o 131072 test.txt
Arg: 1
Added test

Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-09-27 Thread Gerhard Wiesinger


On Sun, 27 Sep 2009, Sam Mason wrote:


On Sun, Sep 27, 2009 at 06:05:51PM +0200, Gerhard Wiesinger wrote:

A google research has shown that Gregory Stark already worked on that issue
(see references below) but as far as I saw only on bitmap heap scans.


Greg Stark's patches are about giving the IO subsystem enough
information about where the random accesses will be ending up next.
This is important, but almost completely independent from the case
where you know you're doing sequential IO, which is what you seem to be
talking about.



I'm talking about 2 cases:
1.) Sequential scans
2.) Bitmap index scans
Both hopefully end up in blocks which are physically located one after the
other, and where larger block sizes can be of benefit.



I think this is one of the most critical performance showstopper of
PostgreSQL on the I/O side.


PG's been able to handle data as fast as it can come back from the disk
in my tests.  When you start doing calculations then it will obviously
slow down, but what you were talking about wouldn't help here either.

Then again, I don't have a particularly amazing IO subsystem.  What
sort of performance do your disks give you and at what rate is PG doing
sequential scans for you?



Hello Sam,

Detailed benchmarks are below; the original figures from PostgreSQL have
already been posted. So I would expect at least about 60-80MB/s of read
throughput for PostgreSQL in practice on sequential scans (when larger
block sizes are read), but it is at about 30MB/s. See also the pgiosim
results below.


Setup is:
Disk Setup: SW RAID 5 with 3x1TB SATA 7200 RPM disks
Linux Kernel: 2.6.30.5-43.fc11.x86_64
CPU: Quad Core: AMD Phenom(tm) II X4 940 Processor, 3GHz
RAM: 8GB

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


http://pgfoundry.org/projects/pgiosim/

#
# Performance benchmarks:
#
dd if=/dev/zero of=test.txt bs=8192 count=1310720 conv=fdatasync
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 169.482 s, 63.4 MB/s

dd if=test.txt of=/dev/null bs=8192
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 86.4457 s, 124 MB/s

#
# RANDOM
#

# Random 8k block reads
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 test.txt
Arg: 1
Added test.txt
Elapsed: 148.22
Read 1 blocks Wrote 0 blocks
67.47 op/sec, 539.75kB/sec

# Random 8k block reads & writes
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 -w 100 test.txt
Write Mode: 100%
Arg: 1
Added test.txt
Elapsed: 201.44
Read 1 blocks Wrote 1 blocks
49.64 op/sec, 397.14kB/sec

# Random 8k block reads & writes, sync after each block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 -w 100 -y test.txt
Write Mode: 100%
fsync after each write
Arg: 1
Added test.txt
Elapsed: 282.30
Read 1 blocks Wrote 1 blocks
35.42 op/sec, 283.39kB/sec

#
# SEQUENTIAL
#

# Sequential 8k block reads
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 100 test.txt
Seq Scan
Arg: 1
Added test.txt
Elapsed: 71.88
Read 100 blocks Wrote 0 blocks
13911.40 op/sec, 111291.17kB/sec

# Sequential 8k block reads & writes
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 100 -w 100 test.txt
Seq Scan
Write Mode: 100%
Arg: 1
Added test.txt
Elapsed: 261.24
Read 100 blocks Wrote 100 blocks
3827.90 op/sec, 30623.18kB/sec

# Sequential 8k block reads & writes, sync after each block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 1 -w 100 -y test.txt
Seq Scan
Write Mode: 100%
fsync after each write
Arg: 1
Added test.txt
Elapsed: 27.03
Read 1 blocks Wrote 1 blocks
369.96 op/sec, 2959.68kB/sec

#


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-09-27 Thread Gerhard Wiesinger

Hello,

As blocksizes, random I/O and linear I/O are critical I/O performance 
parameters I had a look on PostgreSQL and a commercial software vendor.


Therefore I enhanced the system tap script: 
http://www.wiesinger.com/opensource/systemtap/disktop_gw.stp


Output per 5 seconds on a sequence scan:
UID    PID    PPID   CMD          DEVICE   T   BYTES        REQUESTS   BYTES/REQ
 26    4263   4166   postmaster   dm-1     R   168542208       20574        8192
=> 32MB/s

So I saw that even on sequential reads (and also on bitmap heap scan
accesses) PostgreSQL uses only 8k blocks. I think that's a major I/O
bottleneck.


A commercial database vendor solved the problem by reading multiple
contiguous 8k blocks in a single request, up to a maximum threshold. Output
per 5 seconds on an equivalent "sequential scan":

UID    PID    PPID   CMD       DEVICE   T   BYTES        REQUESTS   BYTES/REQ
1001   53811         process   dm-1     R   277754638        2338      118800
=> 53 MB/s

A Google search has shown that Gregory Stark already worked on that issue
(see references below), but as far as I saw only for bitmap heap scans.


I think this is one of the most critical performance showstoppers of 
PostgreSQL on the I/O side.


What's the current status of Gregory Stark's patch? Any timeframe for
integration?
Does it also work for sequential scans? Any plans for a generic "multi block 
read count" solution?


Any comments?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

http://wiki.postgresql.org/wiki/Todo#Concurrent_Use_of_Resources
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00027.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00395.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00092.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00098.php

http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php

http://markmail.org/message/a5osy4qptxk2jgu3#query:+page:1+mid:hz7uzhwxtkbzncy2+state:results
http://markmail.org/message/a5osy4qptxk2jgu3#query:+page:1+mid:a5osy4qptxk2jgu3+state:results

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-09-27 Thread Gerhard Wiesinger

Hello,

I think the limit of max. 1000 for bgwriter_lru_maxpages doesn't make any
sense, because in fact it limits the performance of the database (version 8.3).


On heavy write operations buffer cache pages must be freed. With the default
config this is practically limited to:

bgwriter_delay=200ms
bgwriter_lru_maxpages=100
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*100*1000/200 = 4000k = 4MB/s
Isn't that a major performance bottleneck in the default config?

bgwriter_delay=200ms
bgwriter_lru_maxpages=1000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*1000*1000/200 = 40000k = 40MB/s
Still not a very high number for current I/O loads.

Lowering bgwriter_delay is possible, but I think overhead is too much and 
still there is a limit of 800MB/s involved:

bgwriter_delay=10ms
bgwriter_lru_maxpages=1000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*1000*1000/10 = 800000k = 800MB/s

So I think it would be better to have such a configuration possible:
bgwriter_delay=50ms
bgwriter_lru_maxpages=100000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*100000*1000/50 = 16000000k = 16000MB/s
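
Spelled out as a query, the same cap formula gives (8 kB blocks, decimal units; the last column uses the 100000-page setting that matches the 16000MB/s figure above):

-- max MB/s the bgwriter can clean = 8kB * bgwriter_lru_maxpages * (1000 / bgwriter_delay_ms) / 1000
SELECT 8 * 100    * (1000.0 / 200) / 1000 AS default_config,  -- 4
       8 * 1000   * (1000.0 / 200) / 1000 AS maxpages_1000,   -- 40
       8 * 1000   * (1000.0 / 10)  / 1000 AS delay_10ms,      -- 800
       8 * 100000 * (1000.0 / 50)  / 1000 AS proposed_config; -- 16000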

So in fact I think bgwriter_lru_maxpages should be limited to 100000 if it
has to be limited at all.


Are my arguments correct?
Any comments?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problem with data corruption and psql memory usage

2009-09-26 Thread Gerhard Wiesinger

Hello Tom,

Late answer, but answer :-) :
Finally, it turned out to be a very strange hardware problem: a very small
part of the RAM was defective, but the kernel never crashed.


I also saw very strange behavior when verifying rpm packages with rpm -V.
First I suspected the hard disk. But then I flushed the OS caches:

echo 3 > /proc/sys/vm/drop_caches
and rpm -V came back clean. => RAM issue.

memtest86+ showed a defective RAM module very quickly.

So PostgreSQL didn't have any issue :-)

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Wed, 9 May 2007, Gerhard Wiesinger wrote:


Hello Tom!

I don't think this is a hardware problem. Machine runs 24/7 for around 4 
years without any problems, daily backup with GBs of data to it, uptimes to 
the next kernel security patch, etc.


The only problem I could believe is:
I'm running the FC7 test packages of postgresql in FC6 and maybe there is a 
slight glibc library conflict or any other incompatibility.


Ciao,
Gerhard

--
http://www.wiesinger.com/


On Wed, 9 May 2007, Tom Lane wrote:


Gerhard Wiesinger  writes:

LOG:  could not fsync segment 0 of relation 1663/16386/42726: Input/output
error


[ raised eyebrow... ]  I think your machine is flakier than you believe.
This error is particularly damning, but the general pattern of weird
failures all over the place seems to me to fit the idea of hardware
problems much better than any other explanation.  FC6 and PG 8.2.3 are
both pretty darn stable for most people, so there's *something* wrong
with your installation, and unstable hardware is the first thing that
comes to mind.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

 http://www.postgresql.org/docs/faq



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sum of multiplied deltas

2009-09-26 Thread Gerhard Wiesinger

Hello,

Finally I used the function below, which works well. Only one problem is
left: it pollutes the buffer cache because of the cursor. Any idea how to
get rid of this behavior?


BTW: the 8.4 window functions should help, but no one could provide an
example of how this would work. Any further comments on how to implement it?
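
For what it's worth, a minimal sketch of how the two sums could look with 8.4 window functions (table and column names taken from the getSum() function below; an untested illustration rather than a drop-in replacement, and the handling of NULL col rows differs slightly from the cursor loop):

SELECT sum((col - prev_col) * 0.01 * col2)                          AS sum_m1,
       sum(EXTRACT('epoch' FROM datetime - prev_datetime) * col3)   AS sum_m2
FROM (
    SELECT datetime,
           col,
           col2,
           col3,
           lag(col)      OVER (ORDER BY datetime) AS prev_col,
           lag(datetime) OVER (ORDER BY datetime) AS prev_datetime
    FROM   log_entries
    WHERE  datetime >= '2009-01-01'    -- start_ts (placeholder)
      AND  datetime <= '2009-02-01'    -- stop_ts  (placeholder)
) AS deltas
WHERE prev_col IS NOT NULL;            -- skip the first row, as in the cursor loop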


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

CREATE TYPE PS AS
(
  sum_m1 double precision,
  sum_m2 double precision
);

DROP FUNCTION getSum();
CREATE OR REPLACE FUNCTION getSum(IN start_ts timestamp with time 
zone, IN stop_ts timestamp with time zone) RETURNS PS AS $$

DECLARE
  curs CURSOR FOR
  SELECT
*
  FROM
log_entries
  WHERE
datetime >= start_ts
AND datetime <= stop_ts
  ORDER BY
datetime
  ;
  row log_entries%ROWTYPE;
  i bigint = 0;
  datetime_old timestamp with time zone;
  old double precision;
  sum_m1 double precision = 0;
  sum_m2 double precision = 0;
  psum PS;
BEGIN
  OPEN curs;
  LOOP
FETCH curs INTO row;
EXIT WHEN NOT FOUND;
IF row.col IS NOT NULL THEN
  IF i > 0 THEN
sum_m1 = sum_m1 + (row.col - old) * 0.01 * row.col2;
sum_m2 = sum_m2 + EXTRACT('epoch' FROM row.datetime - datetime_old) * 
row.col3;
  END IF;
  i = i + 1;
  old = row.col;  -- remember the previous value of col for the next delta
  datetime_old = row.datetime;
END IF;
  END LOOP;
  CLOSE curs;
  psum.sum_m1 = sum_m1;
  psum.sum_m2 = sum_m2;
  RETURN psum;
END;
$$ LANGUAGE plpgsql;


On Mon, 8 Jun 2009, Gerhard Wiesinger wrote:


Hello!

I've the following data:
datetime | val1 | val2
time1|4 | 40%
time2|7 | 30%
time3|   12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_buffercache - A lot of "unnamed" relfilenodes?

2009-09-26 Thread Gerhard Wiesinger

On Sat, 26 Sep 2009, Tom Lane wrote:


Gerhard Wiesinger  writes:

Research on pg_buffercache showed that a lot of pages don't have any
assigned relfilenodes in pg_class, even when they are dirty (in this case
inserts are done).


There should never be any buffered pages for nonexistent tables.  The
method you are using will only show relnames for pages belonging to
tables of the current database ... maybe there are a lot of dirty pages
for other databases in the cluster?  Or maybe the tables are newly
created (not committed) and so you can't see their pg_class rows yet?



OK, so this is a cross-database visibility issue. Is it possible to GRANT
one user visibility into pg_class across all databases?
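
(Side note: pg_class is per-database, so the relation names of other databases can't be resolved from one session; what is possible is to at least attribute the buffers to their database via the reldatabase column of pg_buffercache, roughly like this:)

SELECT d.datname,                 -- NULL for unused buffers and shared relations
       b.isdirty,
       count(*) AS buffers
FROM pg_buffercache b
LEFT OUTER JOIN pg_database d ON b.reldatabase = d.oid
GROUP BY d.datname, b.isdirty
ORDER BY buffers DESC;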


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_buffercache - A lot of "unnamed" relfilenodes?

2009-09-26 Thread Gerhard Wiesinger

Hello,

Research on pg_buffercache showed that a lot of pages don't have any
assigned relfilenodes in pg_class, even when they are dirty (in this case
inserts are done).


SELECT
  relname IS NULL AS relname_is_null,
  isdirty,
  COUNT(*) AS count
FROM
  pg_buffercache b
LEFT OUTER JOIN pg_class c ON b.relfilenode=c.relfilenode
GROUP BY
  relname_is_null,
  isdirty
;

 relname_is_null | isdirty | count
-----------------+---------+-------
 false           | false   |  8914
 true            | false   |  7347
 true            | true    |   123

Any ideas about this behavior? Pages touched by inserts should have
assigned relnames, shouldn't they?


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problems with postgres online backup - restore

2009-06-26 Thread Gerhard Wiesinger

Hello Richard,

OK, understood; that looks like a good design to me, with a full-block
write first and then the deltas.


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Fri, 26 Jun 2009, Richard Huxton wrote:


Gerhard Wiesinger wrote:

Hello,

OK, what's then the difference doing a pg_start_backup() or just doing the 
backup?


pg_start_backup() forces a checkpoint (and logs a label for your backup too).

Isn't that a problem that the datablock are very inconsistent, even so 
inconsistent that they are corrupt:


E.g. A part of a datablock is written when e.g. the file is tarred. => 
Datablock on backup is corrupt => An then even the WAL can't be applied.


Why does it work correctly? Or is there some design problem?


It works because the WAL doesn't hold a list of row updates ("update row 
12345 set field 4 = true") but block updates. Any update to a disk block is 
recorded - table or index. The WAL doesn't really know anything about 
tables, columns, primary keys etc - just disk blocks.


One small optimisation is that the first time a block is touched after a 
checkpoint the value of the whole block is written to WAL and after that only 
updates.


So - if you have a checkpointed system (all updates guaranteed written to 
disk) and a complete set of WAL files from that point on you can always 
recreate the writes to any point in time after that.


http://www.postgresql.org/docs/8.3/static/continuous-archiving.html

--
 Richard Huxton
 Archonet Ltd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problems with postgres online backup - restore

2009-06-25 Thread Gerhard Wiesinger

Hello,

OK, so what's the difference between doing a pg_start_backup() first and
just doing the backup?


Isn't it a problem that the data blocks are very inconsistent, even so
inconsistent that they are corrupt?


E.g. only a part of a data block has been written at the moment the file is
tarred. => The data block in the backup is corrupt => and then even the WAL
can't be applied.


Why does it work correctly? Or is there some design problem?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 25 Jun 2009, Richard Huxton wrote:


Gerhard Wiesinger wrote:

Hello,

I'd like to understand the PostgreSQL internals in "backup mode".

When I understood it correctly pg_start_backup() make a checkpoint and 
stops writing to the data/ directory except the WAL.


All new transaction go into WAL which is also logical. But how is data 
consistency done when the written/changed blocks don't fit into the buffer 
cache?


The data isn't kept consistent. Which is why you need the WAL. Restoring from 
a PITR backup is basically the same idea as recovering from a crash. Any 
blocks that might have been updated since you called pg_start_backup() will 
be rewritten from the WAL.


--
 Richard Huxton
 Archonet Ltd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problems with postgres online backup - restore

2009-06-25 Thread Gerhard Wiesinger

Hello,

I'd like to understand the PostgreSQL internals in "backup mode".

If I understood it correctly, pg_start_backup() makes a checkpoint and
stops writing to the data/ directory except for the WAL.


All new transactions go into the WAL, which is also logical. But how is
data consistency maintained when the written/changed blocks don't fit into
the buffer cache?


E.g.:
A lot of updates which exceed the buffer cache (where is the data written,
other than the WAL, if the data files are supposed to stay constant?)

A SELECT on the updated data: where is the new data fetched from?

Thnx for any explanations.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 25 Jun 2009, Chris Barnes wrote:



SELECT pg_switch_xlog();

SELECT pg_start_backup('postres_full_backup_June222009');

tar -czvf pgprd01_June22_2009_production.dmp.tar.gz data/

SELECT pg_stop_backup();



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Implicit conversion/comparision of timestamp with and without timezone

2009-06-21 Thread Gerhard Wiesinger


On Sun, 21 Jun 2009, Richard Huxton wrote:


Gerhard Wiesinger wrote:

Hello,

It is unclear to me how implicit conversion/comparision of timestamp with 
and without timezone works.


It's not entirely clear where the problem is. You don't say the results 
you're getting or what you thought they should be.




Ok, the "problem" is:
--
  datetime >= '2009-03-09 00:00:00.0'
  AND datetime  < '2009-03-10 00:00:00.0'
-- Index Cond: ((datetime >= '2009-03-09 00:00:00+01'::timestamp with time zone) 
AND (datetime < '2009-03-10 00:00:00+01'::timestamp with time zone))
--
  datetime >= '2009-04-01 00:00:00.0'
  AND datetime  < '2009-04-02 00:00:00.0'
-- Index Cond: ((datetime >= '2009-04-01 00:00:00+02'::timestamp with time zone) 
AND (datetime < '2009-04-02 00:00:00+02'::timestamp with time zone))

I would have expected that the 1st conversion is done with my current
timezone at query time (CEST=+02). I don't see any problem with this, since
all the ugly timezone calculations are already done by PostgreSQL, but it
is nowhere (?) documented and it quite confused me.


So to me the documented behavior looks like this:
A timestamp without any timezone information is converted into a timestamp
with time zone (explicitly or implicitly) using the time zone rules in
effect at the timestamp being converted (and not the offset in effect at
now()). The date and time values themselves are not touched.


And:
A timestamp with timezone information is converted into a timestamp 
without timezone information by removing just the timezone information. So 
the date and time information is not touched.


Correct?
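
A small self-contained check of that reading (table-free, so anyone can rerun it; the expected offsets are what the Europe/Vienna DST rules give for those dates):

SET timezone = 'Europe/Vienna';

-- timestamp -> timestamptz: interpreted with the DST rules valid at that date
SELECT TIMESTAMP '2009-03-09 00:00:00'::timestamptz;      -- 2009-03-09 00:00:00+01
SELECT TIMESTAMP '2009-06-12 00:00:00'::timestamptz;      -- 2009-06-12 00:00:00+02

-- timestamptz -> timestamp: rendered in the session time zone, then the offset is dropped
SELECT TIMESTAMPTZ '2009-06-12 00:00:00+02'::timestamp;   -- 2009-06-12 00:00:00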

BTW:
I'm in the CET (now CEST) timezone, which switches to CEST on the last
Sunday in March at 2:00 CET (and back on the last Sunday in October at
3:00 CEST).


show timezone;
   TimeZone
---
 Europe/Vienna


--
-- datetime TIMESTAMP WITH TIME ZONE
-- datetime entries are with UTC+01 and UTC+02 done
-- 2009-03-09: UTC+01
-- 2009-06-12: UTC+02
-- current timezone: UTC+02


OK, so I've got this:
=> show timezone;
TimeZone
--
UTC+02

SELECT * FROM tstest;
  d

2009-03-08 23:00:00-02
2009-06-12 00:00:00-02
(2 rows)

So - it's showing the timestamp with timezones you mentioned but in the 
current timezone. The UTC+02 actually being a -02 offset is just part of the 
standards afaik. Note that midnight 2009-03-09 is 2009-03-08 in the current 
timezone.




SELECT datetime FROM table ORDER BY datetime DESC LIMIT 1;
datetime

 2009-06-21 14:54:00+02

So it should also be +02 in your test. Why is it -02 in your test?

BTW:
2009-03-09 and 2009-03-08 are both in CET timezone in Europe (see above).

So - if we run EXPLAIN on your queries that should show us how the values are 
getting typecast.


=> EXPLAIN SELECT * FROM tstest WHERE d >= '2009-03-09 00:00:00.0' AND d < 
'2009-03-10 00:00:00.0';
QUERY PLAN 


Seq Scan on tstest  (cost=0.00..42.10 rows=11 width=8)
  Filter: ((d >= '2009-03-09 00:00:00-02'::timestamp with time zone) AND (d 
< '2009-03-10 00:00:00-02'::timestamp with time zone))



This comes back with zero rows because without an explicit timezone it 
assumes our current one.



=> EXPLAIN SELECT * FROM tstest WHERE d >= '2009-06-12 00:00:00.0' AND d < 
'2009-06-13 00:00:00.0';
QUERY PLAN 


Seq Scan on tstest  (cost=0.00..42.10 rows=11 width=8)
  Filter: ((d >= '2009-06-12 00:00:00-02'::timestamp with time zone) AND (d 
< '2009-06-13 00:00:00-02'::timestamp with time zone))


This will match one row.

Repeating the EXPLAIN on your other queries should show you what's happening. 
If I've missed the point of your question, can you say what results you get 
and what you think they should be.




See above.

For completeness:
--
  datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-03-09 00:00:00.0'
  AND datetime  < TIMESTAMP WITHOUT TIME ZONE '2009-03-10 00:00:00.0'
-- Index Cond: ((datetime >= '2009-03-09 00:00:00'::timestamp without time zone) 
AND (datetime < '2009-03-10 00:00:00'::timestamp without time zone))

=> Afterwards an implicit conversion to timestamp with time zone

[GENERAL] Implicit conversion/comparision of timestamp with and without timezone

2009-06-20 Thread Gerhard Wiesinger

Hello,

It is unclear to me how implicit conversion/comparision of timestamp with 
and without timezone works.


--
-- datetime TIMESTAMP WITH TIME ZONE
-- datetime entries are with UTC+01 and UTC+02 done
-- 2009-03-09: UTC+01
-- 2009-06-12: UTC+02
-- current timezone: UTC+02

SELECT
  *
FROM
  table
WHERE
-- currently in UTC+02 timezone, entries in UTC+01 timezone
-- => works well with UTC+01 entry timezone, but why?
  datetime >= '2009-03-09 00:00:00.0'
  AND datetime  < '2009-03-10 00:00:00.0'
;

-- currently in UTC+02 timezone, entries in UTC+02 timezone
-- => works well with UTC+02 entry timezone, but why?
  datetime >= '2009-06-12 00:00:00.0'
  AND datetime  < '2009-06-13 00:00:00.0'
;

-- Same result, unclear why
  datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-03-09 00:00:00.0'
  AND datetime  < TIMESTAMP WITHOUT TIME ZONE '2009-03-10 00:00:00.0'

-- Same result, unclear why
  datetime >= TIMESTAMP WITH TIME ZONE '2009-03-09 00:00:00.0'
  AND datetime  < TIMESTAMP WITH TIME ZONE '2009-03-10 00:00:00.0'

-- Same result2, unclear why
  datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-06-12 00:00:00.0'
  AND datetime  < TIMESTAMP WITHOUT TIME ZONE '2009-06-13 00:00:00.0'

-- Same result2, unclear why
  datetime >= TIMESTAMP WITH TIME ZONE '2009-06-12 00:00:00.0'
  AND datetime  < TIMESTAMP WITH TIME ZONE '2009-06-13 00:00:00.0'

How is the implicit conversion done? With the timezone of datetime, the
timezone of now(), or the TimeZone setting?


Would it make a difference if datetime were declared as TIMESTAMP WITHOUT
TIME ZONE?

--

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sum of multiplied deltas

2009-06-08 Thread Gerhard Wiesinger


Any hints for an 8.3 environment (currently)?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Mon, 8 Jun 2009, David Fetter wrote:


On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote:

Hello!

I've the following data:
datetime | val1 | val2
time1|4 | 40%
time2|7 | 30%
time3|   12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...


This is best done in 8.4 using Windowing.  Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.

Cheers,
David.


datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sum of multiplied deltas

2009-06-07 Thread Gerhard Wiesinger

Hello!

I've the following data:
datetime | val1 | val2
time1|4 | 40%
time2|7 | 30%
time3|   12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...

datetime is ordered (and unique and has also an id).

Rows are in the area of millions.

How is it done best?
1.) Self join with one row shift?
2.) function?

Any hint?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] HOT question - insert/delete

2009-05-21 Thread Gerhard Wiesinger
The data isn't really historical, but some of it has to be kept, e.g. for
regulatory reasons, for a certain period of time. All of the available data
should still be available, e.g. for reporting. So partitioning doesn't make
any sense in this case, right?


Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 21 May 2009, Simon Riggs wrote:



On Wed, 2009-05-20 at 16:01 -0400, Merlin Moncure wrote:

On Wed, May 20, 2009 at 3:11 PM, Gerhard Wiesinger  wrote:

Hello!

Is HOT in 8.3 used in the following scenario:
INSERT ...
DELETE ...
at the same ratio.

So for example current data is added every minute and all old data older
than 2 years are deleted.

Would the heap be bloated or is this avoided by the HOT feature of 8.3 and
nearly kept constant?


HOT doesn't help here...it only helps with updates and then only if
you are updating fields that are not indexed.


Partial vacuum, in 8.4, will deal with this situation, though
partitioning does sound best for such clearly historical data.

--
Simon Riggs   www.2ndQuadrant.com
PostgreSQL Training, Services and Support


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] HOT question - insert/delete

2009-05-21 Thread Gerhard Wiesinger


Are there any plans to support this kind of scenario?

Ciao,
Gerhard


--
http://www.wiesinger.com/


On Wed, 20 May 2009, Merlin Moncure wrote:


On Wed, May 20, 2009 at 3:11 PM, Gerhard Wiesinger  wrote:

Hello!

Is HOT in 8.3 used in the following scenario:
INSERT ...
DELETE ...
at the same ratio.

So for example current data is added every minute and all old data older
than 2 years are deleted.

Would the heap be bloated or is this avoided by the HOT feature of 8.3 and
nearly kept constant?


HOT doesn't help here...it only helps with updates and then only if
you are updating fields that are not indexed.  if your table has a
rolling set of data, for example a log file...you probably want to
look at table partitioning (either manual or built in).

merlin
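
(A minimal sketch of the manual partitioning mentioned above, 8.3-style with inheritance; table and column names are made up for illustration:)

CREATE TABLE log_data (
    logdate  date NOT NULL,
    payload  text
);

-- One child table per year; old years can then be dropped instead of DELETEd.
CREATE TABLE log_data_2009 (
    CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2010-01-01')
) INHERITS (log_data);

CREATE TABLE log_data_2010 (
    CHECK (logdate >= DATE '2010-01-01' AND logdate < DATE '2011-01-01')
) INHERITS (log_data);

-- With constraint_exclusion = on, queries that filter on logdate only touch
-- the matching children, and expiring old data becomes:
--   DROP TABLE log_data_2009;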

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] HOT question - insert/delete

2009-05-20 Thread Gerhard Wiesinger

Hello!

Is HOT in 8.3 used in the following scenario:
INSERT ...
DELETE ...
at the same ratio.

So for example current data is added every minute and all old data 
older than 2 years are deleted.


Would the heap get bloated, or is this avoided by the HOT feature of 8.3 so
that it stays nearly constant in size?


Or is vacuum still needed for this scenario?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full backup - pg_dumpall sufficient?

2009-01-29 Thread Gerhard Wiesinger

Hello Ray,

Yes, that's clear. But as far as I read, there was even some stuff which
isn't dumped with pg_dumpall.


So it was something like running 2 steps:
1.) Run pg_dumpall
2.) Additionally run pg_dump ...

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 29 Jan 2009, Raymond O'Donnell wrote:


On 29/01/2009 16:31, Gerhard Wiesinger wrote:


I recently read some Mail on the mailinglist where some parts of
PostgreSQL were not dumped with pg_dumpall and additionally some pg_dump
was necessary (it was something like internals, catalog, etc.)


It's the other way around - pg_dump dumps just the specified database,
but not cluster-wide stuff like login roles; you need to do a pg_dumpall
to get those as well.

Ray.

--
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
r...@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
--

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Full backup - pg_dumpall sufficient?

2009-01-29 Thread Gerhard Wiesinger

Hello!

I recently read some mail on the mailing list saying that some parts of
PostgreSQL are not dumped with pg_dumpall and that an additional pg_dump is
necessary (it was something like internals, catalog, etc.).


Any ideas what has to be dumped in addition to pg_dumpall for a full
backup?


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Description of transaction model for indexes

2009-01-04 Thread Gerhard Wiesinger

Hello!

The transaction model is discussed in several areas:
http://www.packtpub.com/article/transaction-model-of-postgresql
The POSTGRES Data Model (1987)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.8578
The Design of POSTGRES (1986)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1422

However: is there a paper available on how indexes are handled inside a
transaction (e.g. visibility (not handled), concurrency, etc.)?

E.g. What happens on SELECT/INSERT/UPDATE/DELETE?

Some aspects can be found in:
http://www.pgcon.org/2008/schedule/attachments/54_PGCon2008-HOT.ppt

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Conditional commit inside functions

2008-12-26 Thread Gerhard Wiesinger

Hello,

Aren't there any drawbacks in PostgreSQL with such a large transaction
(like in Oracle), e.g. if I used 500,000,000 rows or even more?


Ciao,
Gerhard

--
http://www.wiesinger.com/


On Fri, 26 Dec 2008, Pavel Stehule wrote:


Hello

why do you need commit?

pavel

2008/12/26 Gerhard Wiesinger :

Hello!

I tried the following, but still one transaction:

SELECT insert_1Mio();

(parallel select count(id) from employee; is done)

CREATE OR REPLACE FUNCTION insert_some(start_i INTEGER, end_i INTEGER)
RETURNS void
AS $func$
DECLARE
BEGIN
 FOR i IN start_i..end_i LOOP
   INSERT INTO employee (id, department, firstname, lastname) VALUES (i, i,
'John' || i, 'Smith' || i);
 END LOOP;
END;
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_1Mio() RETURNS void
AS $func$
DECLARE
 maxcommit INTEGER;
 start_i INTEGER;
 end_i INTEGER;
 now_i INTEGER;
BEGIN
 maxcommit := 1;
 start_i :=1;
 end_i := 100;

 now_i := start_i;

 FOR i IN start_i..end_i LOOP
   IF MOD(i, maxcommit) = 0 THEN
 PERFORM insert_some(now_i, i);
 now_i := i + 1;
   END IF;
 END LOOP;
 PERFORM insert_some(now_i, end_i);
END;
$func$ LANGUAGE plpgsql;

Any ideas?

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 25 Dec 2008, Gerhard Wiesinger wrote:


Hello!

I want to translate the following Oracle PL/SQL script into plpgsql.
Especially I'm having problems with the transaction thing. i tried START
TRANSACTION and COMMIT without success.

Any ideas?

Thanx.

Ciao,
Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio
IS
 maxcommit NUMBER;
BEGIN
 maxcommit := 1;

 FOR i IN 1..100 LOOP
  INSERT INTO employee (id, department, firstname, lastname) VALUES (i, i,
'John' || to_char(i), 'Smith' || to_char(i));
  IF MOD(i, maxcommit) = 0 THEN
COMMIT;
  END IF;
 END LOOP;

 COMMIT;
END;



--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Information about Pages, row versions of tables, indices

2008-12-26 Thread Gerhard Wiesinger
  WHEN cl.relkind = 'r' THEN relname
  WHEN cl.relkind = 'i' THEN (SELECT relname FROM pg_index pi, 
pg_class cx WHERE cl.oid=pi.indexrelid and pi.indrelid=cx.oid)

  WHEN cl.relkind = 't' THEN relname
  ELSE null
 END AS tablename,
 reltoastrelid as reltoastrelid,
 reltoastidxid as reltoastidxid,
 reltype AS reltype,
 reltablespace AS reltablespace,
 CASE
  WHEN cl.relkind = 'i' THEN 0.0
  ELSE pg_relation_size(cl.oid)
 END AS tablesize,
 pg_relation_size(cl.oid),
-- pg_relation_size(cl.relname) AS tablesize,
 CASE
  WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
  WHEN cl.relkind = 'i' THEN
CASE WHEN cl.oid in (SELECT pi.indexrelid FROM pg_index 
pi, pg_class pc, pg_class pt WHERE pi.indexrelid = pt.oid AND 
pi.indexrelid = pc.reltoastidxid ORDER BY pi.indexrelid)

  THEN CAST('INDEX OF TOAST TABLE' AS VARCHAR(20))
  ELSE CAST('INDEX' AS VARCHAR(20))
END
  WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
  WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
  WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
  WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS 
VARCHAR(18))

  ELSE null
 END AS object_type,
 CASE
  WHEN cl.relkind = 'r' THEN
 COALESCE((SELECT 
SUM(pg_relation_size(indexrelid))::bigint

   FROM pg_index WHERE cl.oid=indrelid), 0)
  ELSE pg_relation_size(cl.oid)
 END AS indexsize,
 CASE
  WHEN reltoastrelid=0 THEN 0
  ELSE pg_relation_size(reltoastrelid)
 END AS toastsize,
 CASE
  WHEN reltoastrelid=0 THEN 0
  ELSE pg_relation_size((SELECT reltoastidxid FROM pg_class ct
 WHERE cl.reltoastrelid = ct.oid))
 END AS toastindexsize
FROM pg_class cl
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
) ss
WHERE schemaname='public'
  AND object_type='INDEX'
ORDER BY
  schemaname, tablename, object_type DESC, relname;

On Thu, 25 Dec 2008, Pavel Stehule wrote:


Hello

look at the contrib module pgstattuple
http://www.postgresql.org/docs/8.3/interactive/pgstattuple.html

regards
Pavel Stehule
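
(A minimal usage example of the pgstattuple module Pavel points to, assuming the contrib SQL has been loaded; the table and index names are placeholders:)

-- Dead-tuple and free-space figures for a heap:
SELECT * FROM pgstattuple('public.log_details');

-- B-tree fill factor / fragmentation for an index:
SELECT * FROM pgstatindex('public.log_details_pkey');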

2008/12/25 Gerhard Wiesinger :

Hello!

Is there some information in meta tables available about the number of pages
currently unused, row versions of tables and indices which are unused?

I'm asking because I want to measure how efficient HOT is working and
whether vacuum should be run or not saving diskspace (I know this is done
automatically).

Thanx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Conditional commit inside functions

2008-12-26 Thread Gerhard Wiesinger

Hello!

I tried the following, but it still runs as one transaction:

SELECT insert_1Mio();

(parallel select count(id) from employee; is done)

CREATE OR REPLACE FUNCTION insert_some(start_i INTEGER, end_i INTEGER) 
RETURNS void

AS $func$
DECLARE
BEGIN
  FOR i IN start_i..end_i LOOP
INSERT INTO employee (id, department, firstname, lastname) VALUES (i, i, 
'John' || i, 'Smith' || i);
  END LOOP;
END;
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_1Mio() RETURNS void
AS $func$
DECLARE
  maxcommit INTEGER;
  start_i INTEGER;
  end_i INTEGER;
  now_i INTEGER;
BEGIN
  maxcommit := 1;
  start_i :=1;
  end_i := 100;

  now_i := start_i;

  FOR i IN start_i..end_i LOOP
IF MOD(i, maxcommit) = 0 THEN
  PERFORM insert_some(now_i, i);
  now_i := i + 1;
END IF;
  END LOOP;
  PERFORM insert_some(now_i, end_i);
END;
$func$ LANGUAGE plpgsql;

Any ideas?

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 25 Dec 2008, Gerhard Wiesinger wrote:


Hello!

I want to translate the following Oracle PL/SQL script into plpgsql. 
Especially I'm having problems with the transaction thing. i tried START 
TRANSACTION and COMMIT without success.


Any ideas?

Thanx.

Ciao,
Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio
IS
 maxcommit NUMBER;
BEGIN
 maxcommit := 1;

 FOR i IN 1..100 LOOP
   INSERT INTO employee (id, department, firstname, lastname) VALUES (i, i, 
'John' || to_char(i), 'Smith' || to_char(i));

   IF MOD(i, maxcommit) = 0 THEN
 COMMIT;
   END IF;
 END LOOP;

 COMMIT;
END;



--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Conditional commit inside functions

2008-12-25 Thread Gerhard Wiesinger

Hello!

I want to translate the following Oracle PL/SQL script into plpgsql.
Especially I'm having problems with the transaction handling: I tried START
TRANSACTION and COMMIT without success.


Any ideas?

Thanx.

Ciao,
Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio
IS
  maxcommit NUMBER;
BEGIN
  maxcommit := 1;

  FOR i IN 1..100 LOOP
INSERT INTO employee (id, department, firstname, lastname) VALUES (i, i, 
'John' || to_char(i), 'Smith' || to_char(i));
IF MOD(i, maxcommit) = 0 THEN
  COMMIT;
END IF;
  END LOOP;

  COMMIT;
END;
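
(In plpgsql functions a COMMIT is not possible at all; the whole call always runs inside one transaction. Much later releases, PostgreSQL 11 and up, added procedures with transaction control, where the Oracle pattern above maps over almost directly. A sketch assuming PostgreSQL 11+ and the same employee table, not valid for 8.x:)

-- PostgreSQL 11+ only: procedures may COMMIT, functions may not.
CREATE OR REPLACE PROCEDURE insert_1mio()
LANGUAGE plpgsql
AS $proc$
DECLARE
  maxcommit CONSTANT integer := 10000;   -- commit every 10000 rows (illustrative value)
BEGIN
  FOR i IN 1..1000000 LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || i, 'Smith' || i);
    IF i % maxcommit = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;
$proc$;

-- CALL insert_1mio();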



--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Information about Pages, row versions of tables, indices

2008-12-25 Thread Gerhard Wiesinger

Hello!

Is there some information available in the meta tables about the number of
currently unused pages, and about row versions of tables and indices which
are unused?


I'm asking because I want to measure how efficiently HOT is working and
whether vacuum should be run or not, to save disk space (I know this is
done automatically).


Thanx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Tsearch2 Upgrade from 8.2 to 8.3.1 with mediawiki

2008-05-30 Thread Gerhard Wiesinger

Hello!

I found the solution:
Normal export.
Normal upgrade procedure.

su - postgres

# Upgrade tsearch2: load the new contrib/tsearch2 module into the target
# database first, so the old tsearch2 objects in the dump are blocked.
# See http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
createdb wikidb
psql wikidb < /usr/share/pgsql/contrib/tsearch2.sql

# Then restore the dump taken from the old 8.2 cluster
psql < pg_dumpall.sql postgres

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 15 May 2008, Gerhard Wiesinger wrote:


Hello!

I want to upgrade from 8.2 to 8.3.1 but I'm having problems:
I did a pg_dumpall, but restoring it doesn't work. I found the migration guide with a 
trick to load the new contrib/tsearch2 module, but how is this done exactly?

-
http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html
The old contrib/tsearch2 functions and other objects must be suppressed when 
loading pg_dump output from a pre-8.3 database. While many of them won't load 
anyway, a few will and then cause problems. One simple way to deal with this 
is to load the new contrib/tsearch2 module before restoring the dump; then it 
will block the old objects from being loaded.

-

Some other link I've found:
http://sql-info.de/postgresql/notes/converting-tsearch2-to-8.3.html
http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00509.php

http://translate.google.com/translate?u=http%3A%2F%2Fwww.nabble.com%2FAtualiza%25C3%25A7%25C3%25A3o-de-Mediawiki-para-8.3-td15722654.html&hl=en&ie=UTF8&sl=es&tl=en
http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html
http://www.postgresql.org/docs/current/static/tsearch2.html#AEN102824
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
http://people.planetpostgresql.org/greg/index.php?/archives/123-MediaWiki-is-Postgres-8.3-compatible.html

Thank you.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





[GENERAL] Tsearch2 Upgrade from 8.2 to 8.3.1

2008-05-15 Thread Gerhard Wiesinger

Hello!

I want to upgrade from 8.2 to 8.3.1 but I'm having problems:
I did a pg_dumpall, but restoring it doesn't work. I found the migration 
guide with a trick to load the new contrib/tsearch2 module, but how is this 
done exactly?

-
http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html
The old contrib/tsearch2 functions and other objects must be suppressed 
when loading pg_dump output from a pre-8.3 database. While many of them 
won't load anyway, a few will and then cause problems. One simple way to 
deal with this is to load the new contrib/tsearch2 module before restoring 
the dump; then it will block the old objects from being loaded.

-

Some other link I've found:
http://sql-info.de/postgresql/notes/converting-tsearch2-to-8.3.html
http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00509.php

http://translate.google.com/translate?u=http%3A%2F%2Fwww.nabble.com%2FAtualiza%25C3%25A7%25C3%25A3o-de-Mediawiki-para-8.3-td15722654.html&hl=en&ie=UTF8&sl=es&tl=en
http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html
http://www.postgresql.org/docs/current/static/tsearch2.html#AEN102824
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
http://people.planetpostgresql.org/greg/index.php?/archives/123-MediaWiki-is-Postgres-8.3-compatible.html

Thank you.

Ciao,
Gerhard

--
http://www.wiesinger.com/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Tools for dumping pg_xlog, pg_clog, etc?

2007-11-25 Thread Gerhard Wiesinger

Hello!

Since the pg_filedump utility handles only the data files, I'm currently 
writing a pg_xlog/pg_clog viewer.


I'm relying on the following source files as documentation:
// See src/include/access/xlog.h
// See src/include/access/xlog_internal.h
// See src/include/access/xlogdefs.h
// See src/backend/access/transam/xlog.c

// See src/include/access/clog.h
// See src/backend/access/transam/clog.c

// See src/include/access/rmgr.h
// See src/backend/access/transam/rmgr.c

But I'm still having problems getting sensible output:
1.) pg_xlog:
a.) Reading XLogPageHeaderData or XLogLongPageHeaderData works well
b.) Afterwards an XLogRecord should follow, but the output doesn't look 
right (e.g. the CRC field contains 00 bytes in different files, which 
doesn't look plausible)

In b.) I expect:
 * The overall layout of an XLOG record is:
 *  Fixed-size header (XLogRecord struct)
 *  rmgr-specific data
 *  BkpBlock
 *  backup block data
 *  BkpBlock
 *  backup block data
 *  ...

Any ideas of the correct structure?


2.) pg_clog:
I'm not sure what the correct structures of pg_clog are.
Any ideas?
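
To my understanding of clog.h and slru.h from that era (a hedged sketch, not verified against every version): pg_clog stores two status bits per transaction ID -- 00 = in progress, 01 = committed, 10 = aborted, 11 = sub-committed -- packed 4 per byte into 8192-byte pages, with 32 pages per segment file, and the file name is the segment number as four hex digits. Locating a transaction is then simple arithmetic; the xid below is illustrative:

-- Where do the status bits of xid 5526085 live?
SELECT lpad(to_hex(5526085 / (8192 * 4 * 32)), 4, '0') AS segment_file,    -- e.g. pg_clog/0005
       (5526085 / (8192 * 4)) % 32                     AS page_in_segment,
       (5526085 % (8192 * 4)) / 4                      AS byte_in_page,
       (5526085 % 4) * 2                               AS bit_offset_in_byte;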

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Sat, 26 May 2007, Gerhard Wiesinger wrote:


Hello!

I think I found a proper utility for that: pg_filedump
http://sources.redhat.com/rhdb/utilities.html

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Fri, 18 May 2007, Gerhard Wiesinger wrote:


Hello!

Are there any tools available to dump the files of the pg_xlog, pg_clog, 
... directories in human readable format to understand how transaction 
handling is done?


Thanx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

 http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Tools for dumping pg_xlog, pg_clog, etc?

2007-05-26 Thread Gerhard Wiesinger

Hello!

I think I found a proper utility for that: pg_filedump
http://sources.redhat.com/rhdb/utilities.html

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Fri, 18 May 2007, Gerhard Wiesinger wrote:


Hello!

Are there any tools available to dump the files of the pg_xlog, pg_clog, ... 
directories in human readable format to understand how transaction handling 
is done?


Thanx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[GENERAL] Tools for dumping pg_xlog, pg_clog, etc?

2007-05-18 Thread Gerhard Wiesinger

Hello!

Are there any tools available to dump the files of the pg_xlog, pg_clog, 
... directories in human readable format to understand how transaction 
handling is done?


Thanx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Internals of PostgreSQL - Documentation or presentations

2007-05-13 Thread Gerhard Wiesinger

Hello!

Are there some presentations or documents of the internals of PostgreSQL 
available?


Especially I'm looking for the concepts and detailed internals of general 
transaction handling, internals of commit log, transaction logs, 
pg_multixact, pg_subtrans, pg_tblspc and pg_twophase.


Also some comments about concurrent access with multiple processes, 
locking and shared memory concepts (or other communication systems used) 
would be nice.


I already found
http://www.postgresql.org/files/developer/internalpics.pdf
but it is missing the commentary needed to understand it well.

Thanx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Problem with data corruption and psql memory usage

2007-05-09 Thread Gerhard Wiesinger

Hello Tom!

I don't think this is a hardware problem. The machine has run 24/7 for around 4 
years without any problems, with daily backups of GBs of data to it, and 
uptimes only interrupted by kernel security patches, etc.


The only cause I could believe is:
I'm running the FC7 test packages of PostgreSQL on FC6, and maybe there is 
a slight glibc library conflict or some other incompatibility.


Ciao,
Gerhard

--
http://www.wiesinger.com/


On Wed, 9 May 2007, Tom Lane wrote:


Gerhard Wiesinger <[EMAIL PROTECTED]> writes:

LOG:  could not fsync segment 0 of relation 1663/16386/42726: Input/output
error


[ raised eyebrow... ]  I think your machine is flakier than you believe.
This error is particularly damning, but the general pattern of weird
failures all over the place seems to me to fit the idea of hardware
problems much better than any other explanation.  FC6 and PG 8.2.3 are
both pretty darn stable for most people, so there's *something* wrong
with your installation, and unstable hardware is the first thing that
comes to mind.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq





[GENERAL] Problem with data corruption and psql memory usage

2007-05-09 Thread Gerhard Wiesinger

Hello!

I'm new to PostgreSQL and I imported about 2.8 million rows 
with normal INSERT commands.


Config was (difference from default config):
listen_addresses = '*'
temp_buffers = 20MB# min 800kB
work_mem = 20MB# min 64kB
maintenance_work_mem = 32MB# min 1MB
fsync = off# turns forced synchronization on or off
full_page_writes = off
wal_buffers = 20MB
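
(For comparison, a less risky bulk-load configuration keeps crash safety on and speeds up the load by other means; the values below are illustrative assumptions, not settings from this setup:)

fsync = on                      # keep crash safety
full_page_writes = on
checkpoint_segments = 32        # fewer, larger checkpoints during the load
maintenance_work_mem = 256MB    # faster index builds after the load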

It crashed with a core dump (ulimit -c 0):
LOG:  server process (PID 12720) was terminated by signal 11
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server 
process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server proc

ess exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
WARNING:  terminating connection because of crash of another server 
process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server proc

ess exited abnormally and possibly corrupted shared memory.

Afterwards I got the following error messages:
WARNING:  index "table_pkey" contains 2572948 row versions, but 
table contains 2572949 row versions

HINT:  Rebuild the index with REINDEX.

LOG:  server process (PID 13794) was terminated by signal 11
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server 
process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server proc

ess exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.


LOG:  could not fsync segment 0 of relation 1663/16386/42726: Input/output 
error

ERROR:  storage sync failed on magnetic disk: Input/output error

ERROR:  could not access status of transaction 808464434
DETAIL:  Could not open file "pg_clog/0303": No such file or directory.

Afterwards I got:
ERROR:  could not access status of transaction 5526085

There were also some coredumps afterwards where I have a stack trace:
#0  0x0807d241 in heap_deform_tuple ()
#1  0x08095b8c in toast_delete ()
#2  0x0809432e in heap_delete ()
#3  0x0814bfa4 in ExecutorRun ()
#4  0x081d7ece in FreeQueryDesc ()
#5  0x081d80c1 in FreeQueryDesc ()
#6  0x081d8979 in PortalRun ()
#7  0x081d4480 in pg_parse_query ()
#8  0x081d5a57 in PostgresMain ()
#9  0x081ad4fe in ClosePostmasterPorts ()
#10 0x081ae307 in PostmasterMain ()
#11 0x0816dec0 in main ()

So my questions are:
1.) Are my settings too aggressive (fsync=off, full_page_writes=off)?
2.) Should PostgreSQL still recover from a backend crash with these 2 
settings, or is data corruption to be expected with them?

3.) Any ideas for the reason of coredumps?

Write access was only from one session at a time. I only did select 
count(*) from table from other sessions.


Afterwards I cleaned up the tables, did a pg_dumpall/restore and an 
initdb, disabled these 2 settings, and everything went fine.


I also had a problem with psql:
psql < file.sql
=> psql took around 2GB of virtual memory with heavy swapping. After 
Ctrl-C and a restart, it worked fine. Any ideas?
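
(One common approach for loads of this size, as a hedged sketch -- the file path and column list are illustrative assumptions: bulk-load with COPY instead of millions of single-row INSERT statements, which keeps both the SQL script and psql's memory use small:)

\copy employee (id, department, firstname, lastname) FROM 'employees.csv' WITH CSV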


Machine is stable so I would say that a hardware failure is not the 
problem.


Postgresql version is 8.2.3 on FC6

Thank you for the answer.

Ciao,
Gerhard

--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Fw: PostgreSQL Performance Tuning

2007-04-26 Thread Gerhard Wiesinger

Hello!

I would do the following (in that order):
1.) Check for performant application logic and application design (e.g. 
the degree of granularity of the Java Hibernate mapping, whether there are 
object iterators over hundreds of objects, etc.)
2.) Check the Hibernate-generated queries and whether they are suitable. 
Also run EXPLAIN (ANALYZE) to see the query plan (see the sketch after this list).


Sometimes a manually written query is much more efficient than the Hibernate-generated one.

3.) Optimize the database itself, e.g. the postgresql.conf configuration.
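
A minimal sketch of the check meant in step 2 (the query and table names are illustrative, not from this thread):

EXPLAIN ANALYZE
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.name = 'ACME';
-- Compare estimated vs. actual row counts and watch for sequential scans
-- on large tables; then compare the plan of a hand-written equivalent query.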

Ciao,
Gerhard

--
http://www.wiesinger.com/


On Thu, 26 Apr 2007, Shohab Abdullah wrote:


Dear,
We are facing a performance problem while using a PostgreSQL database
over the network on a Linux OS.
Our database consists of more than 500 tables with an average of 10K
records per table, and on average 20 users access the database
simultaneously over the network. Each table has indexes and we are
querying the database using Hibernate.
The biggest problem is that insertion, updating and fetching of records
are very slow; the database takes a long time to respond
in the above scenario.
Please advise on tuning the database. I am attaching my
postgresql.conf file for reference of our current configuration.



Please reply ASAP.
Regards,
  Shohab Abdullah
  Software Engineer,
   Manufacturing SBU-POWAI
   Larsen and Toubro Infotech Ltd.| 4th floor, L&T Technology Centre,
Saki Vihar Road, Powai, Mumbai-400072
 (:  +91-22-67767366 | (:  +91-9870247322
   Visit us at : http://www.lntinfotech.com
"I cannot predict future, I cannot change past, I have just the present
moment, I must treat it as my last"

The information contained in this email has been classified:
[ X] L&T Infotech General Business
[] L&T Infotech Internal Use Only
[] L&T Infotech Confidential
[] L&T Infotech Proprietary
This e-mail and any files transmitted with it are for the sole use of the
intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by reply
e-mail and destroy all copies of the original message.

__


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq