Re: [GENERAL] vacuumdb --analyze-only scans all pages?
On 29.12.2016 16:10, Tom Lane wrote: Adrian Klaver writes: On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843 estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215 estimated total rows

Any ideas why?

I would say because the '3000000 rows in sample' were spread out over all 2133350 pages.

Worth pointing out here is that you must have a custom statistics target set on log_details to make it want a sample so much larger than the default. If you feel ANALYZE is taking too long, you should reconsider whether you need such a large target.

Thanks, Tom and Adrian. Yes, there is a custom statistics target of 10000 set; I guess it was set some time ago to overcome a performance problem after the upgrade from 8.3 to 8.4. Thanks, Tom, for pointing that out. A good query to find such columns:

SELECT
  n.nspname AS schemaname,
  CASE
    WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
    WHEN cl.relkind = 'i' THEN CAST('INDEX' AS VARCHAR(20))
    WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
    WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
    WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
    WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS VARCHAR(18))
    ELSE null
  END AS object_type,
  cl.relname,
  attname,
  attstattarget
FROM pg_attribute a
LEFT OUTER JOIN pg_class cl ON a.attrelid = cl.oid
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
WHERE (cl.relkind = 'r' OR cl.relkind = 'i' OR cl.relkind = 't')
AND attnum > 0 -- only regular columns
AND n.nspname = 'public' -- public schema only
AND NOT(relname ILIKE 'pgstatspack_%')
AND cl.relkind = 'r' -- TABLE
AND attstattarget <> -1 -- non-default values only
ORDER BY n.nspname, cl.relname, attnum;

BTW: it looks like the statistics target is multiplied by 300 to get the number of sample rows. Is that true (I didn't find any documentation about that)?
https://www.postgresql.org/docs/current/static/planner-stats.html

-- ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- Default is 100, means 300*100=30000 rows (30k)
-- Max is 10000, means 300*10000=3000000 rows (3 Mio)
ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS -1;
ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS -1;

https://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET

Thanks.

Ciao,
Gerhard
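As a quick sanity check (a sketch; it assumes the two fk_* columns were the only ones with a custom target), re-running ANALYZE after the reset should report a sample of roughly 30000 rows (300 times the default_statistics_target of 100) instead of a scan of every page:

ANALYZE VERBOSE public.log_details;
-- expected output of the form:
-- INFO: "log_details": scanned 30000 of 2133350 pages, ... 30000 rows in sample, ...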
[GENERAL] vacuumdb --analyze-only scans all pages?
Hello,

PostgreSQL 9.6.1: after a pg_dump/restore procedure, it scans all pages for at least some of the tables (the --analyze-only switch is specified). I would expect that only the sample rows are scanned.

"log_details": scanned 2133350 of 2133350 pages

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing 334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843 estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing 3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215 estimated total rows

Any ideas why?

Thanks.

Ciao,
Gerhard
[GENERAL] CTE and function
Hello,

I'm trying to convert a select after a CTE into a function for generic use. The CTE is normally a complex query, but I want to encapsulate the calculation of the Gini coefficient into a function.

Based on: http://www.heckler.com.br/blog/2010/06/15/gini-coeficient-having-fun-in-both-sql-and-python/
Details at: https://en.wikipedia.org/wiki/Gini_coefficient

= OK:
WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT ((N+1.0)/(N-1.0)) - ((PiXi_sum/(N*(N-1)*u))*2.0) AS gini_coefficient
FROM (
  SELECT SUM(PiXi) AS PiXi_sum, COUNT(*) AS N, (SELECT AVG(col) FROM tab) AS u
  FROM (
    SELECT row_number() OVER() * col AS PiXi
    FROM (SELECT col FROM tab ORDER BY col DESC) t1
  ) t2
) t3;

= OK: Create function
CREATE OR REPLACE FUNCTION gini_coefficient(IN table_name text, IN column_name text, OUT gini_coefficient DOUBLE PRECISION) AS $$
BEGIN
  EXECUTE format('
    SELECT ((N+1.0)/(N-1.0)) - ((PiXi_sum/(N*(N-1)*u))*2.0) AS gini_coefficient
    FROM (
      SELECT SUM(PiXi) AS PiXi_sum, COUNT(*) AS N, (SELECT AVG(%s) FROM %s) AS u
      FROM (
        SELECT row_number() OVER() * %s AS PiXi -- also parameterized, so column names other than col work
        FROM (SELECT %s FROM %s ORDER BY %s DESC) t1
      ) t2
    ) t3;
  ', column_name, table_name, column_name, column_name, table_name, column_name) INTO gini_coefficient;
END
$$ LANGUAGE plpgsql;

= NOT OK:
WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT gini_coefficient('tab', 'col');
ERROR: relation "tab" does not exist
LINE 13: (SELECT col FROM tab ORDER BY col DESC) t1

= NOT OK:
WITH tab AS (
  SELECT unnest(ARRAY[1,2,3,4]) AS col
)
SELECT * FROM gini_coefficient('tab', 'col');
ERROR: relation "tab" does not exist
LINE 13: (SELECT col FROM tab ORDER BY col DESC) t1

So it looks like the table tab from the CTE is not available inside the function. Any ideas how to solve this? An explanation would also be appreciated.

Thank you.

Ciao,
Gerhard
--
https://www.wiesinger.com/
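For what it's worth, this behaviour is expected: a CTE exists only for the duration of the single statement that defines it, while the EXECUTE inside the function runs as a separate statement, so the dynamic SQL cannot see tab. One possible workaround (a sketch, not from the thread) is a temporary table, which is visible for the whole session, including inside the function's dynamic SQL:

CREATE TEMP TABLE tab AS SELECT unnest(ARRAY[1,2,3,4]) AS col;
SELECT gini_coefficient('tab', 'col');
DROP TABLE tab;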
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
On 13.12.2015 21:14, Bill Moran wrote:

Wait ... this is a combined HTTP/Postgres server? You didn't mention that earlier, and it's kind of important. What evidence do you have that Postgres is actually the part of this system running out of memory?

For me the complete picture doesn't look consistent.

I don't see any such evidence in any of your emails, and (based on experience) I find it pretty likely that whatever is running under node is doing something in a horrifically memory-inefficient manner. Since you mention that you see nothing in the PG logs, that makes it even more likely (to me) that you're looking entirely in the wrong place. I'd be willing to bet a steak dinner that if you put the web server on a different server than the DB, the memory problems would follow the web server and not the DB server.

Changes in config:
track_activity_query_size = 102400
work_mem = 100MB

OK, we restarted PostgreSQL and had it stopped for some seconds, and logged top output every second. When PostgreSQL was down, nearly all memory was freed; that looks good to me. So it is likely that node and the other processes are not the cause.

Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
Swap: 512M Total, 477M Used, 35M Free, 93% Inuse

When PostgreSQL restarted, Inactive was growing fast (~1min):
Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

After a few minutes we are back at the same situation:
Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

The steak dinner is mine :-) Donating it to the PostgreSQL community :-)

Any further ideas? I don't think this is normal system behaviour.

Ciao,
Gerhard
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
On 13.12.2015 18:17, Tom Lane wrote:

Gerhard Wiesinger writes:
Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Just judging from the name of the function, I would bet this is a direct result of having only 512M of swap configured. As Bill already pointed out, that's a pretty useless choice on a system with 32G of RAM. As soon as the kernel tries to push out any significant amount of idle processes, it's gonna be out of swap space. The numbers you show above prove that it is almost out of free swap already.

The system wasn't designed by me; I wouldn't have done it that way either. Does swapoff help?

Also, while that 20G of "inactive" pages may be candidates for reuse, they probably can't actually be reused without swapping them out ... and there's noplace for that data to go.

There is no entry in syslog (where postgres logs) when swap_pager_getswapspace is logged. But why do we have 20G of inactive pages? They are still allocated by kernel or user space. As you can see below (top output), the non-Postgres processes are around 9G in virtual size, resident even lower. The system is nearly idle, and the queries typically aren't active after one second again. Therefore, where does the rest of the 11G of inactive pages come from (if it isn't a Postgres/FreeBSD memory leak)?

I read that Postgres has its own memory allocator:
https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
Might that be an issue with double allocation/freeing and the "cheese hole" topic with memory fragmentation?

https://www.opennet.ru/base/dev/fbsdvm.txt.html
inactive pages: not actively used by programs, which are dirty and (at some point) need to be written to their backing store (typically disk). These pages are still associated with objects and can be reclaimed if a program references them. Pages can be moved from the active to the inactive queue at any time with little adverse effect. Moving pages to the cache queue has bigger consequences (note 1).

https://unix.stackexchange.com/questions/134862/what-do-the-different-memory-counters-in-freebsd-mean
Active: Memory currently being used by a process
Inactive: Memory that has been freed but is still cached since it may be used again. If more Free memory is required, this memory can be cleared and become free. This memory is not cleared before it is needed, because "free memory is wasted memory"; it doesn't cost anything to keep the old data around in case it is needed again.
Wired: Memory in use by the Kernel. This memory cannot be swapped out (GW: including ZFS cache!!!)
Cache: Memory being used to cache data, can be freed immediately if required
Buffers: Disk cache
Free: Memory that is completely free and ready to use. Inactive, Cache and Buffers can become free if they are cleaned up.

Thanks.

Ciao,
Gerhard

last pid: 7277; load averages: 0.91, 0.96, 1.02  up 18+06:22:31 18:57:54
135 processes: 2 running, 132 sleeping, 1 waiting
Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
Swap: 512M Total, 501M Used, 12M Free, 97% Inuse

  PID USERNAME  THR PRI NICE  SIZE   RES  STATE  C   TIME   WCPU COMMAND
77941 pgsql       5  20    0 7921M 7295M usem   7 404:32 10.25% postgres
79570 pgsql       1  20    0 7367M 6968M sbwait 6   4:24  0.59% postgres
73449 pgsql       1  27    0 7367M 6908M sbwait 4   8:08  5.08% postgres
74209 pgsql       1  27    0 7367M 6803M sbwait 0   6:37  1.46% postgres
74207 pgsql       1  25    0 7367M 6748M sbwait 6   6:34  1.76% postgres
74206 pgsql       1  22    0 7367M 6548M sbwait 5   6:44  1.86% postgres
73380 pgsql       1  23    0 7367M 7265M sbwait 4   8:15  1.17% postgres
74208 pgsql       1  24    0 7367M 7223M sbwait 1   6:30  4.59% postgres
79569 pgsql       1  24    0 7367M 7105M sbwait 3   4:36  1.17% postgres
74210 pgsql       1  29    0 7363M 7182M sbwait 5   6:41  5.47% postgres
73479 pgsql       1  22    0 7363M 6560M sbwait 6   7:14  3.56% postgres
83030 pgsql       1  20    0 7329M  193M sbwait 5   0:00  0.00% postgres
76178 pgsql       1  20    0 7323M 7245M sbwait 2   3:44  0.00% postgres
75867 pgsql       1  20    0 7323M 7245M sbwait 2   3:45  0.00% postgres
75869 pgsql       1  20    0 7323M 7245M sbwait 2   3:46  0.00% postgres
75883 pgsql       1  20    0 73
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
Hello Bill,

Thank you for your response; comments inline.

On 13.12.2015 16:05, Bill Moran wrote: On Sun, 13 Dec 2015 09:57:21 +0100 Gerhard Wiesinger wrote:

some further details from the original FreeBSD 10.1 machine:
Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
  PID USERNAME  THR PRI NICE  SIZE   RES  STATE  C   TIME   WCPU COMMAND
77941 pgsql       5  20    0 7925M 7296M usem   2 352:34  6.98% postgres: username dbnamee 127.0.0.1(43367) (postgres)

I see no evidence of an actual leak here. Each process is basically using the 7G of shared_buffers you have allocated in the config (which is only 7G _total_ for all processes, since it's shared memory).

OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

The main issue is IMHO (as far as I understood the FreeBSD memory system) that 20G are INACTIVE. When I subtract the shared memory, even ~13GB should be available, but they are still allocated, just inactive (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out-of-memory situations, it is likely that the memory is dirty (otherwise it would have been reused).

Not quite correct. Inactive memory is _always_ available for re-use.

Are you sure that's true? Monitoring inactive memory:

cat vm_stat.sh
#!/usr/bin/env bash
while [ 1 ]; do
  date +%Y.%m.%d.%H.%M.%S
  sysctl -a | grep vm.stats.vm.
  sleep 1
done

Even when we get out of memory with swap_pager_getswapspace, inactive memory (from the log file) is around 20GB (it doesn't go down or up):
vm.stats.vm.v_inactive_count: 5193530 (times 4k pages is around 20GB)

So we have 20GB of inactive memory, but we still get out of memory with kernel: swap_pager_getswapspace(4): failed. Any ideas why?

Config: Memory: 32GB, Swap: 512MB

Probably not the cause of this problem, but this is a non-optimal layout. If you're going to use swap at all, it should generally be 2x the size of your RAM for optimal performance under memory pressure.

Yes, we will try to disable it totally. Nevertheless, why do we run out of memory/swap?

maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB

I expect that this value is the cause of the problem. The scenario you describe below is sorting a large table on an unindexed column, meaning it will have to use all that work_mem. I'd be interested to see the output of:
EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 10;

That was only a test query; it has nothing to do with the production queries. Those are mostly SELECTs/INSERTs/UPDATEs on primary keys.

But even without that information, I'd recommend you reduce work_mem to about 16M or so.

Why so low? Sorting for reporting or other long-running queries would then be done on disk and not in memory.

wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

[snip]

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker processes connected via persistent connections to PostgreSQL; they perform just simple queries with SELECT on primary keys and simple INSERTs/UPDATEs.

That's not at all the test scenario you show below. The scenario below is a large sort operation on a non-indexed column, which is vastly different than a single-row fetch based on an index.

Yes, that non-indexed select was just for testing purposes.

Normally nearly all the workers are idle but they still consume the maximum configured work mem on the PostgreSQL server and the memory is also resident.

I see no evidence of that in your top output. Each process has a reference to the 7G of shared_buffers you allocated, which is memory shared by all processes, and is expected. I'm not as familiar with Linux top, but the results appear to be the same.

Yes, that might be the case, but if it is n times 7G of shared memory, then we have ~20GB of inactive memory available, so plenty of memory. And why are we getting: kernel: swap_pager_getswapspace(4): failed?

Thanks.

Ciao,
Gerhard
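A hedged middle ground for the work_mem trade-off discussed above (a sketch, not from the thread): keep the server-wide value small, as Bill suggests, and raise it per session only for the reporting queries that actually need large in-memory sorts:

-- postgresql.conf: work_mem = 16MB (server-wide)
SET work_mem = '512MB'; -- this session only, for the big reporting query
SELECT * FROM t_random ORDER BY md5 LIMIT 10;
RESET work_mem;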
Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD
Hello,

some further details from the original FreeBSD 10.1 machine:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

  PID USERNAME  THR PRI NICE  SIZE   RES  STATE  C   TIME   WCPU COMMAND
77941 pgsql       5  20    0 7925M 7296M usem   2 352:34  6.98% postgres: username dbnamee 127.0.0.1(43367) (postgres)
75862 pgsql       1  20    0 7321M 7266M select 4   7:04  0.00% postgres: checkpointer process (postgres)
75863 pgsql       1  20    0 7321M 7260M select 4   3:34  0.00% postgres: writer process (postgres)
75867 pgsql       1  20    0 7323M 7237M sbwait 0   3:19  0.00% postgres: username dbnamee ipaddress(58977) (postgres)
76178 pgsql       1  20    0 7323M 7237M sbwait 2   3:18  0.00% postgres: username dbnamee ipaddress(35807) (postgres)
75868 pgsql       1  20    0 7323M 7237M sbwait 0   3:20  0.00% postgres: username dbnamee ipaddress(58978) (postgres)
75884 pgsql       1  20    0 7323M 7236M sbwait 3   3:20  0.00% postgres: username dbnamee ipaddress(59908) (postgres)
75869 pgsql       1  20    0 7323M 7236M sbwait 3   3:20  0.00% postgres: username dbnamee ipaddress(58982) (postgres)
76181 pgsql       1  20    0 7323M 7236M sbwait 2   3:18  0.00% postgres: username dbnamee ipaddress(35813) (postgres)
75883 pgsql       1  20    0 7323M 7236M sbwait 0   3:20  0.00% postgres: username dbnamee ipaddress(59907) (postgres)
76180 pgsql       1  20    0 7323M 7236M sbwait 1   3:19  0.00% postgres: username dbnamee ipaddress(35811) (postgres)
76177 pgsql       1  20    0 7323M 7236M sbwait 1   3:18  0.00% postgres: username dbnamee ipaddress(35712) (postgres)
76179 pgsql       1  20    0 7323M 7236M sbwait 5   3:18  0.00% postgres: username dbnamee ipaddress(35810) (postgres)
64951 pgsql       1  75    0 7375M  662M CPU1   1   0:01 11.47% postgres: username dbnamee 127.0.0.1(32073) (postgres)
64950 pgsql       1  77    0 7325M  598M CPU6   6   0:02 16.55% postgres: username dbnamee 127.0.0.1(44251) (postgres)

Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

The main issue is IMHO (as far as I understood the FreeBSD memory system) that 20G are INACTIVE. When I subtract the shared memory, even ~13GB should be available, but they are still allocated, just inactive (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out-of-memory situations, it is likely that the memory is dirty (otherwise it would have been reused).

Config: Memory: 32GB, Swap: 512MB
maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

Thanks.

Ciao,
Gerhard

On 13.12.2015 08:49, Gerhard Wiesinger wrote:

Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker processes connected via persistent connections to PostgreSQL; they perform just simple queries with SELECT on primary keys and simple INSERTs/UPDATEs. Normally nearly all the workers are idle, but they still consume the maximum configured work mem on the PostgreSQL server, and the memory is also resident. If some other queries come in, we get into out-of-memory situations. So it looks like PostgreSQL has memory leaks.

I found a test scenario to reproduce it, also on a newer FreeBSD 10.2 VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):

Executions in psql with one persistent connection:

-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM generate_Series(1,1) s;
-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:
-- Memory goes slightly up after each execution, even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on canceling the query or on multiple executions
SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8) RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
    PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident memory):
  PID USER     PR NI    VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
-- after startup of psql
26851 postgres 20  0 2363276   7432   6292 S   0.0  0.2 0:00.00 postgres: postgres postgres [local] idle
-- Memory goes up, ok so far
26851 postgres 20  0 2365732 255152 253548 R  99.0  6.3 0:10.77 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 408464 406788 R 100.0 10.1 0:17.81 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 864472 862576 R 100.0 21.4 0:38.90 postgres
[GENERAL] Memory Leak executing small queries without closing the connection
Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker processes connected via persistent connections to PostgreSQL; they perform just simple queries with SELECT on primary keys and simple INSERTs/UPDATEs. Normally nearly all the workers are idle, but they still consume the maximum configured work mem on the PostgreSQL server, and the memory is also resident. If some other queries come in, we get into out-of-memory situations. So it looks like PostgreSQL has memory leaks.

I found a test scenario to reproduce it, also on a newer FreeBSD 10.2 VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):

Executions in psql with one persistent connection:

-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM generate_Series(1,1) s;
-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:
-- Memory goes slightly up after each execution, even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 and PostgreSQL 9.4.5 on canceling the query or on multiple executions
SELECT * FROM t_random ORDER BY md5 LIMIT 10;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8) RETURNS void AS $$
BEGIN
  -- RAISE NOTICE 'num=%', num;
  FOR i IN 1..num LOOP
    PERFORM * FROM t_random WHERE s = i;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(1000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident memory):
  PID USER     PR NI    VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
-- after startup of psql
26851 postgres 20  0 2363276   7432   6292 S   0.0  0.2 0:00.00 postgres: postgres postgres [local] idle
-- Memory goes up, ok so far
26851 postgres 20  0 2365732 255152 253548 R  99.0  6.3 0:10.77 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 408464 406788 R 100.0 10.1 0:17.81 postgres: postgres postgres [local] SELECT
26851 postgres 20  0 2365732 864472 862576 R 100.0 21.4 0:38.90 postgres: postgres postgres [local] SELECT
-- Function execmultiplei and transaction terminated, but memory still allocated!!!
26851 postgres 20  0 2365732 920668 918748 S   0.0 22.7 0:41.40 postgres: postgres postgres [local] idle
-- Calling it again
26851 postgres 20  0 2365732 920668 918748 R  99.0 22.7 0:46.51 postgres: postgres postgres [local] SELECT
-- idle again, memory still allocated
26851 postgres 20  0 2365732 920668 918748 S   0.0 22.7 1:22.54 postgres: postgres postgres [local] idle

Memory is only released when psql exits. According to the PostgreSQL design, memory should be freed when the transaction has completed.

top command on FreeBSD: top -SaPz -o res -s 1
top command on Linux: top -o RES d1

Config: VMs with 4GB of RAM, 2 vCPUs
shared_buffers = 2048MB # min 128kB
effective_cache_size = 2GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16

Any ideas? Thank you.

Ciao,
Gerhard
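One way to test the shared-buffers explanation given in the replies above (a sketch; it assumes the pg_buffercache extension is available): the RES growth of a backend largely tracks the shared pages it has touched rather than private leaked memory, and pg_buffercache shows how much of shared_buffers is actually populated:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT count(*) AS populated_buffers,
       pg_size_pretty(count(*) * 8192) AS populated_size
FROM pg_buffercache
WHERE relfilenode IS NOT NULL;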
Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
On Fri, 23 Mar 2012, Tom Lane wrote:

Merlin Moncure writes: On Fri, Mar 23, 2012 at 11:35 AM, Gerhard Wiesinger wrote: So I think XID overflow should be planned for one of the next PostgreSQL releases. Two mitigating factors: 1. read-only transactions do not increment the xid counter

Yes. Ask your admin what his throughput of *data modifying* transactions is. I'll bet that number is a few orders of magnitude smaller again.

Those were all small writing transactions benchmarking the database (with the corrected values discussed).

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
On Fri, 23 Mar 2012, John R Pierce wrote:

On 03/23/12 9:17 AM, Gerhard Wiesinger wrote: So in that time autovacuum is triggered.

autovacuum runs pretty much continuously in the background, it's not an on/off thing.

Yes, I know. I meant that it runs at least once in 1.5 days.

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
On 23.03.2012 11:16, Jan Kesten wrote:

On 23.03.2012 06:45, Gerhard Wiesinger wrote: With a database admin of a commercial database system I've discussed that they have to provide, and they also achieve, 2^31 transactions per SECOND! As PostgreSQL uses transaction IDs (XIDs) in the range of 2^31, they would wrap around in about one second.

Wow. What application issues that many transactions? And what is the database system that can handle that? I can't think of a single machine capable of this, and I hardly believe PostgreSQL can come close. 2^31 transactions mean that a single one lasts 0.5ns. Even the fastest DDR3-2133 has cycle times of 4ns. I have seen a database monster in action - 43 trillion (academic) transactions per day, but that's only 5*10^8 transactions per second, under a quarter of 2^31 per second. So, I can't answer your question - but you triggered my curiosity :-)

I'm just answering in one of the posts ... OK, I talked again to the admin, and he was wrong by 3 zeros, and it is per minute :-) So the corrected data are: 1 Mio transactions per minute.

1 Mio/60s ≈ 16667 transactions/s
2^32*60/1E6 = 257698s to get to the XID overflow, which is 2.98 days (~1.5 days when 2^31 with signed int is the border)

So within that time autovacuum is triggered. Nevertheless, we are getting into the area where XID overflow is an issue in the near future. In your example with 5E8 transactions per second, overflow would happen in 4s (2^31) or 8s (2^32) ...

So I think XID overflow should be planned for one of the next PostgreSQL releases.

Ciao,
Gerhard
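For watching how close a cluster actually gets to wraparound (a hedged example, not from the thread), autovacuum must freeze old tuples before age(datfrozenxid) reaches about 2^31:

SELECT datname,
       age(datfrozenxid) AS xid_age,
       ROUND(100.0 * age(datfrozenxid) / 2147483648, 2) AS pct_towards_wraparound -- 2147483648 = 2^31
FROM pg_database
ORDER BY xid_age DESC;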
[GENERAL] PostgreSQL 64 Bit XIDs - Transaction IDs
Hello,

With a database admin of a commercial database system I've discussed that they have to provide, and they also achieve, 2^31 transactions per SECOND! As PostgreSQL uses transaction IDs (XIDs) in the range of 2^31, they would wrap around in about one second. How can one achieve this with PostgreSQL? What is the status and plan for 64-bit XIDs? I saw that 64-bit XIDs were already discussed in 2005 but never found their way into the code:
http://postgresql.1045698.n5.nabble.com/A-couple-of-patches-for-PostgreSQL-64bit-support-td2214264.html

Thanks.

Ciao,
Gerhard
[GENERAL] Double iteration in the buffer cache code
Hello,

Did someone fix the double iteration in the buffer cache code in the meantime, as discussed at:
http://www.mail-archive.com/pgsql-general@postgresql.org/msg137230.html

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] OT: Re: [sfpug] 10/13 SFPUG meeting, "The Mighty GUCS," video now available
On Tue, 20 Oct 2009, Christophe Pettus wrote:

On Oct 20, 2009, at 8:32 AM, Viktor Rosenfeld wrote: @Christophe, I enjoyed your talk very much, particularly because I learned about pgfouine, which from the looks of it, will make my current project vastly simpler. So, thank you.

You should really thank Josh Berkus; he's the one who gave the talk. I just run the camera. :) Josh has the slides and sample files, and I'm sure he'll post them shortly. I know he's been busy with a client emergency and the PostgreSQL conference this weekend.

They are already available from:
http://www.pgexperts.com/presentations.html
http://www.pgexperts.com/document.html?id=34
http://www.pgexperts.com/document.html?id=36

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] Function returning 2 columns evaluated twice when both columns are needed
On Mon, 19 Oct 2009, Tom Lane wrote:

Gerhard Wiesinger writes: On Sun, 18 Oct 2009, Tom Lane wrote: The OFFSET bit is a kluge, but is needed to keep the planner from flattening the subquery and undoing your work.

Thanks, Tom. It also works without the OFFSET kludge. Any ideas why?

Probably because you have the function declared VOLATILE.

None of the functions is declared VOLATILE. Any other ideas?

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
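One detail that may explain this (my assumption, not stated in the thread): a function carrying no volatility label at all defaults to VOLATILE, so "not declared VOLATILE" and "declared STABLE" are not the same thing. A minimal sketch with a hypothetical function:

CREATE OR REPLACE FUNCTION getsums_demo(x int, OUT sum_m1 int, OUT sum_m2 int)
AS $$ SELECT x + 1, x + 2 $$
LANGUAGE sql STABLE; -- without STABLE/IMMUTABLE this would default to VOLATILE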
Re: [GENERAL] Function returning 2 columns evaluated twice when both columns are needed
On Sun, 18 Oct 2009, Tom Lane wrote:

Gerhard Wiesinger writes: Since getSums() is a cursor and is complex and takes a long time, getSums should only be evaluated once. Is there a better solution available to get both columns from the function in the select?

You need a sub-select, along the lines of

SELECT cur_date, (gs).sum_m1, (gs).sum_m2
FROM (
  SELECT cur_date, getSums(start_ts, stop_ts) AS gs
  FROM getDatesTimestamps($1, $2)
  OFFSET 0
) AS ss;

The OFFSET bit is a kluge, but is needed to keep the planner from flattening the subquery and undoing your work.

Thanks, Tom. It also works without the OFFSET kludge. Any ideas why? It also looks reasonable to me because there is no SETOF returned.

BTW: Why is the function in the original statement evaluated twice? With "SELECT table.*" I guess the query is also executed once and not n times (once for each column). PG is version 8.3.8.

Thanks.

Ciao,
Gerhard
[GENERAL] Function returning 2 columns evaluated twice when both columns are needed
Hello,

I'm having a problem with the following:

CREATE TYPE Sums AS (sum_m1 double precision, sum_m2 double precision);
CREATE TYPE date_m1_m2 AS (cur_date date, sum_m1 double precision, sum_m2 double precision);

CREATE OR REPLACE FUNCTION getSums(IN start_ts timestamp with time zone, IN stop_ts timestamp with time zone) RETURNS Sums AS $$
...

CREATE OR REPLACE FUNCTION getsumInterval(date, date) RETURNS SETOF date_m1_m2 AS $$
SELECT
  cur_date,
  (getSums(start_ts, stop_ts)).* -- Not optimal since the function is evaluated 2 times => 24s
  -- getSums(start_ts, stop_ts) -- in one column and not usable as I need 2 columns, but takes only 12s
FROM getDatesTimestamps($1, $2);
$$ LANGUAGE SQL;

Since getSums() is a cursor and is complex and takes a long time, getSums should only be evaluated once. Is there a better solution available to get both columns from the function in the select?

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
Re: [GENERAL] 10/13 SFPUG meeting, "The Mighty GUCS," video now available
On Fri, 16 Oct 2009, Christophe Pettus wrote:

The video archive for the 10/13 SFPUG meeting, "The Mighty GUCS: A guide to the essential PostgreSQL settings you need to know," is now available:
http://thebuild.com/blog/2009/10/16/the-mighty-gucs/
It's also available on Vimeo: http://vimeo.com/7109722

Hello,

Can you also upload the sample config files and the presentation?

Thanks.

Ciao,
Gerhard
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 9 Oct 2009, Greg Smith wrote:

On Sat, 3 Oct 2009, Gerhard Wiesinger wrote: I wouldn't read 128k blocks all the time. I would do the following: When e.g. B0, B127, B256 should be read, I would read them with 8k random block I/O. When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed, I would make 2 requests with the largest possible blocksize: 1.) B1-B5: 5*8k=40k, 2.) B7-B10: 4*8k=32k.

I see what you mean now. This is impossible in the current buffer manager implementation because blocks are requested one at a time, and there are few situations where you can predict which are going to be wanted next. The hash index and sequential scan are two that were possible to predict in that way. The fadvise patches already committed didn't change the way blocks were read in; they just used knowledge about what was coming next to advise the OS. That's quite a bit different from actually asking for things in larger chunks and only using what you need.

Implementing larger chunked reads or similar asynchronous batch I/O is a big project, because you'd have to rearchitect the whole way buffers are managed in the database to do it right. Greg Stark's earliest proof-of-concept prototype for async I/O included a Solaris implementation that used the AIO library. It wasn't feasible to actually use that underlying implementation in the database in the end though, because the model AIO uses expects you'll fire off a bunch of I/O and then retrieve blocks as they come in. That's really not easy to align with the model for how blocks are read into shared_buffers right now. He had some ideas for that and I've thought briefly about the problem, but it would be a major overhaul of some scary-to-touch database internals to pull off.

Given that the OS and/or RAID implementations tend to do what we want in a lot of these cases, where smarter/chunkier read-ahead is what you need, the payback on accelerating those cases hasn't been perceived as that great. There is a major win for the hash index reads, which Solaris systems can't take advantage of, so somebody who uses those heavily on that OS might be motivated enough to produce improvements for that use case. Once the buffer cache at large understood how to handle batched async reads, Solaris AIO would be possible, fancier stuff with Linux AIO would be possible, and the type of chunked reads you're suggesting would be too. But none of that is happening without some major rearchitecting first. Unfortunately there aren't that many people with the right knowledge and motivation to start tinkering around with the buffer cache internals to the extent that would be required to do better here, and pretty much all of them I'm aware of are hacking on projects with a much clearer payback instead.

I've got one idea which is not ideal, but may work and shouldn't take much effort to implement: as in the example above, we read B1-B5 and B7-B10 at a higher level, outside of normal buffer management, with large request sizes (e.g. where hash index scans and sequential scans are done). As the blocks are then in the OS cache, normal buffer management is very fast afterwards:

1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

So for 1.) we read B1-B5 in one 40k request (typically from disk); afterwards we read B1, B2, B3, B4, B5 in 8k chunks from the cache again. The disadvantage is of course that we issue more read requests, but maybe performance is even better because the normal buffer requests are served from the cache. A second disadvantage is the "bad" design. But I think performance could still be better.

And a configuration option to enable this might also be interesting. Maybe I will try that with pgiosim to see whether performance is better or not. What do you think about it? Is the idea clear?

Thanks.

Ciao,
Gerhard
--
http://www.wiesinger.com/
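For reference, the committed fadvise work mentioned above surfaced as the effective_io_concurrency setting, which prefetches blocks for bitmap heap scans; a hedged example of enabling it per session:

SET effective_io_concurrency = 8; -- number of concurrent prefetch requests; 0 disables prefetching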
Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
On Mon, 5 Oct 2009, Greg Smith wrote:

On Sun, 4 Oct 2009, Gerhard Wiesinger wrote: On Fri, 2 Oct 2009, Scott Marlowe wrote: I found that lowering checkpoint completion target was what helped. Does that seem counter-intuitive to you? I set it to 0.0 now.

If you set that to 0.0, the whole checkpoint spreading logic doesn't apply like it's supposed to. I'm not sure what the results you posted mean now. If you had it set to 0 and saw a bad spike (which is how I read your message), I'd say "yes, that's what happens when you reduce that parameter, so don't do that". If you meant something else, please clarify.

I think the problem is that the work is done at checkpoint time (whether spread or not). It should already have been done by the bgwriter.

Thanks for the dtrace example; I suggested we add those checkpoint probes in there and someone did, but I hadn't seen anybody use them for anything yet.

I think more probes (e.g. on different writing conditions, like writing from the bgwriter or at a checkpoint) would be interesting here.

Bug1: usage_count is IMHO not consistent

It's a bit hack-ish, but the changes made to support multiple buffer use strategies introduced by the "Make large sequential scans and VACUUMs work in a limited-size ring" commit are reasonable even if they're not as consistent as we'd like. Those changes were supported by benchmarks proving their utility, which always trump theoretical "that shouldn't work better!" claims when profiling performance. Also, they make sense to me, but I've spent a lot of time staring at pg_buffercache output to get a feel for what shows up in there under various circumstances. That's where I'd suggest you go if this doesn't seem right to you; run some real database tests and use pg_buffercache to see what's inside the cache when you're done. What's in there and what I expected to be in there weren't always the same thing, and it's interesting to note how that changes as shared_buffers increases. I consider some time studying that a pre-requisite to analyzing performance of this code.

I have analyzed pg_buffercache (querying every second, see below) in parallel to see what happens, but I didn't see the expected results in some ways with the usage_counts. Therefore I analyzed the code and found, IMHO, the problem with the usage_count and buffer reallocation. Since the code change is also new (I think it was 05/2009), it might be that you tested before it ...

BTW: Is it possible to get everything in pg_class over all databases as admin?

Bug2: Double iteration of buffers. As you can see in the calling tree below, there is a double iteration with buffers involved. This might be a major performance bottleneck.

Hmmm, this might be a real bug causing scans through the buffer cache to go twice as fast as intended.

That's not twice, O(2*n)=O(n); that's a factor of n*n (outer and inner loop iteration), which means O(n^2) overall, which is IMHO too much.

Since the part you suggest is doubled isn't very intensive or called all that often, there's no way it can be a major issue though.

It is a major issue since it is O(n^2) and not O(n). E.g. with 2GB of shared_buffers we have 262144 blocks and 68719476736 calls, which is far too much.

That's based on knowing what the code does and how much it was called, as well as some confidence that if it were really a *major* problem, it would have shown up on the extensive benchmarks done on all the code paths you're investigating.

The problem might be hidden for the following reasons:
1.) Buffer counts are so low that even n^2 is small for today's machines
2.) The code is not often called in that way
3.) Backends write out the pages themselves, so that the code is never executed
4.) ...

BTW: Are there some tests available of how fast a buffer cache hit is versus a disk cache hit (not in the buffer cache, but in the disk cache)? I'm asking because a lot of locking is involved in the code.

I did some once but didn't find anything particularly interesting about the results. Since you seem to be on a research tear here, it would be helpful to have a script to test that out available. I wasn't able to release mine, and something dtrace-based would probably be better than the approach I used (I threw a bunch of gettimeofday calls into the logs and post-processed them with a script).

Do you have an idea where one should set tracepoints inside and outside PostgreSQL?

BTW2: Oracle's buffer cache and background writer strategy is also interesting.

As a rule, we don't post links to other database implementation details here, as those can have patented design details we'd prefer not to intentionally re-implement. Much of Oracle's design here doesn't apply here anyway, as it was done in the
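As a starting point for the pg_buffercache study recommended above, a small sketch (assuming the pg_buffercache extension is installed) showing the usage_count distribution across the buffer cache:

SELECT usagecount,
       count(*) AS buffers,
       SUM(CASE WHEN isdirty THEN 1 ELSE 0 END) AS dirty_buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;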
Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
On Fri, 2 Oct 2009, Greg Smith wrote:

On Fri, 2 Oct 2009, Scott Marlowe wrote: I found that lowering checkpoint completion target was what helped. Does that seem counter-intuitive to you? I set it to 0.0 now.

Generally, but there are plenty of ways you can get into a state where a short but not immediate checkpoint is better. For example, consider a case where your buffer cache is filled with really random stuff. There's a sorting horizon in effect, where your OS and/or controller makes decisions about what order to write things based on the data it already has around, not really knowing what's coming in the near future.

OK, if the checkpoint doesn't block anything during normal operation, time doesn't really matter.

Let's say you've got 256MB of cache in the disk controller, you have 1GB of buffer cache to write out, and there's 8GB of RAM in the server so it can cache the whole write. If you wrote it out in a big burst, the OS would elevator-sort things and feed them to the controller in disk order. Very efficient; one pass over the disk to write everything out. But if you broke that up into 256MB write pieces instead on the database side, pausing after each chunk was written, the OS would only be sorting across 256MB at a time, and would basically fill the controller cache up with that before it saw the larger picture. The disk controller can end up making seek decisions with that small of a planning window now that are not really optimal, making more passes over the disk to write the same data out. If the timing between the DB write cache and the OS is pathologically out of sync here, the result can end up being slower than had you just written out in bigger chunks instead. This is one reason I'd like to see fsync calls happen earlier and more evenly than they do now, to reduce these edge cases.

The usual approach I take in this situation is to reduce the amount of write caching the OS does, so at least things get more predictable. A giant write cache always gives the best average performance, but the worst-case behavior increases at the same time. There was a patch floating around at one point that sorted all the checkpoint writes by block order, which would reduce how likely it is you'll end up in one of these odd cases. That turned out to be hard to nail down the benefit of though, because in a typical case the OS caching here trumps any I/O scheduling you try to do in user land, and it's hard to repeatably generate scattered data in a benchmark situation.

OK, with a basic insert test and a systemtap script (http://www.wiesinger.com/opensource/systemtap/postgresql-checkpoint.stp), the checkpoint is still a major I/O spike:

Buffers between : Sun Oct 4 18:29:50 2009, synced 55855 buffer(s), flushed 744 buffer(s) between checkpoint
Checkpoint start: Sun Oct 4 18:29:50 2009
Checkpoint end  : Sun Oct 4 18:29:56 2009, synced 12031 buffer(s), flushed 12031 buffer(s)
Buffers between : Sun Oct 4 18:30:20 2009, synced 79000 buffer(s), flushed 0 buffer(s) between checkpoint
Checkpoint start: Sun Oct 4 18:30:20 2009
Checkpoint end  : Sun Oct 4 18:30:26 2009, synced 10753 buffer(s), flushed 10753 buffer(s)
Buffers between : Sun Oct 4 18:30:50 2009, synced 51120 buffer(s), flushed 1007 buffer(s) between checkpoint
Checkpoint start: Sun Oct 4 18:30:50 2009
Checkpoint end  : Sun Oct 4 18:30:56 2009, synced 11899 buffer(s), flushed 11912 buffer(s)

I further had a look at the code to understand the behavior of the buffer cache and the background writer, since the behavior didn't seem logical to me.

So as far as I saw, the basic algorithm is:
1.) Normally (outside checkpoints), only dirty and non-recently-used pages (usage_count == 0) are flushed to disk. I think that's basically fine as a strategy, as indexes might update blocks more than once. It's also OK that blocks are written and not flushed (that will be done at checkpoint time).
2.) At checkpoints, write out all dirty buffers and flush all previously written and newly written ones. Spreading the I/O also seems OK to me now.

BUT: I think I've found 2 major bugs in the implementation (or I didn't understand something correctly). The codebase analyzed is 8.3.8, since I currently use it.

## Bug1: usage_count is IMHO not consistent ##
I think this has been introduced with:
http://git.postgresql.org/gitweb?p=postgresql.git;a=blobdiff;f=src/backend/storage/buffer/bufmgr.c;h=6e6b862273afea40241e410e18fd5d740c2b1643;hp=97f7822077de683989a064cdc624a025f85e54ab;hb=ebf3d5b66360823edbdf5ac4f9a119506fccd4c0;hpb=98ffa4e9bd75c81243
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Simon Riggs wrote:

On Sun, 2009-09-27 at 18:05 +0200, Gerhard Wiesinger wrote: So I saw that even on sequential reads (and also on bitmap heap scan access) PostgreSQL uses only 8k blocks. I think that's a major I/O bottleneck. A commercial database vendor solved the problem by reading multiple continuous blocks in multiples of 8k up to a maximum threshold. Output per 5 seconds on an equivalent "sequence scan":

Is systemtap counting actual I/Os or just requests to access 8192-byte blocks once in the OS cache? Postgres doesn't read more than one block at a time into its buffer pool, so those numbers of requests look about right.

As far as I know these are VFS reads. Some reads might come from the cache, but since I did the following before each run, all requests should come from disk:

echo 3 > /proc/sys/vm/drop_caches; service postgresql restart
# then run the benchmark

This is the same for all benchmarks, because I don't want to measure the cache performance of the OS or of the DB. Therefore all requests (except for blocks read twice or more, which shouldn't be the case and would also be fine as cache hits) are from disk and not from the cache.

There is belief here that multi-block I/O was introduced prior to OSes doing this as a standard mechanism. Linux expands its read-ahead window in response to sequential scans, and so this seems like something we don't want to do in the database.

I even played with large values for the block device read-ahead on /dev/md*, /dev/sd* and /dev/dm-* as well as stripe_cache_size on /dev/md*, but without any performance improvements in the benchmark scenarios. So all read-aheads/caches don't seem to help, at least in the HEAP BITMAP SCAN scenarios, on nearly-latest Linux kernels.

But I think such block issues (reading in the largest blocks possible) have to be optimized at the application level (in our case the DB level), because:
1.) We can't assume that the OS and even the storage work well in such scenarios.
2.) We can't assume that the OS/storage is intelligent enough to reduce the number of IOPS when 2 random blocks happen to be 2 sequential blocks, and that the number of IOPS is therefore reduced.
3.) I think such logic should be very easy to integrate, and it even has been done with some patches.

It's possible this is wrong. Is the table being scanned fairly sizable, and was it allocated contiguously? i.e. was it a large table loaded via COPY? I also wonder if more L2 cache effects exist.

What do you mean by "table being scanned fairly sizable"? I don't get it. The table was filled with a lot of inserts, but all at one point in time.

Ciao,
Gerhard
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Greg Smith wrote:

On Fri, 2 Oct 2009, Gerhard Wiesinger wrote: Larger blocksizes also reduce IOPS (I/Os per second), which might be a critical threshold on storage systems (e.g. Fibre Channel systems).

True to some extent, but don't forget that IOPS is always relative to a block size in the first place. If you're getting 200 IOPS with 8K blocks, increasing your block size to 128K will not result in your getting 200 IOPS at that larger size; the IOPS number at the larger block size is going to drop too. And you'll pay the penalty for that IOPS number dropping every time you're accessing something that would have only been an 8K bit of I/O before.

Yes, there will be some (very small) drop in IOPS when the blocksize is higher, but today's disks have plenty of throughput when IOPS*128k is compared to e.g. 100MB/s. I've done some Excel calculations which support this.

The trade-off is very application dependent. The position you're advocating, preferring larger blocks, only makes sense if your workload consists mainly of larger scans. Someone who is pulling scattered records from throughout a larger table will suffer with that same change, because they'll be reading a minimum of 128K even if all they really needed were a few bytes. That penalty ripples all the way from the disk I/O upwards through the buffer cache.

I wouldn't read 128k blocks all the time. I would do the following: when e.g. B0, B127, B256 should be read, I would read them with 8k random block I/O. When B1, B2, B3, B4, B5, B7, B8, B9, B10 are needed, I would make 2 requests with the largest possible blocksize:
1.) B1-B5: 5*8k=40k
2.) B7-B10: 4*8k=32k

In this case, when B5 and B7 are only one block apart, we could also discuss whether we should read B1-B10=10*8k=80k in one read request and simply not use B6. That would reduce the IOPS by a factor of 4-5 in this scenario, and therefore throughput would go up.

It's easy to generate a synthetic benchmark workload that models some real-world applications and see performance plunge with a larger block size. There certainly are others where a larger block would work better. Testing either way is complicated by the way RAID devices usually have their own stripe sizes to consider on top of the database block size.

Yes, there are block device read-ahead buffers and also RAID stripe caches. But neither seems to work well in the tested HEAP BITMAP SCAN scenario, nor in practical PostgreSQL performance measurement scenarios. And the modelled pgiosim isn't a synthetic benchmark; it is the same as a real-world HEAP BITMAP SCAN scenario in PostgreSQL, where some blocks are read directly consecutively, at least logically in the filesystem (and with some probability also physically on disk), but currently each read is an 8k block read, even when 2 or more blocks could be fetched with one request.

BTW: I would also limit the blocksize to some upper limit for such requests (e.g. 1MB).

Ciao,
Gerhard
Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
On Fri, 2 Oct 2009, Greg Smith wrote:

On Sun, 27 Sep 2009, Gerhard Wiesinger wrote: Lowering bgwriter_delay is possible, but I think the overhead is too high, and there is still a limit of 800MB/s involved:

Stuff written by the background writer turns into largely random I/O. 800MB/s of random writes is so large a number it's only recently become remotely possible; a RAID0 of SSD devices might manage it. No hardware available until very recently had any hope of getting that sort of performance. In any case, I would wager you'll run into one of many other bottlenecks in PostgreSQL and/or currently available system/disk hardware long before the background writer limit gets important.

Of course, 800MB/s is a theoretical maximum limit I could think of. But with SSDs this might be possible. So in fact I think bgwriter_lru_maxpages should be limited to 100000, if limited at all.

The current limit is based on the assumption that people will set it to values way too high if allowed, to the point where it's counterproductive. That's exactly what people used to do with early background writer designs. I think you're wandering down the same road, where what it actually does and what you think it does are not the same thing at all. Much of the important disk I/O coming out of the database should be related to checkpoints, not the background writer, and there is no limit on that I/O.

In my experience, flushing I/O as soon as possible is the best solution. Think of the following scenario: you currently limit the bgwriter to 4MB/s, but you have about 10MB/s of random I/O capacity (a normal low-cost system). So utilization would be only 40%, and you could write even more. At checkpoint time you would get a spike which the I/O system couldn't handle at all, and performance goes down to nearly zero because of the I/O spike (e.g. 500% of available I/O capacity needed). IMHO such scenarios should be avoided.

If you think you've got a situation where the current limits are not sufficient, the path to argue that would start with showing what you're seeing in pg_stat_bgwriter. I can imagine some results from there on a system with a very high rate of I/O available that would suggest the current limits are too small. I've never come close to actually seeing such results in the real world though, and if you're not already monitoring those numbers on a real system, I'd suggest you start there rather than presuming there's a design limitation here.

On a nearly idle database with sometimes some performance tests:

SELECT
  buffers_checkpoint/buffers_clean AS checkpoint_spike,
  ROUND(100.0*buffers_checkpoint/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS checkpoint_percentage,
  ROUND(100.0*buffers_clean/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS pg_writer_percentage,
  ROUND(100.0*buffers_backend/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS backend_percentage
FROM pg_stat_bgwriter;

 checkpoint_spike | checkpoint_percentage | pg_writer_percentage | backend_percentage
------------------+-----------------------+----------------------+--------------------
               31 |                 90.58 |                 2.92 |               6.50

So flushing happens mostly at checkpoint time. 6.5% of all blocks were written out by the backends, which says IMHO: the bgwriter is too slow, so the backends have to do the work themselves.

So I'd like to do some tests with fresh statistics. Is there any fast way to reset the pg_stat_bgwriter statistics for all databases?

Thanks.

Ciao,
Gerhard
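On the reset question: the cluster-wide bgwriter counters can be cleared with pg_stat_reset_shared (hedged: that function appeared in PostgreSQL 9.0, later than the 8.3/8.4 series discussed here):

SELECT pg_stat_reset_shared('bgwriter');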
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Greg Smith wrote: On Sun, 27 Sep 2009, Gerhard Wiesinger wrote: I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. I wish. This is an easy problem compared to the really important ones that need to be resolved. Situations where the OS is capable of faster sequential I/O performance than PostgreSQL appears to deliver doing reads are often caused by something other than what the person doing said benchmarking believes they are. For example, the last time I thought I had a smoking-gun situation just like the one you're describing, it turned out the background operation I didn't know was going on that slowed things down was hint bit updates: http://wiki.postgresql.org/wiki/Hint_Bits Background checkpoints can also cause this; typically if you set checkpoint_segments really high and watch when they're happening, you can avoid that interfering with results too. It's hard to isolate the cause of issues like this. Since most people seem to get something close to real disk speed from sequential scans when measured properly, I would suggest starting with the assumption there's something wrong with your test case rather than PostgreSQL. The best way to do that is to construct a test case others can run that shows the same problem on other systems using the database itself. The easiest way to build one of those is using generate_series to create some bogus test data, SELECT everything in there with \timing on, and then use the size of the relation on disk to estimate MB/s (see the sketch after this mail). Regardless, it's easy enough to build PostgreSQL with larger block sizes if you think that really matters for your situation. You're never going to see that in the mainstream version though, because there are plenty of downsides to using larger blocks. And since the database doesn't actually know where on disk things are at, it's not really in a good position to make decisions about I/O scheduling anyway. More on that below. What's the current status of the patch of Gregory Stark? Any timeframes to integrate? There needs to be a fairly major rearchitecting of how PostgreSQL handles incoming disk I/O for that to go anywhere else, and I don't believe that's expected to be ready in the near future. Does it also work for sequence scans? Any plans for a generic "multi block read count" solution? There was a similar patch for sequential scans submitted by someone else based on that work. It was claimed to help performance on a Linux system with a rather poor disk I/O setup. No one else was able to replicate any performance improvement using the patch though. As far as I've been able to tell, the read-ahead logic being done by the Linux kernel and in some hardware is already doing this sort of optimization for you on that OS, whether or not your app knows enough to recognize it's sequentially scanning the disk it's working against. I forgot to mention: Larger block sizes also reduce IOPS (I/Os per second), which might be a critical threshold on storage systems (e.g. Fibre Channel systems). You would get the full throughput from the storage with large block sizes (fewer IOPS), but with small block sizes the IOPS limit is reached and throughput goes down. Example: With 100MB/s and 8k blocks you need 12500 IOPS, which is a lot (e.g. at least 90 disks at 140 IOPS each)! When the same blocks can be read with e.g. a 128k block size, 781 IOPS are sufficient (6 disks are sufficient)! So this makes a major difference.
Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
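The generate_series test case Greg describes is quick to set up. A minimal sketch (table name and row count are made up for illustration; roughly 700 MB of data):

-- Create some bogus test data
CREATE TABLE seqscan_test AS
    SELECT i, md5(i::text) AS padding
    FROM generate_series(1, 10000000) AS s(i);

-- With \timing on in psql, force a full sequential scan
SELECT count(*) FROM seqscan_test;

-- Relation size divided by the elapsed time gives the effective MB/s
SELECT pg_relation_size('seqscan_test') / (1024 * 1024) AS size_mb;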
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Fri, 2 Oct 2009, Greg Smith wrote: On Sun, 27 Sep 2009, Gerhard Wiesinger wrote: I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. I wish. This is an easy problem compared to the really important ones that need to be resolved. Situations where the OS is capable of faster sequential I/O performance than PostgreSQL appears to deliver doing reads are often caused by something other than what the person doing said benchmarking believes they are. For example, the last time I thought I had a smoking-gun situation just like the one you're describing, it turned out the background operation I didn't know was going on that slowed things down was hint bit updates: http://wiki.postgresql.org/wiki/Hint_Bits Background checkpoints can also cause this; typically if you set checkpoint_segments really high and watch when they're happening, you can avoid that interfering with results too. It's hard to isolate the cause of issues like this. Since most people seem to get something close to real disk speed from sequential scans when measured properly, I would suggest starting with the assumption there's something wrong with your test case rather than PostgreSQL. The best way to do that is to construct a test case others can run that shows the same problem on other systems using the database itself. The easiest way to build one of those is using generate_series to create some bogus test data, SELECT everything in there with \timing on, and then use the size of the relation on disk to estimate MB/s. Regardless, it's easy enough to build PostgreSQL with larger block sizes if you think that really matters for your situation. You're never going to see that in the mainstream version though, because there are plenty of downsides to using larger blocks. And since the database doesn't actually know where on disk things are at, it's not really in a good position to make decisions about I/O scheduling anyway. More on that below. What's the current status of the patch of Gregory Stark? Any timeframes to integrate? There needs to be a fairly major rearchitecting of how PostgreSQL handles incoming disk I/O for that to go anywhere else, and I don't believe that's expected to be ready in the near future. Does it also work for sequence scans? Any plans for a generic "multi block read count" solution? There was a similar patch for sequential scans submitted by someone else based on that work. It was claimed to help performance on a Linux system with a rather poor disk I/O setup. No one else was able to replicate any performance improvement using the patch though. As far as I've been able to tell, the read-ahead logic being done by the Linux kernel and in some hardware is already doing this sort of optimization for you on that OS, whether or not your app knows enough to recognize it's sequentially scanning the disk it's working against. I've enhanced the pgiosim project http://pgfoundry.org/projects/pgiosim/ with a patch for larger block sizes, independent of PostgreSQL: http://www.wiesinger.com/opensource/pgiosim/pgiosim-0.2-blocksizes.diff You'll find some detailed results below; maybe someone can verify this on different HW/SW platforms with the patch above. If you have any questions regarding pgiosim and the patch, just feel free to ask. Summary: RANDOM I/O with block sizes of e.g. 128k (e.g. BITMAP HEAP SCAN) has better performance than reading the same blocks with 8k block sizes (factor 1.5).
Conclusion: In the test scenario the proposed solution would give a performance gain of a factor of 1.5 for typical BITMAP HEAP SCANs. For other scenarios no performance gain with larger block sizes of contiguous blocks could be measured. Therefore I'm assuming that prefetching works well on Linux with sequential I/O but not with random I/O. I hope I can convince someone that such optimizations make sense, as commercial database vendors have implemented such features for performance reasons. BTW: Prefetch is enabled on the RAID and block devices. Ciao, Gerhard -- http://www.wiesinger.com/

# RANDOM I/O 8k blocksize
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 test.txt
Arg: 1
Added test.txt
blocksize=8192, reading block as a whole
Elapsed: 135.92
Read 10000 blocks Wrote 0 blocks
73.57 op/sec, 588.60kB/sec

# RANDOM I/O 8k blocksize (for verification only), in fact same test as below
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -r test.txt
Arg: 1
Added test.txt
blocksize=8192, doing single read requests with chunk size of 8192 bytes
Elapsed: 136.30
Read 10000 blocks Wrote 0 blocks
73.37 op/sec, 586.94kB/sec

# RANDOM I/O 128k blocksize, read as one 128k block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -o 131072 test.txt
Arg: 1
Added test
Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
On Sun, 27 Sep 2009, Sam Mason wrote: On Sun, Sep 27, 2009 at 06:05:51PM +0200, Gerhard Wiesinger wrote: Google research showed that Gregory Stark already worked on that issue (see references below), but as far as I saw only on bitmap heap scans. Greg Stark's patches are about giving the IO subsystem enough information about where the random accesses will be ending up next. This is important, but almost completely independent from the case where you know you're doing sequential IO, which is what you seem to be talking about. I'm talking about 2 cases: 1.) sequential scans and 2.) bitmap index scans, which both hopefully end up in blocks that physically follow each other and where larger block sizes can benefit. I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. PG's been able to handle data as fast as it can come back from the disk in my tests. When you start doing calculations then it will obviously slow down, but what you were talking about wouldn't help here either. Then again, I don't have a particularly amazing IO subsystem. What sort of performance do your disks give you and at what rate is PG doing sequential scans for you? Hello Sam, Detailed benchmarks are below; the original ones from PostgreSQL have already been posted. So I would expect at least about 60-80MB/s reading performance from PostgreSQL in practice on sequence scans (when larger block sizes are read), but it is at about 30MB/s. See also pgiosim below. Setup is: Disk Setup: SW RAID 5 with 3x1TB SATA 7200 RPM disks, Linux Kernel: 2.6.30.5-43.fc11.x86_64, CPU: Quad Core: AMD Phenom(tm) II X4 940 Processor, 3GHz, RAM: 8GB. Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ http://pgfoundry.org/projects/pgiosim/

#
# Performance benchmarks:
#
dd if=/dev/zero of=test.txt bs=8192 count=1310720 conv=fdatasync
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 169.482 s, 63.4 MB/s

dd if=test.txt of=/dev/null bs=8192
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 86.4457 s, 124 MB/s

#
# RANDOM
#
# Random 8k block reads
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 test.txt
Arg: 1
Added test.txt
Elapsed: 148.22
Read 10000 blocks Wrote 0 blocks
67.47 op/sec, 539.75kB/sec

# Random 8k block reads & writes
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -w 100 test.txt
Write Mode: 100%
Arg: 1
Added test.txt
Elapsed: 201.44
Read 10000 blocks Wrote 10000 blocks
49.64 op/sec, 397.14kB/sec

# Random 8k block reads & writes, sync after each block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 10000 -w 100 -y test.txt
Write Mode: 100% fsync after each write
Arg: 1
Added test.txt
Elapsed: 282.30
Read 10000 blocks Wrote 10000 blocks
35.42 op/sec, 283.39kB/sec

#
# SEQUENTIAL
#
# Sequential 8k block reads
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 1000000 test.txt
Seq Scan
Arg: 1
Added test.txt
Elapsed: 71.88
Read 1000000 blocks Wrote 0 blocks
13911.40 op/sec, 111291.17kB/sec

# Sequential 8k block reads & writes
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 1000000 -w 100 test.txt
Seq Scan
Write Mode: 100%
Arg: 1
Added test.txt
Elapsed: 261.24
Read 1000000 blocks Wrote 1000000 blocks
3827.90 op/sec, 30623.18kB/sec

# Sequential 8k block reads & writes, sync after each block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -s -b 10000 -w 100 -y test.txt
Seq Scan
Write Mode: 100% fsync after each write
Arg: 1
Added test.txt
Elapsed: 27.03
Read 10000 blocks Wrote 10000 blocks
369.96 op/sec, 2959.68kB/sec
#

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Hello, As block sizes, random I/O and linear I/O are critical I/O performance parameters, I had a look at PostgreSQL and a commercial software vendor. Therefore I enhanced the SystemTap script: http://www.wiesinger.com/opensource/systemtap/disktop_gw.stp

Output per 5 seconds on a sequence scan:

UID   PID    PPID  CMD         DEVICE  T  BYTES      REQUESTS  BYTES/REQ
26    4263   4166  postmaster  dm-1    R  168542208  20574     8192

=> 32MB/s

So I saw that even on sequential reads (and also on bitmap heap scan access) PostgreSQL uses only 8k blocks. I think that's a major I/O bottleneck. A commercial database vendor solved the problem by merging reads of multiple contiguous 8k blocks up to a maximum threshold. Output per 5 seconds on an equivalent "sequence scan":

UID   PID    PPID  CMD      DEVICE  T  BYTES      REQUESTS  BYTES/REQ
1001  53811        process  dm-1    R  277754638  2338      118800

=> 53 MB/s

Google research showed that Gregory Stark already worked on that issue (see the references below), but as far as I saw only on bitmap heap scans. I think this is one of the most critical performance showstoppers of PostgreSQL on the I/O side. What's the current status of the patch of Gregory Stark? Any timeframes to integrate? Does it also work for sequence scans? Any plans for a generic "multi block read count" solution? Any comments? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/

http://wiki.postgresql.org/wiki/Todo#Concurrent_Use_of_Resources
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00027.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00395.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00092.php
http://archives.postgresql.org/pgsql-hackers/2007-12/msg00098.php
http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php
http://markmail.org/message/a5osy4qptxk2jgu3#query:+page:1+mid:hz7uzhwxtkbzncy2+state:results
http://markmail.org/message/a5osy4qptxk2jgu3#query:+page:1+mid:a5osy4qptxk2jgu3+state:results

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?
Hello, I think the limit of bgwriter_lru_maxpages of max. 1000 doesn't make any sense, because in fact it limits the performance of the database (version 8.3). On heavy write operations buffer cache pages must be freed. With the default config this is practically limited to:

bgwriter_delay=200ms
bgwriter_lru_maxpages=100
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*100*1000/200 = 4000k = 4MB/s

Isn't that a major performance bottleneck in the default config?

bgwriter_delay=200ms
bgwriter_lru_maxpages=1000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*1000*1000/200 = 40000k = 40MB/s

Still not a very high number for current I/O loads. Lowering bgwriter_delay is possible, but I think the overhead is too much and still there is a limit of 800MB/s involved:

bgwriter_delay=10ms
bgwriter_lru_maxpages=1000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*1000*1000/10 = 800000k = 800MB/s

So I think it would be better to have such a configuration:

bgwriter_delay=50ms
bgwriter_lru_maxpages=100000
8k*bgwriter_lru_maxpages*1000/bgwriter_delay = 8k*100000*1000/50 = 16000000k = 16000MB/s

So in fact I think bgwriter_lru_maxpages should be limited to 100000 if limited at all. Are my argumentations correct? Any comments? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
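The ceiling is just the page size times the pages per round times the rounds per second, so the limits above can be checked with plain SQL arithmetic (values taken from the examples in the mail):

-- 8 kB * bgwriter_lru_maxpages * (1000 / bgwriter_delay in ms) = kB/s
SELECT 8 * 100  * (1000 / 200) AS default_kb_per_sec,   -- 4000 kB/s   =   4 MB/s
       8 * 1000 * (1000 / 200) AS maxpages_kb_per_sec,  -- 40000 kB/s  =  40 MB/s
       8 * 1000 * (1000 / 10)  AS delay10ms_kb_per_sec; -- 800000 kB/s = 800 MB/s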
Re: [GENERAL] Problem with data corruption and psql memory usage
Hello Tom, A late answer, but an answer :-): Finally, it was a very strange hardware problem, where a very small part of the RAM was defective but the kernel never crashed. I also saw very strange behavior when verifying rpm packages with rpm -V. At first I suspected the hard disk. But then I flushed the OS caches (echo 3 > /proc/sys/vm/drop_caches) and rpm -V was correct. => RAM issue. memtest86+ very quickly found the defective RAM. So PostgreSQL didn't have any issue :-) Ciao, Gerhard -- http://www.wiesinger.com/ On Wed, 9 May 2007, Gerhard Wiesinger wrote: Hello Tom! I don't think this is a hardware problem. The machine has run 24/7 for around 4 years without any problems: daily backups with GBs of data to it, uptimes up to the next kernel security patch, etc. The only problem I could believe in is: I'm running the FC7 test packages of postgresql on FC6 and maybe there is a slight glibc library conflict or some other incompatibility. Ciao, Gerhard -- http://www.wiesinger.com/ On Wed, 9 May 2007, Tom Lane wrote: Gerhard Wiesinger writes: LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output error [ raised eyebrow... ] I think your machine is flakier than you believe. This error is particularly damning, but the general pattern of weird failures all over the place seems to me to fit the idea of hardware problems much better than any other explanation. FC6 and PG 8.2.3 are both pretty darn stable for most people, so there's *something* wrong with your installation, and unstable hardware is the first thing that comes to mind. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sum of multiplied deltas
Hello, Finally I used the function below, which works well. Only one problem is left: it pollutes the buffer cache because of the cursor. Any idea how to get rid of this behavior? BTW: the WINDOWING FUNCTIONs of 8.4 should help, but no one could provide an example of how this could work (see the sketch after this mail). Any further comments on how to implement it? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/

CREATE TYPE PS AS (
  sum_m1 double precision,
  sum_m2 double precision
);

DROP FUNCTION getSum();
CREATE OR REPLACE FUNCTION getSum(IN start_ts timestamp with time zone, IN stop_ts timestamp with time zone)
RETURNS PS AS $$
DECLARE
  curs CURSOR FOR
    SELECT * FROM log_entries
    WHERE datetime >= start_ts AND datetime <= stop_ts
    ORDER BY datetime;
  row log_entries%ROWTYPE;
  i bigint = 0;
  datetime_old timestamp with time zone;
  old double precision;
  sum_m1 double precision = 0;
  sum_m2 double precision = 0;
  psum PS;
BEGIN
  OPEN curs;
  LOOP
    FETCH curs INTO row;
    EXIT WHEN NOT FOUND;
    IF row.col IS NOT NULL THEN
      IF i > 0 THEN
        sum_m1 = sum_m1 + (row.col - old) * 0.01 * row.col2;
        sum_m2 = sum_m2 + EXTRACT('epoch' FROM row.datetime - datetime_old) * row.col3;
      END IF;
      i = i + 1;
      old = row.col;
      datetime_old = row.datetime;
    END IF;
  END LOOP;
  CLOSE curs;
  psum.sum_m1 = sum_m1;
  psum.sum_m2 = sum_m2;
  RETURN psum;
END;
$$ LANGUAGE plpgsql;

On Mon, 8 Jun 2009, Gerhard Wiesinger wrote: Hello! I've the following data:

datetime | val1 | val2
time1    |    4 | 40%
time2    |    7 | 30%
time3    |   12 | 20%
...

I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... datetime is ordered (and unique and has also an id). Rows are in the area of millions. How is it done best? 1.) Self join with one row shift? 2.) A function? Any hint? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
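For reference, on 8.4 the cursor loop above collapses into a single query using window functions. A minimal sketch, assuming the same log_entries columns the function uses (col, col2, col3, datetime) and illustrative date bounds:

SELECT SUM((col - prev_col) * 0.01 * col2) AS sum_m1,
       SUM(EXTRACT('epoch' FROM datetime - prev_datetime) * col3) AS sum_m2
FROM (
    SELECT col, col2, col3, datetime,
           lag(col)      OVER (ORDER BY datetime) AS prev_col,
           lag(datetime) OVER (ORDER BY datetime) AS prev_datetime
    FROM log_entries
    WHERE datetime >= '2009-01-01' AND datetime < '2010-01-01'
      AND col IS NOT NULL
) AS deltas
WHERE prev_col IS NOT NULL;

The inner WHERE reproduces the loop's IF row.col IS NOT NULL filter before lag() pairs the rows, and the outer filter drops the first row, which has no predecessor, just like the i > 0 check.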
Re: [GENERAL] pg_buffercache - A lot of "unnamed" relfilenodes?
On Sat, 26 Sep 2009, Tom Lane wrote: Gerhard Wiesinger writes: Research on pg_buffercache showed that a lot of pages don't have any assigned relfilenodes in pg_class, even when they are dirty (in this case inserts are done). There should never be any buffered pages for nonexistent tables. The method you are using will only show relnames for pages belonging to tables of the current database ... maybe there are a lot of dirty pages for other databases in the cluster? Or maybe the tables are newly created (not committed) and so you can't see their pg_class rows yet? OK, this is a visibility problem across databases. Is it possible to GRANT visibility for e.g. one user to pg_class in all databases? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_buffercache - A lot of "unnamed" relfilenodes?
Hello, Research on pg_buffercache showed that a lot of pages don't have any assigned relfilenodes in pg_class, even when they are dirty (in this case inserts are done).

SELECT relname IS NULL AS relname_is_null, isdirty, COUNT(*) AS count
FROM pg_buffercache b
LEFT OUTER JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY relname_is_null, isdirty;

relname_is_null | isdirty | count
----------------+---------+-------
false           | false   |  8914
true            | false   |  7347
true            | true    |   123

Any ideas about this behavior? Inserts should have assigned relnames, shouldn't they? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
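Following Tom's explanation, the NULL rows can be narrowed down by restricting the join to buffers belonging to the current database (plus the shared catalogs, which carry reldatabase = 0). A minimal sketch, assuming the standard pg_buffercache view:

SELECT c.relname IS NULL AS relname_is_null, b.isdirty, COUNT(*) AS count
FROM pg_buffercache b
LEFT OUTER JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY 1, 2;

Buffers of other databases in the cluster are filtered out up front, so any remaining NULL relnames really belong to uncommitted or otherwise invisible relations of the current database.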
Re: [GENERAL] Problems with postgres online backup - restore
Hello Richard, OK, understood. It looks to me like a good system: whole-block overwrites first and then the deltas. Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 26 Jun 2009, Richard Huxton wrote: Gerhard Wiesinger wrote: Hello, OK, what is then the difference between doing a pg_start_backup() and just doing the backup? pg_start_backup() forces a checkpoint (and logs a label for your backup too). Isn't it a problem that the data blocks are very inconsistent, even so inconsistent that they are corrupt: e.g. only part of a data block has been written when the file is tarred. => The data block in the backup is corrupt => and then even the WAL can't be applied. Why does it work correctly? Or is there some design problem? It works because the WAL doesn't hold a list of row updates ("update row 12345 set field 4 = true") but block updates. Any update to a disk block is recorded - table or index. The WAL doesn't really know anything about tables, columns, primary keys etc - just disk blocks. One small optimisation is that the first time a block is touched after a checkpoint the value of the whole block is written to WAL and after that only updates. So - if you have a checkpointed system (all updates guaranteed written to disk) and a complete set of WAL files from that point on, you can always recreate the writes to any point in time after that. http://www.postgresql.org/docs/8.3/static/continuous-archiving.html -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Problems with postgres online backup - restore
Hello, OK, what is then the difference between doing a pg_start_backup() and just doing the backup? Isn't it a problem that the data blocks are very inconsistent, even so inconsistent that they are corrupt: e.g. only part of a data block has been written when the file is tarred. => The data block in the backup is corrupt => and then even the WAL can't be applied. Why does it work correctly? Or is there some design problem? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Jun 2009, Richard Huxton wrote: Gerhard Wiesinger wrote: Hello, I'd like to understand the PostgreSQL internals in "backup mode". If I understood it correctly, pg_start_backup() makes a checkpoint and stops writing to the data/ directory except for the WAL. All new transactions go into the WAL, which is also logical. But how is data consistency maintained when the written/changed blocks don't fit into the buffer cache? The data isn't kept consistent. Which is why you need the WAL. Restoring from a PITR backup is basically the same idea as recovering from a crash. Any blocks that might have been updated since you called pg_start_backup() will be rewritten from the WAL. -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Problems with postgres online backup - restore
Hello, I'd like to understand the PostgreSQL internals in "backup mode". If I understood it correctly, pg_start_backup() makes a checkpoint and stops writing to the data/ directory except for the WAL. All new transactions go into the WAL, which is also logical. But how is data consistency maintained when the written/changed blocks don't fit into the buffer cache? E.g. a lot of updates which exceed the buffer cache (where is the data written, other than the WAL, if the data files should be kept constant?). A SELECT on the updated data: where is the new data fetched from? Thnx for any explanations. Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Jun 2009, Chris Barnes wrote:

SELECT pg_switch_xlog();
SELECT pg_start_backup('postres_full_backup_June222009');
tar -czvf pgprd01_June22_2009_production.dmp.tar.gz data/
SELECT pg_stop_backup();

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Implicit conversion/comparision of timestamp with and without timezone
On Sun, 21 Jun 2009, Richard Huxton wrote: Gerhard Wiesinger wrote: Hello, It is unclear to me how implicit conversion/comparison of timestamps with and without timezone works. It's not entirely clear where the problem is. You don't say the results you're getting or what you thought they should be. Ok, the "problem" is:

-- datetime >= '2009-03-09 00:00:00.0' AND datetime < '2009-03-10 00:00:00.0'
-- Index Cond: ((datetime >= '2009-03-09 00:00:00+01'::timestamp with time zone) AND (datetime < '2009-03-10 00:00:00+01'::timestamp with time zone))
-- datetime >= '2009-04-01 00:00:00.0' AND datetime < '2009-04-02 00:00:00.0'
-- Index Cond: ((datetime >= '2009-04-01 00:00:00+02'::timestamp with time zone) AND (datetime < '2009-04-02 00:00:00+02'::timestamp with time zone))

I would have expected that the 1st conversion would be done with my current timezone at query time (CEST=+02). I don't see any problem with this, since all the ugly timezone calculations are already done by PostgreSQL, but it is nowhere (?) documented and was quite confusing to me. So to me the documentation should read: A timestamp without any timezone information is converted into a timestamp with time zone (explicitly or implicitly) using the time zone offset valid at the timestamp which has to be converted (and not the offset valid at now()). The date and time information is not touched. And: A timestamp with timezone information is converted into a timestamp without timezone information by removing just the timezone information. So the date and time information is not touched. Correct? BTW: I'm in the CET (now CEST) timezone, which switches on the last Sunday in March at 2:00 CET (and the last Sunday in October at 3:00 CEST).

show timezone;
 TimeZone
---------------
 Europe/Vienna

-- datetime TIMESTAMP WITH TIME ZONE
-- datetime entries are done with UTC+01 and UTC+02
-- 2009-03-09: UTC+01
-- 2009-06-12: UTC+02
-- current timezone: UTC+02

OK, so I've got this:

=> show timezone;
 TimeZone
----------
 UTC+02

SELECT * FROM tstest;
           d
------------------------
 2009-03-08 23:00:00-02
 2009-06-12 00:00:00-02
(2 rows)

So - it's showing the timestamps with the timezones you mentioned, but in the current timezone. The UTC+02 actually being a -02 offset is just part of the standards afaik. Note that midnight 2009-03-09 is 2009-03-08 in the current timezone.

SELECT datetime FROM table ORDER BY datetime DESC LIMIT 1;
        datetime
------------------------
 2009-06-21 14:54:00+02

So it should also be +02 in your test. Why is it -02 in your test? BTW: 2009-03-09 and 2009-03-08 are both in the CET timezone in Europe (see above). So - if we run EXPLAIN on your queries that should show us how the values are getting typecast.

=> EXPLAIN SELECT * FROM tstest WHERE d >= '2009-03-09 00:00:00.0' AND d < '2009-03-10 00:00:00.0';
QUERY PLAN
Seq Scan on tstest (cost=0.00..42.10 rows=11 width=8)
  Filter: ((d >= '2009-03-09 00:00:00-02'::timestamp with time zone) AND (d < '2009-03-10 00:00:00-02'::timestamp with time zone))

This comes back with zero rows because without an explicit timezone it assumes our current one.

=> EXPLAIN SELECT * FROM tstest WHERE d >= '2009-06-12 00:00:00.0' AND d < '2009-06-13 00:00:00.0';
QUERY PLAN
Seq Scan on tstest (cost=0.00..42.10 rows=11 width=8)
  Filter: ((d >= '2009-06-12 00:00:00-02'::timestamp with time zone) AND (d < '2009-06-13 00:00:00-02'::timestamp with time zone))

This will match one row. Repeating the EXPLAIN on your other queries should show you what's happening. If I've missed the point of your question, can you say what results you get and what you think they should be. See above.
For completeness:

-- datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-03-09 00:00:00.0' AND datetime < TIMESTAMP WITHOUT TIME ZONE '2009-03-10 00:00:00.0'
-- Index Cond: ((datetime >= '2009-03-09 00:00:00'::timestamp without time zone) AND (datetime < '2009-03-10 00:00:00'::timestamp without time zone))

=> Afterwards an implicit conversion to timestamp with tim
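The two conversion rules discussed above can be verified directly from psql; a minimal sketch (session timezone assumed to be Europe/Vienna, as in the thread, and no table needed):

SET timezone = 'Europe/Vienna';

-- timestamp -> timestamptz: the session zone's offset valid at that date is applied
SELECT TIMESTAMP '2009-03-09 00:00:00'::timestamptz;     -- 2009-03-09 00:00:00+01 (winter, CET)
SELECT TIMESTAMP '2009-06-12 00:00:00'::timestamptz;     -- 2009-06-12 00:00:00+02 (summer, CEST)

-- timestamptz -> timestamp: the value is rendered in the session timezone,
-- then the offset is dropped
SELECT TIMESTAMPTZ '2009-06-12 00:00:00+02'::timestamp;  -- 2009-06-12 00:00:00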
[GENERAL] Implicit conversion/comparision of timestamp with and without timezone
Hello, It is unclear to me how implicit conversion/comparison of timestamps with and without timezone works.

-- datetime TIMESTAMP WITH TIME ZONE
-- datetime entries are done with UTC+01 and UTC+02
-- 2009-03-09: UTC+01
-- 2009-06-12: UTC+02
-- current timezone: UTC+02

SELECT * FROM table WHERE
-- currently in UTC+02 timezone, entries in UTC+01 timezone
-- => works well with UTC+01 entry timezone, but why?
datetime >= '2009-03-09 00:00:00.0' AND datetime < '2009-03-10 00:00:00.0';

-- currently in UTC+02 timezone, entries in UTC+02 timezone
-- => works well with UTC+02 entry timezone, but why?
datetime >= '2009-06-12 00:00:00.0' AND datetime < '2009-06-13 00:00:00.0';

-- Same result, unclear why
datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-03-09 00:00:00.0' AND datetime < TIMESTAMP WITHOUT TIME ZONE '2009-03-10 00:00:00.0'
-- Same result, unclear why
datetime >= TIMESTAMP WITH TIME ZONE '2009-03-09 00:00:00.0' AND datetime < TIMESTAMP WITH TIME ZONE '2009-03-10 00:00:00.0'
-- Same result2, unclear why
datetime >= TIMESTAMP WITHOUT TIME ZONE '2009-06-12 00:00:00.0' AND datetime < TIMESTAMP WITHOUT TIME ZONE '2009-06-13 00:00:00.0'
-- Same result2, unclear why
datetime >= TIMESTAMP WITH TIME ZONE '2009-06-12 00:00:00.0' AND datetime < TIMESTAMP WITH TIME ZONE '2009-06-13 00:00:00.0'

How is the implicit conversion done? With the timezone of datetime, the timezone of now(), or the timezone setting? Would it make a difference if datetime were declared as TIMESTAMP WITHOUT TIME ZONE? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sum of multiplied deltas
Any hints for an 8.3 environment (currently)? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Mon, 8 Jun 2009, David Fetter wrote: On Mon, Jun 08, 2009 at 08:35:20AM +0200, Gerhard Wiesinger wrote: Hello! I've the following data:

datetime | val1 | val2
time1    |    4 | 40%
time2    |    7 | 30%
time3    |   12 | 20%
...

I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... This is best done in 8.4 using Windowing. Sadly, it's an 8.4-only feature, and dodgy hacks are the rule until you can use them. Cheers, David. datetime is ordered (and unique and has also an id). Rows are in the area of millions. How is it done best? 1.) Self join with one row shift? 2.) function? Any hint? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Sum of multiplied deltas
Hello! I've the following data:

datetime | val1 | val2
time1    |    4 | 40%
time2    |    7 | 30%
time3    |   12 | 20%
...

I'd like to sum up the following: (7-4)*30% + (12-7)*20% + ... datetime is ordered (and unique and has also an id). Rows are in the area of millions. How is it done best? 1.) Self join with one row shift (see the sketch after this mail)? 2.) function? Any hint? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
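Option 1.), the self join with one row shift, works on 8.3 without window functions when the id column is gapless and ascending in datetime order. A minimal sketch, assuming a table t(id, datetime, val1, val2) with val2 stored as a numeric fraction (0.30 for 30%):

SELECT SUM((nxt.val1 - cur.val1) * nxt.val2) AS total
FROM t AS cur
JOIN t AS nxt ON nxt.id = cur.id + 1;

Each pair contributes (next val1 - current val1) times the next row's val2, i.e. (7-4)*30% + (12-7)*20% + ... as in the example above.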
Re: [GENERAL] HOT question - insert/delete
The data isn't really historical; some of it just has to be kept for a period of time, e.g. for regulatory reasons. But all of the available data should remain accessible, e.g. for reporting. So partitioning doesn't make any sense in this case, right? Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 21 May 2009, Simon Riggs wrote: On Wed, 2009-05-20 at 16:01 -0400, Merlin Moncure wrote: On Wed, May 20, 2009 at 3:11 PM, Gerhard Wiesinger wrote: Hello! Is HOT in 8.3 used in the following scenario: INSERT ... DELETE ... at the same ratio. So for example current data is added every minute and all data older than 2 years is deleted. Would the heap be bloated, or is this avoided by the HOT feature of 8.3 so that it is kept nearly constant? HOT doesn't help here...it only helps with updates and then only if you are updating fields that are not indexed. Partial vacuum, in 8.4, will deal with this situation, though partitioning does sound best for such clearly historical data. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] HOT question - insert/delete
Are there any plans to support this kind of scenario? Ciao, Gerhard -- http://www.wiesinger.com/ On Wed, 20 May 2009, Merlin Moncure wrote: On Wed, May 20, 2009 at 3:11 PM, Gerhard Wiesinger wrote: Hello! Is HOT in 8.3 used in the following scenario: INSERT ... DELETE ... at the same ratio. So for example current data is added every minute and all old data older than 2 years are deleted. Would the heap be bloated or is this avoided by the HOT feature of 8.3 and nearly kept constant? HOT doesn't help here...it only helps with updates and then only if you are updating fields that are not indexed. if your table has a rolling set of data, for example a log file...you probably want to look at table partitioning (either manual or built in). merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
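For reference, the manual partitioning Merlin mentions is done on 8.3 with inheritance plus CHECK constraints. A rough sketch (table and column names are made up for illustration):

CREATE TABLE log_data (
    id       bigint,
    datetime timestamp with time zone,
    payload  text
);

-- One child per year; the CHECK constraint lets the planner prune children
CREATE TABLE log_data_2009 (
    CHECK (datetime >= '2009-01-01' AND datetime < '2010-01-01')
) INHERITS (log_data);

SET constraint_exclusion = on;

-- Scans only log_data_2009, not the other children
SELECT count(*) FROM log_data
WHERE datetime >= '2009-06-01' AND datetime < '2009-07-01';

-- Expiring data older than the retention period is then just a cheap:
-- DROP TABLE log_data_2007;

Inserts have to be routed to the right child, either by the application or with a trigger or rule on the parent table.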
[GENERAL] HOT question - insert/delete
Hello! Is HOT in 8.3 used in the following scenario: INSERT ... DELETE ... at the same ratio. So for example current data is added every minute and all old data older than 2 years are deleted. Would the heap be bloated or is this avoided by the HOT feature of 8.3 and nearly kept constant? Or is vacuum still needed for this scenario? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full backup - pg_dumpall sufficient?
Hello Ray, Yes, that's clear. But as far as I read, there is even some stuff which isn't dumped with pg_dumpall, so that an additional pg_dump was needed, i.e. running two commands like: 1.) run pg_dumpall 2.) run pg_dump additionally ... Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 29 Jan 2009, Raymond O'Donnell wrote: On 29/01/2009 16:31, Gerhard Wiesinger wrote: I recently read some mail on the mailing list saying that some parts of PostgreSQL are not dumped with pg_dumpall and additionally some pg_dump was necessary (it was something like internals, catalog, etc.) It's the other way around - pg_dump dumps just the specified database, but not cluster-wide stuff like login roles; you need to do a pg_dumpall to get those as well. Ray. -- Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland r...@iol.ie Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals -- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Full backup - pg_dumpall sufficient?
Hello! I recently read some mail on the mailing list saying that some parts of PostgreSQL are not dumped with pg_dumpall and additionally some pg_dump was necessary (it was something like internals, catalog, etc.). Any ideas what additionally has to be dumped besides pg_dumpall for a full backup? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Description of transaction model for indexes
Hello! The transaction model is discussed in several places:

http://www.packtpub.com/article/transaction-model-of-postgresql
The POSTGRES Data Model (1987) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.8578
The Design of POSTGRES (1986) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1422

However: Is there a paper available describing how indexes are handled inside transactions (e.g. visibility (not handled), concurrency, etc.)? E.g. what happens on SELECT/INSERT/UPDATE/DELETE? Some aspects can be found in: http://www.pgcon.org/2008/schedule/attachments/54_PGCon2008-HOT.ppt Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Conditional commit inside functions
Hello, Aren't there any drawbacks in PostgreSQL with such large transactions (like in Oracle), e.g. if I used 500,000,000 rows or even more? Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 26 Dec 2008, Pavel Stehule wrote: Hello why do you need commit? pavel 2008/12/26 Gerhard Wiesinger : Hello! I tried the following, but it is still one transaction: SELECT insert_1Mio(); (in parallel, select count(id) from employee; is done)

CREATE OR REPLACE FUNCTION insert_some(start_i INTEGER, end_i INTEGER) RETURNS void AS $func$
DECLARE
BEGIN
  FOR i IN start_i..end_i LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || i, 'Smith' || i);
  END LOOP;
END;
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_1Mio() RETURNS void AS $func$
DECLARE
  maxcommit INTEGER;
  start_i INTEGER;
  end_i INTEGER;
  now_i INTEGER;
BEGIN
  maxcommit := 1;
  start_i := 1;
  end_i := 1000000;
  now_i := start_i;
  FOR i IN start_i..end_i LOOP
    IF MOD(i, maxcommit) = 0 THEN
      PERFORM insert_some(now_i, i);
      now_i := i + 1;
    END IF;
  END LOOP;
  PERFORM insert_some(now_i, end_i);
END;
$func$ LANGUAGE plpgsql;

Any ideas? Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Dec 2008, Gerhard Wiesinger wrote: Hello! I want to translate the following Oracle PL/SQL script into plpgsql. Especially I'm having problems with the transaction thing. I tried START TRANSACTION and COMMIT without success. Any ideas? Thanx. Ciao, Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio IS
  maxcommit NUMBER;
BEGIN
  maxcommit := 1;
  FOR i IN 1..1000000 LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || to_char(i), 'Smith' || to_char(i));
    IF MOD(i, maxcommit) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;

-- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
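The attempt above cannot work because a plpgsql function always runs inside the caller's transaction and cannot COMMIT on its own. The usual 8.x workaround, when intermediate commits are really wanted, is to push each batch through a second connection with contrib/dblink, since every dblink_exec() call commits independently there. A minimal sketch (connection name and batch bounds are illustrative):

-- Requires contrib/dblink to be installed in the database
SELECT dblink_connect('loader', 'dbname=' || current_database());

-- Each dblink_exec runs and commits on its own over the 'loader' connection
SELECT dblink_exec('loader',
    'INSERT INTO employee (id, department, firstname, lastname)
     SELECT i, i, ''John'' || i, ''Smith'' || i
     FROM generate_series(1, 500000) AS i');
SELECT dblink_exec('loader',
    'INSERT INTO employee (id, department, firstname, lastname)
     SELECT i, i, ''John'' || i, ''Smith'' || i
     FROM generate_series(500001, 1000000) AS i');

SELECT dblink_disconnect('loader');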
Re: [GENERAL] Information about Pages, row versions of tables, indices
WHEN cl.relkind = 'r' THEN relname
  WHEN cl.relkind = 'i' THEN (SELECT relname FROM pg_index pi, pg_class cx WHERE cl.oid = pi.indexrelid AND pi.indrelid = cx.oid)
  WHEN cl.relkind = 't' THEN relname
  ELSE null
END AS tablename,
reltoastrelid AS reltoastrelid,
reltoastidxid AS reltoastidxid,
reltype AS reltype,
reltablespace AS reltablespace,
CASE WHEN cl.relkind = 'i' THEN 0.0 ELSE pg_relation_size(cl.oid) END AS tablesize,
pg_relation_size(cl.oid),
-- pg_relation_size(cl.relname) AS tablesize,
CASE
  WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
  WHEN cl.relkind = 'i' THEN
    CASE WHEN cl.oid IN (SELECT pi.indexrelid FROM pg_index pi, pg_class pc, pg_class pt WHERE pi.indexrelid = pt.oid AND pi.indexrelid = pc.reltoastidxid ORDER BY pi.indexrelid)
      THEN CAST('INDEX OF TOAST TABLE' AS VARCHAR(20))
      ELSE CAST('INDEX' AS VARCHAR(20))
    END
  WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
  WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
  WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
  WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS VARCHAR(18))
  ELSE null
END AS object_type,
CASE WHEN cl.relkind = 'r' THEN COALESCE((SELECT SUM(pg_relation_size(indexrelid))::bigint FROM pg_index WHERE cl.oid = indrelid), 0) ELSE pg_relation_size(cl.oid) END AS indexsize,
CASE WHEN reltoastrelid = 0 THEN 0 ELSE pg_relation_size(reltoastrelid) END AS toastsize,
CASE WHEN reltoastrelid = 0 THEN 0 ELSE pg_relation_size((SELECT reltoastidxid FROM pg_class ct WHERE cl.reltoastrelid = ct.oid)) END AS toastindexsize
FROM pg_class cl
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
) ss
WHERE schemaname = 'public' AND object_type = 'INDEX'
ORDER BY schemaname, tablename, object_type DESC, relname;

On Thu, 25 Dec 2008, Pavel Stehule wrote: Hello look at the contrib module pgstattuple http://www.postgresql.org/docs/8.3/interactive/pgstattuple.html regards Pavel Stehule 2008/12/25 Gerhard Wiesinger : Hello! Is there some information available in the meta tables about the number of currently unused pages, and about unused row versions of tables and indexes? I'm asking because I want to measure how efficiently HOT is working and whether vacuum should be run or not to save disk space (I know this is done automatically). Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
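The pgstattuple module Pavel points to answers the dead-space question directly. A minimal sketch (the table name is illustrative; on 8.3 the module is installed from the contrib SQL script rather than CREATE EXTENSION):

-- Install once per database, e.g.:
-- psql -d mydb -f /usr/share/pgsql/contrib/pgstattuple.sql

SELECT table_len, tuple_count, dead_tuple_count,
       dead_tuple_percent, free_space, free_percent
FROM pgstattuple('employee');

dead_tuple_percent and free_percent together show how much of the heap is reclaimable, which is exactly the "is HOT keeping the table compact" measurement asked about below.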
Re: [GENERAL] Conditional commit inside functions
Hello! I tried the following, but it is still one transaction: SELECT insert_1Mio(); (in parallel, select count(id) from employee; is done)

CREATE OR REPLACE FUNCTION insert_some(start_i INTEGER, end_i INTEGER) RETURNS void AS $func$
DECLARE
BEGIN
  FOR i IN start_i..end_i LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || i, 'Smith' || i);
  END LOOP;
END;
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_1Mio() RETURNS void AS $func$
DECLARE
  maxcommit INTEGER;
  start_i INTEGER;
  end_i INTEGER;
  now_i INTEGER;
BEGIN
  maxcommit := 1;
  start_i := 1;
  end_i := 1000000;
  now_i := start_i;
  FOR i IN start_i..end_i LOOP
    IF MOD(i, maxcommit) = 0 THEN
      PERFORM insert_some(now_i, i);
      now_i := i + 1;
    END IF;
  END LOOP;
  PERFORM insert_some(now_i, end_i);
END;
$func$ LANGUAGE plpgsql;

Any ideas? Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 25 Dec 2008, Gerhard Wiesinger wrote: Hello! I want to translate the following Oracle PL/SQL script into plpgsql. Especially I'm having problems with the transaction thing. I tried START TRANSACTION and COMMIT without success. Any ideas? Thanx. Ciao, Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio IS
  maxcommit NUMBER;
BEGIN
  maxcommit := 1;
  FOR i IN 1..1000000 LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || to_char(i), 'Smith' || to_char(i));
    IF MOD(i, maxcommit) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;

-- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Conditional commit inside functions
Hello! I want to translate the following Oracle PL/SQL script into plpgsql. Especially I'm having problems with the transaction thing. I tried START TRANSACTION and COMMIT without success. Any ideas? Thanx. Ciao, Gerhard

CREATE OR REPLACE PROCEDURE insert_1Mio IS
  maxcommit NUMBER;
BEGIN
  maxcommit := 1;
  FOR i IN 1..1000000 LOOP
    INSERT INTO employee (id, department, firstname, lastname)
    VALUES (i, i, 'John' || to_char(i), 'Smith' || to_char(i));
    IF MOD(i, maxcommit) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;

-- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Information about Pages, row versions of tables, indices
Hello! Is there some information available in the meta tables about the number of currently unused pages, and about unused row versions of tables and indexes? I'm asking because I want to measure how efficiently HOT is working and whether vacuum should be run or not to save disk space (I know this is done automatically). Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Tsearch2 Upgrade from 8.2 to 8.3.1 with mediawiki
Hello! I found the solution: a normal export and the normal upgrade procedure.

su - postgres
# Upgrade tsearch2
# http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
createdb wikidb
psql wikidb < /usr/share/pgsql/contrib/tsearch2.sql
psql < pg_dumpall.sql postgres

Ciao, Gerhard -- http://www.wiesinger.com/ On Thu, 15 May 2008, Gerhard Wiesinger wrote: Hello! I want to upgrade from 8.2 to 8.3.1 but I have problems: I did a pg_dumpall but this doesn't work. I found the migration guide with a trick to load the new contrib/tsearch2 module. But how is this done exactly? - http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html The old contrib/tsearch2 functions and other objects must be suppressed when loading pg_dump output from a pre-8.3 database. While many of them won't load anyway, a few will and then cause problems. One simple way to deal with this is to load the new contrib/tsearch2 module before restoring the dump; then it will block the old objects from being loaded. - Some other links I've found: http://sql-info.de/postgresql/notes/converting-tsearch2-to-8.3.html http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes http://archives.postgresql.org/pgsql-hackers/2007-10/msg00509.php http://translate.google.com/translate?u=http%3A%2F%2Fwww.nabble.com%2FAtualiza%25C3%25A7%25C3%25A3o-de-Mediawiki-para-8.3-td15722654.html&hl=en&ie=UTF8&sl=es&tl=en http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html http://www.postgresql.org/docs/current/static/tsearch2.html#AEN102824 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ http://people.planetpostgresql.org/greg/index.php?/archives/123-MediaWiki-is-Postgres-8.3-compatible.html Thank you. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Tsearch2 Upgrade from 8.2 to 8.3.1
Hello! I want to upgrade from 8.2 to 8.3.1 but I have problems: I did a pg_dumpall but this doesn't work. I found the migration guide with a trick to load the new contrib/tsearch2 module. But how is this done exactly? - http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html The old contrib/tsearch2 functions and other objects must be suppressed when loading pg_dump output from a pre-8.3 database. While many of them won't load anyway, a few will and then cause problems. One simple way to deal with this is to load the new contrib/tsearch2 module before restoring the dump; then it will block the old objects from being loaded. - Some other links I've found: http://sql-info.de/postgresql/notes/converting-tsearch2-to-8.3.html http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes http://archives.postgresql.org/pgsql-hackers/2007-10/msg00509.php http://translate.google.com/translate?u=http%3A%2F%2Fwww.nabble.com%2FAtualiza%25C3%25A7%25C3%25A3o-de-Mediawiki-para-8.3-td15722654.html&hl=en&ie=UTF8&sl=es&tl=en http://www.postgresql.org/docs/8.3/interactive/textsearch-migration.html http://www.postgresql.org/docs/current/static/tsearch2.html#AEN102824 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ http://people.planetpostgresql.org/greg/index.php?/archives/123-MediaWiki-is-Postgres-8.3-compatible.html Thank you. Ciao, Gerhard -- http://www.wiesinger.com/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Tools for dumping pg_xlog, pg_clog, etc?
Hello! Since the pg_filedump utility only handles database heap/index files, I'm currently writing a pg_xlog/pg_clog viewer. I'm looking at the documentation in:

// See src/include/access/xlog.h
// See src/include/access/xlog_internal.h
// See src/include/access/xlogdefs.h
// See src/backend/access/transam/xlog.c
// See src/include/access/clog.h
// See src/backend/access/transam/clog.c
// See src/include/access/rmgr.h
// See src/backend/access/transam/rmgr.c

But I'm still having problems getting suitable output:

1.) pg_xlog:
a.) Reading XLogPageHeaderData or XLogLongPageHeaderData works well.
b.) Afterwards an XLogRecord should be expected, but the output doesn't look right (e.g. the CRC contains 00 bytes in different files, which is not plausible).

In b.) I expect:

* The overall layout of an XLOG record is:
* Fixed-size header (XLogRecord struct)
* rmgr-specific data
* BkpBlock
* backup block data
* BkpBlock
* backup block data
* ...

Any ideas on the correct structure?

2.) pg_clog: I'm not sure what the correct structures of pg_clog are. Any ideas?

Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ On Sat, 26 May 2007, Gerhard Wiesinger wrote: Hello! I think I found a proper utility for that: pg_filedump http://sources.redhat.com/rhdb/utilities.html Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 18 May 2007, Gerhard Wiesinger wrote: Hello! Are there any tools available to dump the files of the pg_xlog, pg_clog, ... directories in human-readable format to understand how transaction handling is done? Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Tools for dumping pg_xlog, pg_clog, etc?
Hello! I think I found a proper utility for that: pg_filedump http://sources.redhat.com/rhdb/utilities.html Ciao, Gerhard -- http://www.wiesinger.com/ On Fri, 18 May 2007, Gerhard Wiesinger wrote: Hello! Are there any tools available to dump the files of the pg_xlog, pg_clog, ... directories in human readable format to understand how transaction handling is done? Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Tools for dumping pg_xlog, pg_clog, etc?
Hello! Are there any tools available to dump the files of the pg_xlog, pg_clog, ... directories in human readable format to understand how transaction handling is done? Thanx. Ciao, Gerhard -- http://www.wiesinger.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] Internals of PostgreSQL - Documentation or presentations
Hello!

Are there any presentations or documents available on the internals of PostgreSQL? In particular, I'm looking for the concepts and detailed internals of general transaction handling, the commit log, the transaction logs, pg_multixact, pg_subtrans, pg_tblspc and pg_twophase. Some commentary on concurrent access with multiple processes, locking, and the shared memory concepts (or whatever other IPC mechanisms are used) would also be nice.

I already found http://www.postgresql.org/files/developer/internalpics.pdf but it lacks the commentary needed to understand it well.

Thanx.

Ciao,
Gerhard
--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] Problem with data corruption and psql memory usage
Hello Tom!

I don't think this is a hardware problem. The machine has been running 24/7 for around 4 years without any problems: daily backups of GBs of data to it, uptimes limited only by the next kernel security patch, etc. The only cause I could believe: I'm running the FC7 test packages of PostgreSQL on FC6, and maybe there is a slight glibc library conflict or some other incompatibility.

Ciao,
Gerhard
--
http://www.wiesinger.com/

On Wed, 9 May 2007, Tom Lane wrote:
Gerhard Wiesinger <[EMAIL PROTECTED]> writes:
LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output error

[ raised eyebrow... ] I think your machine is flakier than you believe. This error is particularly damning, but the general pattern of weird failures all over the place seems to me to fit the idea of hardware problems much better than any other explanation. FC6 and PG 8.2.3 are both pretty darn stable for most people, so there's *something* wrong with your installation, and unstable hardware is the first thing that comes to mind.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
[GENERAL] Problem with data corruption and psql memory usage
Hello!

I'm new to PostgreSQL, and I imported about 2.8 million rows with plain INSERT commands. The configuration (differences from the default config) was:

listen_addresses = '*'
temp_buffers = 20MB            # min 800kB
work_mem = 20MB                # min 64kB
maintenance_work_mem = 32MB    # min 1MB
fsync = off                    # turns forced synchronization on or off
full_page_writes = off
wal_buffers = 20MB

It crashed with a core dump (ulimit -c 0):

LOG: server process (PID 12720) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Afterwards I got the following error messages:

WARNING: index "table_pkey" contains 2572948 row versions, but table contains 2572949 row versions
HINT: Rebuild the index with REINDEX.
LOG: server process (PID 13794) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output error
ERROR: storage sync failed on magnetic disk: Input/output error
ERROR: could not access status of transaction 808464434
DETAIL: Could not open file "pg_clog/0303": No such file or directory.

Afterwards I got:

ERROR: could not access status of transaction 5526085

There were also some core dumps afterwards for which I have a stack trace:

#0 0x0807d241 in heap_deform_tuple ()
#1 0x08095b8c in toast_delete ()
#2 0x0809432e in heap_delete ()
#3 0x0814bfa4 in ExecutorRun ()
#4 0x081d7ece in FreeQueryDesc ()
#5 0x081d80c1 in FreeQueryDesc ()
#6 0x081d8979 in PortalRun ()
#7 0x081d4480 in pg_parse_query ()
#8 0x081d5a57 in PostgresMain ()
#9 0x081ad4fe in ClosePostmasterPorts ()
#10 0x081ae307 in PostmasterMain ()
#11 0x0816dec0 in main ()

So my questions are:
1.) Are my settings too aggressive (fsync=off, full_page_writes=off)?
2.) Should PostgreSQL also recover from a crash with these two options enabled, or is data corruption to be expected with these settings?
3.) Any ideas about the reason for the core dumps? Write access came from only one session at a time; I only did SELECT count(*) from the table in other sessions.

Afterwards I cleaned up the tables, did a pg_dumpall/restore cycle, ran initdb, disabled those two settings, and everything went fine.

I also had a problem with psql: running psql < file.sql, psql took around 2 GB of virtual memory with heavy swapping. After Ctrl-C and restarting, it worked well. Any ideas?

The machine is stable, so I would say a hardware failure is not the problem. The PostgreSQL version is 8.2.3 on FC6.

Thank you for the answer.
Ciao,
Gerhard
--
http://www.wiesinger.com/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
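A quick cross-check on the pg_clog errors quoted above, assuming the stock clog layout (2 status bits per transaction, so 4 transactions per byte, 8 kB pages, 32 pages per 256 kB segment file, file names in hex): transaction 808464434 maps to segment 808464434 / (4 * 8192 * 32) = 808464434 / 1048576 = 771 = 0x303, which is exactly the missing file pg_clog/0303, so the error message is at least self-consistent. More telling, 808464434 is 0x30303032, i.e. the ASCII digits "0002": the "transaction ID" PostgreSQL tried to look up is almost certainly text garbage read from an overwritten tuple header, which points to on-disk page corruption rather than a problem in pg_clog itself.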
Re: [GENERAL] Fw: PostgreSQL Performance Tuning
Hello!

I would do the following (in that order):

1.) Check for performant application logic and application design (e.g. the granularity of the Java Hibernate mapping: are there object iterators looping over hundreds of objects, etc.).
2.) Check the Hibernate-generated queries and whether each query is suitable or not. Also run EXPLAIN (or EXPLAIN ANALYZE) to see the query plan. Sometimes a manually written query is much more efficient than the Hibernate-generated one.
3.) Optimize the database itself, e.g. the PostgreSQL configuration.

Ciao,
Gerhard
--
http://www.wiesinger.com/

On Thu, 26 Apr 2007, Shohab Abdullah wrote:
Dear,
We are facing a performance tuning problem while using the PostgreSQL database over the network on a Linux OS. Our database consists of more than 500 tables, with an average of 10K records per table, and an average of 20 users accessing the database simultaneously over the network. Each table has indexes, and we are querying the database using Hibernate. The biggest problem is with insertion, updating and fetching of records, i.e. the database performance is very slow. It takes a long time to respond in the above scenario. Please provide me with tuning advice for the database. I am attaching my postgresql.conf file for reference to our current configuration. Please reply ASAP.

Regards,
Shohab Abdullah
Software Engineer, Manufacturing SBU-POWAI
Larsen and Toubro Infotech Ltd. | 4th floor, L&T Technology Centre, Saki Vihar Road, Powai, Mumbai-400072
Tel: +91-22-67767366 | +91-9870247322
Visit us at: http://www.lntinfotech.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq