Re: [PERFORM] pg_repack solves alter table set tablespace lock
On Fri, Jan 24, 2014 at 3:48 PM, Ying He yinghe0...@yahoo.com wrote:
> I looked at the pg_repack usage for release 1.2
> (http://reorg.github.io/pg_repack/). There is a "-s tablespace" option
> that claims to be an online version of ALTER TABLE ... SET TABLESPACE.
> Is this the functionality that solves the alter table set tablespace
> lock issue?

Cross-posting to multiple lists in quick succession is generally considered rude; I see you have posted to the reorg-general list already, which is the right forum for questions about pg_repack. (And yes, that -s flag sounds like what you are after.)

Josh
[PERFORM] COMMIT stuck for days after bulk delete
We have a 9.1.11 backend (Ubuntu 12.04 x86_64, m1.medium EC2 instance) which seems to be stuck at COMMIT for 2 days now:

mydb=# SELECT procpid, waiting, current_query,
           CURRENT_TIMESTAMP - query_start AS query_elapsed,
           CURRENT_TIMESTAMP - xact_start AS xact_elapsed
       FROM pg_stat_activity
       WHERE procpid != pg_backend_pid() AND current_query != '<IDLE>';
-[ RECORD 1 ]-+------------------------
procpid       | 6061
waiting       | f
current_query | COMMIT;
query_elapsed | 2 days 08:59:17.619142
xact_elapsed  | 3 days 15:48:10.739912

The transaction behind that COMMIT has been the only thing running on this Postgres instance for the past 3 days or so, since Postgres was started on that machine. I spun the EC2 instance for this database up solely to test a database subsetting process, which is what the transaction was doing before it got stuck at COMMIT -- using a bunch of DELETEs and ALTER TABLE ... DROP|ADD CONSTRAINTs to delete 90% or so of our data in order to be able to pg_dump a slimmed-down development copy.

The EC2 instances we use have separate EBS-backed volumes for the PostgreSQL data and WAL directories. The backend in question seems to be stuck reading a ton of data from the data partition: the monitoring for those EBS volumes shows they have been hammered, reading a constant aggregate 90 MB/sec since that COMMIT started. The write bandwidth to the postgresql-data partition has been almost nil since the COMMIT, and there has been no read/write activity on the WAL volumes.

Here, we can see that the backend has managed to read 22 TB, despite the fact that the entire database is only 228 GB on disk.
$ sudo cat /proc/6061/io
rchar: 24505414843923
wchar: 23516159014
syscr: 2991395854
syscw: 2874613
read_bytes: 24791719338496
write_bytes: 22417580032
cancelled_write_bytes: 221208576

$ df -h /dev/md0 /dev/md1
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        480G  228G  253G  48% /mnt/ebs/postgresql-data
/dev/md1         32G   20G   13G  61% /mnt/ebs/postgresql-wal

Running an strace on the backend shows a whole ton of read() calls and the occasional lseek(). I grabbed a backtrace of the backend with gdb, attached. Attached also are the non-default pg_settings for this instance. You'll notice that fsync, full_page_writes, and autovacuum are all off: this is intentional, since this instance is transient and has nothing important on it. There are no interesting errors in the Postgres log files since it was spun up.

Any ideas on how to further diagnose or avoid this problem?

Josh

(gdb) bt
#0  0x7fa235a73fe0 in read () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x006249fd in read (__nbytes=8192, __buf=0x7fa21f8d7ee0, __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:45
#2  FileRead (file=<optimized out>, buffer=0x7fa21f8d7ee0 "RI", amount=8192) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/file/fd.c:1191
#3  0x0063d856 in mdread (reln=<optimized out>, forknum=<optimized out>, blocknum=253211, buffer=0x7fa21f8d7ee0 "RI") at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/smgr/md.c:646
#4  0x00620556 in ReadBuffer_common (smgr=<optimized out>, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=253211, mode=RBM_NORMAL, strategy=<optimized out>, hit=0x7fffa31cb7df) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/buffer/bufmgr.c:440
#5  0x00621014 in ReadBufferExtended (reln=0x18678a0, forkNum=MAIN_FORKNUM, blockNum=253211, mode=RBM_NORMAL, strategy=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/storage/buffer/bufmgr.c:246
#6  0x0046ff81 in heapgetpage (scan=0x2202340, page=253211) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:230
#7  0x0047049a in heapgettup_pagemode (scan=0x2202340, dir=<optimized out>, nkeys=0, key=0x0) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:824
#8  0x00471506 in heap_getnext (scan=0x2202340, direction=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/access/heap/heapam.c:1349
#9  0x0058a50b in SeqNext (node=<optimized out>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/nodeSeqscan.c:66
#10 0x00579fa3 in ExecScanFetch (recheckMtd=0x58a4d0 <SeqRecheck>, accessMtd=0x58a4e0 <SeqNext>, node=0x2201590) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execScan.c:82
#11 ExecScan (node=0x2201590, accessMtd=0x58a4e0 <SeqNext>, recheckMtd=0x58a4d0 <SeqRecheck>) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execScan.c:167
#12 0x00572a98 in ExecProcNode (node=0x2201590) at /build/buildd/postgresql-9.1-9.1.11/build/../src/backend/executor/execProcnode.c:394
#13 0x0058587a in ExecLockRows (node=0x22013a0) at
Re: [PERFORM] COMMIT stuck for days after bulk delete
On Tue, Jan 14, 2014 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>> We have a 9.1.11 backend (Ubuntu 12.04 x86_64, m1.medium EC2 instance)
>> which seems to be stuck at COMMIT for 2 days now: ... The transaction
>> behind that COMMIT has been the only thing running on this Postgres
>> instance for the past 3 days or so, since Postgres was started on that
>> machine. I spun the EC2 instance for this database up solely to test a
>> database subsetting process, which is what the transaction was doing
>> before it got stuck at COMMIT -- using a bunch of DELETEs and ALTER
>> TABLE ... DROP|ADD CONSTRAINTs to delete 90% or so of our data in
>> order to be able to pg_dump a slimmed-down development copy.
>
> A plausible guess is that the backend is running around trying to
> verify that some deferred foreign key constraints still hold. But
> without knowing what your schema is, that's only a guess.

Yeah, that's a good guess. A bunch of the FK constraints I am dropping and re-adding are marked DEFERRABLE INITIALLY DEFERRED; there are 167 counted by:

SELECT COUNT(*)
FROM pg_catalog.pg_constraint c
WHERE contype = 'f' AND condeferrable AND condeferred AND
      connamespace = (SELECT oid FROM pg_catalog.pg_namespace
                      WHERE nspname = 'public');

> If that is it, a likely solution is to drop *all* the FK constraints
> before doing the bulk delete, then (in a new transaction, probably)
> recreate the ones you still want.

Will try that, thanks for the suggestion.

Josh

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
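Tom's drop-then-recreate suggestion would look roughly like the following. This is only a sketch: the table, column, and constraint names here are invented, and the real list of constraints to drop should be generated from the pg_constraint query above.

```sql
-- Transaction 1: drop the FK first, then do the bulk delete.
-- With no deferred constraint pending, COMMIT has nothing to re-verify.
BEGIN;
ALTER TABLE child_table DROP CONSTRAINT child_table_parent_id_fkey;
DELETE FROM parent_table WHERE id NOT IN (SELECT parent_id FROM keep_list);
COMMIT;

-- Transaction 2: recreate the constraint you still want.
-- Validation happens here, as one pass over the table rather than a
-- row-by-row recheck at COMMIT time.
BEGIN;
ALTER TABLE child_table
    ADD CONSTRAINT child_table_parent_id_fkey
    FOREIGN KEY (parent_id) REFERENCES parent_table (id)
    DEFERRABLE INITIALLY DEFERRED;
COMMIT;
```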
[PERFORM] get/set priority of PostgreSQL backends
Hi all,

I noticed a note on the 'Priorities' wiki page[1], which talks about the need for a C-language function nice_backend(prio) "that renices the calling backend to prio", and suggests posting a link to this list. Well, here you go:

http://pgxn.org/dist/prioritize/

The API is a tiny bit different from what was suggested on the wiki: the wiki suggested nice_backend() and nice_backend_super(), whereas I consolidated those into set_backend_priority(), with permissions checks similar to pg_cancel_backend(). There is also get_backend_priority(), which should play nicely with the former function, and perhaps enable scripted queries to automatically bump priorities based on pg_stat_activity. See the doc[3] for more details.

The wiki says nice_backend_super() "might be able to renice any backend pid and set any priority, but is usable only by the [database] superuser", hinting that it would be feasible to lower a backend's priority value (i.e. increase its scheduling priority). Unfortunately this is not possible on at least OS X and Linux, where one must be root to lower priority values.

I haven't checked whether this module works on Windows; I would appreciate it if someone could give it a shot there. I can update the 'Priorities' wiki page in a bit.

Josh

[1] http://wiki.postgresql.org/wiki/Priorities
[3] https://github.com/schmiddy/pg_prioritize/blob/master/doc/prioritize.md
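Usage, assuming the API described above, might look like this (an untested sketch; the pid 12345 is a placeholder):

```sql
-- Bump the priority value of the backend with pid 12345 to 10
-- (for a non-root postgres user, values can only go up).
SELECT set_backend_priority(12345, 10);

-- Survey the priorities of all current backends.
SELECT procpid, get_backend_priority(procpid) AS prio, current_query
FROM pg_stat_activity;
```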
Re: [PERFORM] get/set priority of PostgreSQL backends
On Sat, Apr 7, 2012 at 11:05 AM, Scott Marlowe scott.marl...@gmail.com wrote:
> On Sat, Apr 7, 2012 at 11:06 AM, Josh Kupershmidt schmi...@gmail.com wrote:
>> The wiki says nice_backend_super() "might be able to renice any backend
>> pid and set any priority, but is usable only by the [database]
>> superuser", hinting that it would be feasible to lower a backend's
>> priority value (i.e. increase the scheduling priority). Unfortunately
>> this is not possible on at least OS X and Linux, where one must be root
>> to lower priority values. I haven't checked whether this module works
>> on Windows, would appreciate if someone could give it a shot there.
>
> I thought you were limited to only settings above 0 and your own
> processes in linux.

For non-root users, you may only *increase* the priority values of your processes, and the default priority value is 0. So yes, as non-root you're effectively limited to positive and increasing values for setpriority(), and of course you may only alter the priorities of processes running under the same user. I think that's what I was saying above, though maybe I wasn't so clear.

For example, if you try to lower your own backend's priority with this function, you'll get a warning like this:

test=# SELECT set_backend_priority(pg_backend_pid(), -1);
WARNING:  Not possible to lower a process's priority (currently 0)
 set_backend_priority
----------------------
 f
(1 row)

Josh
Re: [PERFORM] Update problem on large table
On Mon, Dec 6, 2010 at 4:31 PM, felix crucialfe...@gmail.com wrote:
> thanks for the replies! but actually I did figure out how to kill it,
> though pg_cancel_backend didn't work. here's some notes: this has been
> hung for 5 days:
>
> ns | 32681 | nssql | <IDLE> in transaction | f | 2010-12-01 15

Right, pg_cancel_backend() isn't going to help when the session you're trying to kill is '<IDLE> in transaction' -- there's no query to be killed. If this '<IDLE> in transaction' session was causing problems by blocking other transactions, you should look at the application running these queries and figure out why it's hanging out in this state. Staying like that for 5 days is not a good sign, and can also cause problems with e.g. autovacuum.

[snip]

> but it still will not die
> the docs for pg_ctl state:
>   Use pg_ctl --help to see a list of supported signal names.
> doing so does indeed tell me the names:
>   HUP INT QUIT ABRT TERM USR1 USR2
> but nothing about them whatsoever :)

I agree this could be better documented. There's a brief mention at:
http://www.postgresql.org/docs/current/static/app-postgres.html#AEN77350

  "To cancel a running query, send the SIGINT signal to the process
  running that command."

though that snippet of information is out of place on a page about the postmaster, and SIGINT vs. SIGTERM for individual backends isn't discussed there at any rate.

At any rate, as you discovered, you have to send SIGTERM to the backend to kill off an '<IDLE> in transaction' session. If you're using 8.4 or newer, you have pg_terminate_backend() as a SQL wrapper for SIGTERM. If you're using an older version, be careful; see e.g.
http://archives.postgresql.org/pgsql-admin/2010-04/msg00274.php

Josh
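Putting that together, a sketch for hunting down and terminating long-idle transactions (using the pre-9.2 pg_stat_activity column names seen in the original post; the one-hour threshold and the pid are placeholders):

```sql
-- Find sessions that have been idle in a transaction for over an hour.
SELECT procpid, usename, now() - xact_start AS idle_for
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
  AND xact_start < now() - interval '1 hour';

-- On 8.4+, send SIGTERM to one of them from SQL:
SELECT pg_terminate_backend(32681);
```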
Re: [PERFORM] Update problem on large table
On Mon, Dec 6, 2010 at 2:48 PM, Jon Nelson jnelson+pg...@jamponi.net wrote:
> On Mon, Dec 6, 2010 at 1:46 PM, bricklen brick...@gmail.com wrote:
>> Not sure if anyone replied about killing your query, but you can do it
>> like so:
>> select pg_cancel_backend(5902);
>> -- assuming 5902 is the pid of the query you want canceled.
>
> How does this differ from just killing the pid?

pg_cancel_backend(5902) does the same thing as:

  kill -SIGINT 5902

Josh
Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle
On Mon, Oct 25, 2010 at 2:12 PM, Divakar Singh dpsma...@yahoo.com wrote:
> 1. How does PostgreSQL perform when inserting data into an indexed
> (type: btree) table? Is it true that as you add the indexes on a table,
> the performance deteriorates significantly, whereas Oracle does not
> show that much performance decrease? I have tried almost all postgreSQL
> performance tips available. I want to have very good insert performance
> (with indexes); select performance is not that important at this point
> of time.

I don't claim to have any experience with Oracle, but this boast smells fishy. See for example Figure 3-2 (pp. 57-58) in "The Art of SQL", where the author presents simple charts showing the performance impact upon INSERTs of adding indexes to a table in Oracle and MySQL: they're both in the same ballpark, and the performance impact is indeed significant. As Joshua Drake suggests, table schemas and test results would help your case.

Josh
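It's easy enough to measure the effect yourself. A rough psql sketch (table and index names invented; absolute timings will vary wildly by hardware):

```sql
\timing on

-- Baseline: bulk insert into an unindexed table.
CREATE TABLE ins_test (id integer, val text);
INSERT INTO ins_test
    SELECT g, md5(g::text) FROM generate_series(1, 1000000) g;

-- Same insert with two btree indexes in place; compare the timings.
TRUNCATE ins_test;
CREATE INDEX ins_test_id_idx  ON ins_test (id);
CREATE INDEX ins_test_val_idx ON ins_test (val);
INSERT INTO ins_test
    SELECT g, md5(g::text) FROM generate_series(1, 1000000) g;
```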
Re: [PERFORM] how to get the total number of records in report
On Mon, Oct 18, 2010 at 1:16 AM, AI Rumman rumman...@gmail.com wrote:
> At present for reporting I use following types of query:
>
>   select crm.*, crm_cnt.cnt
>   from crm,
>   (select count(*) as cnt from crm) crm_cnt;
>
> Here count query is used to find the total number of records.
> Same FROM clause is copied in both the part of the query.
> Is there any other good alternative way to get this similar value?

Well, it looks like you're currently executing two sequential scans over the crm table. And you're including the total row-count as a separate column in every row you get back, although you really only need this piece of information once. Since you're fetching all of the crm table anyway, why not get rid of the COUNT(*) entirely and just keep a count on the client side of the total number of rows you've fetched?

Josh
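As an aside: if the count really is needed server-side and you are on 8.4 or later, a window aggregate avoids the second sequential scan entirely (sketch):

```sql
-- One scan of crm; every row carries the total row count.
SELECT crm.*, count(*) OVER () AS cnt
FROM crm;
```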
Re: [PERFORM] cleanup on pg_ system tables?
On Mon, Sep 20, 2010 at 1:25 PM, mark dvlh...@gmail.com wrote:
> Hi All,
>
> (pg 8.3.7 on RHEL 2.6.18-92.el5)
>
> I ran the query below (copied from
> http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) on a
> production DB we have, and I am looking at some pretty nasty looking
> numbers for tables in the pg_catalog schema. I have tried a reindex and
> vacuum but neither seems to be clearing these out; I tried a cluster
> and it won't let me. Am I viewing the problem wrong? Is there anything
> I can do while the DB is online? Do I need to clean up other things
> first?

You sure you tried VACUUM FULL, not just VACUUM? I've been in the same boat on 8.3, actually, see:
http://archives.postgresql.org/pgsql-performance/2010-04/msg00204.php

I migrated the server mentioned in that thread to 8.4, so at least I don't have to deal with the max_fsm_pages problem; you might want to double-check your postmaster logfile to make sure you don't see a bunch of warnings about max_fsm_pages.

I put in a twice-hourly cron job which runs a VACUUM ANALYZE on the pg_catalog tables I'd been having trouble with, which has helped keep pg_catalog bloat down. My pg_catalog tables are still somewhat bloated (11 GB for pg_attribute), but at least they've been at a steady size for the past few months without needing a VACUUM FULL.

Josh
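For reference, that cron job boils down to running something like this every 30 minutes (this is my table list; substitute whichever catalogs your bloat query flags):

```sql
VACUUM ANALYZE pg_catalog.pg_attribute;
VACUUM ANALYZE pg_catalog.pg_attrdef;
VACUUM ANALYZE pg_catalog.pg_depend;
VACUUM ANALYZE pg_catalog.pg_type;
```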
Re: [PERFORM] stats collector suddenly causing lots of IO
On Thu, Apr 15, 2010 at 6:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Chris li...@deksai.com writes:
>> I have a lot of centos servers which are running postgres. Postgres
>> isn't used that heavily on any of them, but lately, the stats
>> collector process keeps causing tons of IO load. It seems to happen
>> only on servers with centos 5.
>
> Say, I just realized that both of you are complaining about stats
> collector overhead on centos 5 servers. I hadn't been thinking in terms
> of OS-specific causes, but maybe that is what we need to consider.
> Can you tell me the exact kernel versions you are seeing these problems
> with?

uname -a says "... 2.6.18-92.1.13.el5 #1 SMP ... x86_64", and it's CentOS 5.2.

I'm not sure whether this is related to the stats collector problems on this machine, but I noticed alarming table bloat in the catalog tables pg_attribute, pg_attrdef, pg_depend, and pg_type. Perhaps this has happened slowly over the past few months, but I discovered the bloat when I ran the query from:
http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html
on the most-active database on this server (OID 16389 from the pgstat.stat I sent in). See attached table_bloat.txt.

The autovacuum settings for this server haven't been tweaked from the default; they probably should have been, given the heavy bulk updates/inserts done. Maybe there's another cause for this extreme catalog bloat besides the weak autovacuum settings, though.

Table sizes, according to pg_size_pretty(pg_total_relation_size(...)):
 * pg_attribute: 145 GB
 * pg_attrdef: 85 GB
 * pg_depend: 38 GB
 * pg_type: 3465 MB

I'll try to send in strace outputs later today.
Josh

 schemaname | tablename | reltuples | relpages | otta | tbloat | wastedpages | wastedbytes | wastedsize | iname | ituples | ipages | iotta | ibloat | wastedipages | wastedibytes | wastedisize
------------+-----------+-----------+----------+------+--------+-------------+-------------+------------+-------+---------+--------+-------+--------+--------------+--------------+-------------
 pg_catalog | pg_attribute | 0 | 12178069 | 0 | 0.0 | 12178069 | 99762741248 | 93 GB | pg_attribute_relid_attnam_index | 0 | 4525199 | 0 | 0.0 | 4525199 | 37070430208 | 35 GB
 pg_catalog | pg_attribute | 0 | 12178069 | 0 | 0.0 | 12178069 | 99762741248 | 93 GB | pg_attribute_relid_attnum_index | 0 | 848842 | 0 | 0.0 | 848842 | 6953713664 | 6632 MB
 pg_catalog | pg_attrdef | 767 | 8789472 | 30 | 292982.4 | 8789442 | 72003108864 | 67 GB | pg_attrdef_oid_index | 767 | 361513 | 28 | 12911.2 | 361485 | 2961285120 | 2824 MB
 pg_catalog | pg_attrdef | 767 | 8789472 | 30 | 292982.4 | 8789442 | 72003108864 | 67 GB | pg_attrdef_adrelid_adnum_index | 767 | 359805 | 28 | 12850.2 | 359777 | 2947293184 | 2811 MB
 [snip one bloated user table]
 pg_catalog | pg_depend | 14270 | 1890691 | 98 | 19292.8 | 1890593 | 15487737856 | 14 GB | pg_depend_depender_index | 14270 | 1649557 | 65 | 25377.8 | 1649492 | 13512638464 | 13 GB
 pg_catalog | pg_depend | 14270 | 1890691 | 98 | 19292.8 | 1890593 | 15487737856 | 14 GB | pg_depend_reference_index | 14270 | 1371522 | 65 | 21100.3 | 1371457 | 11234975744 | 10 GB
 [snip some more user tables]
 pg_catalog | pg_type | 1333 | 385445 | 31 | 12433.7 | 385414 | 3157311488 | 3011 MB | pg_type_oid_index | 1333 | 54965 | 24 | 2290.2 | 54941 | 450076672 | 429 MB
 pg_catalog | pg_type | 1333 | 385445 | 31 | 12433.7 | 385414 | 3157311488 | 3011 MB | pg_type_typname_nsp_index | 1333 | 3129 | 24 | 130.4 | 3105 | 25436160 | 24 MB
 [snip]
Re: [PERFORM] stats collector suddenly causing lots of IO
On Fri, Apr 16, 2010 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>> I'm not sure whether this is related to the stats collector problems
>> on this machine, but I noticed alarming table bloat in the catalog
>> tables pg_attribute, pg_attrdef, pg_depend, and pg_type.
>
> Hmm. That makes me wonder if autovacuum is functioning properly at all.
> What does pg_stat_all_tables show for the last vacuum and analyze times
> of those tables? Try something like
>
> select relname,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze
> from pg_stat_all_tables where schemaname = 'pg_catalog'
> order by 1;

Output attached. Note that I ran pg_stat_reset() a few days ago.

Josh

# select relname,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze
  from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;

     relname      | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum               | last_analyze | last_autoanalyze
------------------+------------+------------+-------------+-------------------------------+--------------+-------------------------------
 pg_aggregate     |          0 |          0 |             |                               |              |
 pg_am            |          0 |          0 |             |                               |              |
 pg_amop          |          0 |          0 |             |                               |              |
 pg_amproc        |          0 |          0 |             |                               |              |
 pg_attrdef       |          2 |    3527675 |             |                               |              |
 pg_attribute     |         56 |   12771286 |             |                               |              |
 pg_auth_members  |         11 |          0 |             |                               |              |
 pg_authid        |         16 |          2 |             |                               |              |
 pg_autovacuum    |          0 |          0 |             |                               |              |
 pg_cast          |          0 |          0 |             |                               |              |
 pg_class         |      30191 |     155416 |             | 2010-04-15 21:14:06.826358+00 |              | 2010-04-15 21:14:06.826358+00
 pg_constraint    |       2435 |        701 |             | 2010-04-15 16:13:33.92506+00  |              | 2010-04-15 16:13:33.92506+00
 pg_conversion    |          0 |          0 |             |                               |              |
 pg_database      |          0 |          0 |             |                               |              |
 pg_depend        |         55 |    4961774 |             |                               |              |
 pg_description   |          0 |          0 |             |                               |              |
 pg_enum          |          0 |          0 |             |                               |              |
 pg_index         |       4603 |       2272 |             | 2010-04-15 16:18:21.289279+00 |              | 2010-04-15 16:18:21.289279+00
 pg_inherits      |          2 |          0 |             |                               |              |
 pg_language      |          0 |          0 |             |                               |              |
 pg_largeobject   |          0 |          0 |             |                               |              |
 pg_listener      |         37 |     117512 |             |                               |              |
 pg_namespace     |          0 |          0 |             |                               |              |
 pg_opclass       |          0 |          0 |             |                               |              |
 pg_operator      |          0 |          0 |             |                               |              |
 pg_opfamily      |          0 |          0 |             |                               |              |
 pg_pltemplate    |          0 |          0 |             |                               |              |
 pg_proc          |          0 |          0 |             |                               |              |
 pg_rewrite       |          4 |          0 |             |                               |              |
 pg_shdepend      |        717 |     172806 |             | 2010-04-15 16:56:26.296809+00 |              | 2010-04-15 16:56:26.296809+00
 pg_shdescription |          0 |          0 |             |                               |              |
 pg_statistic     |          0 |          0 |             |                               |              |
 pg_tablespace    |          0 |          0 |             |                               |              |
 pg_trigger       |       7021 |       1232 |             | 2010-04-15 16:24:55.054294+00
Re: [PERFORM] stats collector suddenly causing lots of IO
On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Wow. Well, we have a smoking gun here: for some reason, autovacuum
> isn't running, or isn't doing its job if it is. If it's not running at
> all, that would explain failure to prune the stats collector's file
> too.

Hrm, well autovacuum is at least trying to do work: it's currently stuck on those bloated pg_catalog tables, of course. Another developer killed an autovacuum of pg_attribute (or maybe it was pg_attrdef) after it had been running for two weeks. See current pg_stat_activity output attached, which shows the three autovacuum workers running plus two manual VACUUM ANALYZEs I started yesterday.

> Is there anything in the postmaster log that would suggest autovac
> difficulties?

Yup, there are logs from April 1st which I just grepped through. I attached the redacted output, and I see a few warnings about "[table] contains more than max_fsm_pages pages with useful free space", as well as "ERROR: canceling autovacuum task". Perhaps bumping up max_fsm_pages and making the autovacuum settings more aggressive will help me? I was also planning to run a CLUSTER of those four bloated pg_catalog tables -- is this safe, particularly for tables like pg_attrdef which rely on OIDs?
Josh

# SELECT * FROM pg_stat_activity WHERE usename = 'postgres' AND current_query ILIKE '%vacuum%';

 datid | datname | procpid | usesysid | usename | current_query | waiting | xact_start | query_start | backend_start | client_addr | client_port
-------+---------+---------+----------+---------+---------------+---------+------------+-------------+---------------+-------------+-------------
 7651893 | master_dev | 20288 | 10 | postgres | VACUUM ANALYZE myschema.mytable; | f | 2010-04-14 16:24:51.421088+00 | 2010-04-14 16:24:51.421088+00 | 2010-04-14 16:24:46.875911+00 | 192.168.24.63 | 54065
 [snip SELECT * FROM pg_stat_activity]
 16389 | master | 9596 | 10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_attrdef | f | 2010-04-15 21:18:41.583044+00 | 2010-04-15 21:18:41.583044+00 | 2010-04-01 11:25:53.903941+00 | |
 16389 | master | 18043 | 10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_attribute | f | 2010-04-15 17:18:36.071479+00 | 2010-04-15 17:18:36.071479+00 | 2010-04-15 16:51:50.230827+00 | |
 16389 | master | 19608 | 10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_depend | f | 2010-04-15 16:26:41.941069+00 | 2010-04-15 16:26:41.941069+00 | 2010-04-08 17:08:07.417122+00 | |
 16389 | master | 20278 | 10 | postgres | VACUUM ANALYZE myschema.mytable; | f | 2010-04-14 16:24:35.7458+00 | 2010-04-14 16:24:35.7458+00 | 2010-04-14 16:24:28.632431+00 | 192.168.24.63 | 54064
(6 rows)

# grep -A 2 -B 2 -i vacuum *.log
postgresql-2010-04-01_00.log-2010-04-01 06:48:40 UTC username1 1.2.3.4 master WARNING: there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 06:48:44 UTC WARNING: relation "pg_catalog.pg_depend" contains more than "max_fsm_pages" pages with useful free space
postgresql-2010-04-01_00.log:2010-04-01 06:48:44 UTC HINT: Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
postgresql-2010-04-01_00.log-2010-04-01 06:48:51 UTC username1 1.2.3.4 master WARNING: there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 06:48:52 UTC username1 1.2.3.4 master WARNING: there is already a transaction in progress
--
[snip connection info]
postgresql-2010-04-01_00.log-2010-04-01 10:41:38 UTC WARNING: relation "schema.table" contains more than "max_fsm_pages" pages with useful free space
postgresql-2010-04-01_00.log:2010-04-01 10:41:38 UTC HINT: Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
postgresql-2010-04-01_00.log-2010-04-01 10:41:46 UTC username1 1.2.3.4 master WARNING: there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 10:41:47 UTC username1 1.2.3.4 master WARNING: there is already a transaction in progress
--
postgresql-2010-04-01_00.log-2010-04-01 10:49:33 UTC username1 1.2.3.4 master WARNING: there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 10:49:33
Re: [PERFORM] stats collector suddenly causing lots of IO
On Fri, Apr 16, 2010 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks. See current pg_stat_activity
>> output attached, which shows the three autovacuum workers running plus
>> two manual VACUUM ANALYZEs I started yesterday.
>
> Two weeks? What have you got the autovacuum cost delays set to?

SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source != 'default' AND name ILIKE '%vacuum%';

         name         | current_setting |       source
----------------------+-----------------+--------------------
 vacuum_cost_delay    | 200ms           | configuration file
 vacuum_cost_limit    | 100             | configuration file
 vacuum_cost_page_hit | 6               | configuration file
(3 rows)

I'm guessing these values and the default autovacuum configuration values need to be cranked significantly to make vacuum much more aggressive :-(

> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed. So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run
> during AV worker launch). So what we need to figure out is why it's
> taking so obscenely long to vacuum these tables ...

Hopefully changing those three vacuum_cost_* params will speed up the manual- and auto-vacuums. It'll take me a few days to see any results, since I still need to do something about the bloat that's already there.

Josh
Re: [PERFORM] stats collector suddenly causing lots of IO
On Fri, Apr 16, 2010 at 2:14 PM, Greg Smith g...@2ndquadrant.com wrote:
> Josh Kupershmidt wrote:
>> SELECT name, current_setting(name), source
>>   FROM pg_settings
>>   WHERE source != 'default' AND name ILIKE '%vacuum%';
>>
>>  vacuum_cost_delay    | 200ms | configuration file
>>  vacuum_cost_limit    | 100   | configuration file
>>  vacuum_cost_page_hit | 6     | configuration file
>>
>> Hopefully changing those three vacuum_cost_* params will speed up the
>> manual- and auto-vacuums..
>
> Those only impact manual VACUUM statements. There's a different set
> with names like autovacuum_vacuum_cost_delay that control the daemon.
> You can set those to -1 in order to match the regular VACUUM, but
> that's not the default.

It looks like I have the default of autovacuum_vacuum_cost_limit = -1, which means it's inheriting the vacuum_cost_limit of 100 I had set. I'll try bumping vacuum_cost_limit up to 1000 or so.

> You really need to sort out the max_fsm_pages setting too, because
> until that issue goes away these tables are unlikely to ever stop
> growing. And, no, you can't use CLUSTER on the system tables to clean
> those up.

I have max_fsm_pages = 524288, but from the hints in the logfiles this obviously needs to go up much higher. And it seems the only way to compact the pg_catalog tables is VACUUM FULL + REINDEX on 8.3 -- I had tried the CLUSTER on my 9.0 machine and wrongly assumed it would work on 8.3, too.

Josh
Re: [PERFORM] stats collector suddenly causing lots of IO
On Fri, Apr 16, 2010 at 3:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>>  vacuum_cost_delay    | 200ms | configuration file
>>  vacuum_cost_limit    | 100   | configuration file
>>  vacuum_cost_page_hit | 6     | configuration file
>
>> It looks like I have the default of autovacuum_vacuum_cost_limit = -1,
>> which means it's inheriting the vacuum_cost_limit of 100 I had set.
>> I'll try bumping vacuum_cost_limit up to 1000 or so.
>
> Actually I think the main problem is that cost_delay value, which is
> probably an order of magnitude too high. The way to limit vacuum's I/O
> impact on other stuff is to make it take frequent short delays, not
> have it run full speed and then sleep a long time. In any case, your
> current settings have got it sleeping way too much. Two WEEKS !!!??

Yup, I was going to turn vacuum_cost_delay down to 20. The two weeks was for the pg_attribute table, which has bloated to 145 GB, I think. One of those manual VACUUMs I kicked off just finished, after 48 hours -- and that table was only 25 GB or so. I wasn't the one who set up this postgresql.conf, but I am stuck fixing things :/
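For anyone following along, the postgresql.conf changes being discussed in this thread amount to something like the following (illustrative values only, not tuned advice for any particular workload):

```
vacuum_cost_delay = 20ms     # was 200ms: frequent short naps, not long sleeps
vacuum_cost_limit = 1000     # was 100 (default is 200); stop starving vacuum
#vacuum_cost_page_hit = 6    # consider reverting to the default of 1
max_fsm_pages = 4000000      # was 524288; must cover the free space the
                             # logfile warnings are complaining about
```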
[PERFORM] PG-related ACM Article: The Pathologies of Big Data
Just stumbled across this recent article published in the Communications of the ACM:

http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext

The author shares some insights relating to difficulties processing a 6.75 billion-row table, a dummy table representing census-type data for everyone on earth, in Postgres. I'd really like to replicate the author's experiment, but it's not clear from the article what his table definition looks like. He claims to be using a 16-byte record to store the several columns he needs for each row, so perhaps he's using a user-defined type?

The author implies with his definition of "big data" that the dataset he analyzed is "... too large to be placed in a relational database ...". From Fig. 2, the SELECT query he ran took just under 10^5 seconds (~28 hours) when run on 6.75 billion rows. This amount of time for the query didn't seem surprising to me, given how many rows he has to process, but in a recent post on comp.databases.ingres someone claimed that on a far-inferior PC, Ingres ran the same SELECT query in 105 minutes! This would be very impressive (a 10-fold improvement over Postgres) if true.

The author complained that on larger tables "[Postgres' planner] switched to sorting by grouping columns", which he blamed for the slow query execution. I don't personally see this plan as a problem, but maybe someone can enlighten me.

One intriguing tidbit I picked up from the article: "in modern systems, as demonstrated in the figure, random access to memory is typically slower than sequential access to disk." In hindsight, this seems plausible, since modern disks can sustain sequential reads at well over 100 MB/sec.

Anyway, it would be very interesting to attempt to speed up the author's query if at all possible.
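For anyone wanting to try replicating the experiment: the article doesn't give a schema, but a table with about 16 bytes of user data per row might look like this (purely my guess at the author's layout; the column names and types are invented):

```sql
-- Hypothetical reconstruction of a 16-bytes-per-row census table.
CREATE TABLE census (
    age     smallint,  -- 2 bytes
    sex     smallint,  -- 2 bytes
    region  integer,   -- 4 bytes
    income  integer,   -- 4 bytes
    extra   integer    -- 4 bytes, for 16 bytes of user data total
);
```

Note that Postgres adds roughly 24 bytes of tuple-header overhead per row, so the on-disk footprint would be well above 16 bytes/row no matter how the columns are packed.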