Re: [PERFORM] Planner sometimes doesn't use a relevant index with IN (subquery) condition
On 12/11/12 22:06, Rafał Rzepecki wrote:

This indeed works around the issue. Thanks!

On Mon, Nov 12, 2012 at 9:53 AM, ashutosh durugkar <ashuco...@gmail.com> wrote:

Hey Rafal,

> SELECT * FROM (SELECT run_id, utilization FROM stats) AS s
> WHERE run_id IN (SELECT run_id FROM runs WHERE server_id = 515);

could you try this:

SELECT * FROM (SELECT run_id, utilization FROM stats) AS s
WHERE run_id = ANY(ARRAY(SELECT run_id FROM runs WHERE server_id = 515));

Thanks,

On Sun, Nov 11, 2012 at 8:48 AM, Rafał Rzepecki <divided.m...@gmail.com> wrote:

[Please CC me on replies, as I'm not subscribed; thank you.]

I've run into a problem with the query planner and IN (subquery) conditions which I suspect to be a bug. I'll attempt to describe the relevant details of my database and explain which behaviour I find unexpected. I've also tried to trigger this behaviour in a clean database; I think I've succeeded, but the conditions are a bit different, so perhaps it's a different problem. I'll describe this setup in detail below.

I have a somewhat large table (~2.5M rows), stats, which is quite often (several records a minute) INSERTed to, but never UPDATEd or DELETEd from. (In case it's relevant, it has an attached AFTER INSERT trigger which checks the time and rebuilds an aggregate materialized view every hour.) This is the schema:

# \d+ stats
                                         Table "serverwatch.stats"
      Column      |            Type             |                     Modifiers                      | Storage | Description
------------------+-----------------------------+----------------------------------------------------+---------+-------------
 id               | integer                     | not null default nextval('stats_id_seq'::regclass) | plain   |
 run_id           | integer                     | not null                                           | plain   |
 start_time       | timestamp without time zone | not null                                           | plain   |
 end_time         | timestamp without time zone | not null                                           | plain   |
 cpu_utilization  | double precision            |                                                    | plain   |
 disk_read_ops    | bigint                      |                                                    | plain   |
 disk_write_ops   | bigint                      |                                                    | plain   |
 network_out      | bigint                      |                                                    | plain   |
 network_in       | bigint                      |                                                    | plain   |
 disk_read_bytes  | bigint                      |                                                    | plain   |
 disk_write_bytes | bigint                      |                                                    | plain   |
Indexes:
    "stats_pkey" PRIMARY KEY, btree (id)
    "stats_day_index" btree (run_id, day(stats.*))
    "stats_month_index" btree (run_id, month(stats.*))
    "stats_week_index" btree (run_id, week(stats.*))
Foreign-key constraints:
    "stats_runs" FOREIGN KEY (run_id) REFERENCES runs(id)
Triggers:
    stats_day_refresh_trigger AFTER INSERT OR UPDATE ON stats FOR EACH STATEMENT EXECUTE PROCEDURE mat_view_refresh('serverwatch.stats_day')
Has OIDs: no

The day(), month() and week() functions are just trivial date_trunc calls on the relevant field. The referenced table looks like this:

# \d+ runs
                                        Table "serverwatch.runs"
     Column      |            Type             |                     Modifiers                     | Storage | Description
-----------------+-----------------------------+---------------------------------------------------+---------+-------------
 id              | integer                     | not null default nextval('runs_id_seq'::regclass) | plain   |
 server_id       | integer                     | not null                                          | plain   |
 flavor          | flavor                      | not null                                          | plain   |
 region          | region                      | not null                                          | plain   |
 launch_time     | timestamp without time zone | not null                                          | plain   |
 stop_time       | timestamp without time zone |                                                   | plain   |
 project_info_id | integer                     | not null                                          | plain   |
 owner_info_id   | integer                     | not null                                          | plain   |
Indexes:
    "runs_pkey" PRIMARY KEY, btree (id)
    "index_runs_on_flavor" btree (flavor)
    "index_runs_on_owner_info_id" btree (owner_info_id)
    "index_runs_on_project_info_id" btree (project_info_id)
    "index_runs_on_region" btree (region)
    "index_runs_on_server_id" btree (server_id)
Foreign-key constraints:
    "runs_owner_info_id_fkey" FOREIGN KEY (owner_info_id) REFERENCES user_infos(id)
    "runs_project_info_id_fkey" FOREIGN KEY (project_info_id) REFERENCES project_infos(id)
Referenced by:
    TABLE "stats_day" CONSTRAINT "stats_day_runs" FOREIGN KEY
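To see the planner difference at the heart of this thread, the two forms from the exchange above can be compared side by side with EXPLAIN ANALYZE. A sketch, using the queries as posted (note they reference a utilization column while the schema shows cpu_utilization, so the posted queries are presumably simplified):

-- The form where the planner sometimes fails to use the (run_id, ...) indexes:
EXPLAIN ANALYZE
SELECT * FROM (SELECT run_id, utilization FROM stats) AS s
WHERE run_id IN (SELECT run_id FROM runs WHERE server_id = 515);

-- The workaround: ARRAY(subquery) is evaluated once as an InitPlan, and
-- run_id = ANY(...) can then be probed against the run_id indexes.
EXPLAIN ANALYZE
SELECT * FROM (SELECT run_id, utilization FROM stats) AS s
WHERE run_id = ANY(ARRAY(SELECT run_id FROM runs WHERE server_id = 515));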
[PERFORM] postgres 8.4, COPY, and high concurrency
I was working on a data warehousing project where a fair number of files could be COPY'd more or less directly into tables. I have a somewhat nice machine to work with, and I ran on 75% of the cores I have (75% of 32 is 24).

Performance was pretty bad. With 24 processes going, each backend (in COPY) spent 98% of its time in semop (as identified by strace). I tried larger and smaller shared buffers and all sorts of other tweaks, until I tried reducing the number of concurrent processes from 24 to 4. Disk I/O went up (on average) at least 10X, and strace reports that the top system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty reasonable IMO.

Given that each COPY is into its own, newly-made table with no indices or foreign keys, etc., I would have expected the interaction among the backends to be minimal, but that doesn't appear to be the case. What is the likely cause of the semops?

I can't really try a newer version of postgres at this time (perhaps soon). I'm using PG 8.4.13 on ScientificLinux 6.2 (x86_64), and the CPU is a 32 core Xeon E5-2680 @ 2.7 GHz.

--
Jon
Re: [PERFORM] postgres 8.4, COPY, and high concurrency
On 13.11.2012 21:13, Jon Nelson wrote:
> I was working on a data warehousing project where a fair number of files could be COPY'd more or less directly into tables. I have a somewhat nice machine to work with, and I ran on 75% of the cores I have (75% of 32 is 24). Performance was pretty bad. With 24 processes going, each backend (in COPY) spent 98% of its time in semop (as identified by strace). I tried larger and smaller shared buffers and all sorts of other tweaks, until I tried reducing the number of concurrent processes from 24 to 4. Disk I/O went up (on average) at least 10X, and strace reports that the top system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty reasonable IMO. Given that each COPY is into its own, newly-made table with no indices or foreign keys, etc., I would have expected the interaction among the backends to be minimal, but that doesn't appear to be the case. What is the likely cause of the semops?

I'd guess it's lock contention on WALInsertLock. That is, the system is experiencing lock contention on generating WAL records for the insertions.

If that theory is correct, you ought to get a big gain if you have wal_level=minimal and you create or truncate the table in the same transaction as the COPY. That allows the system to skip WAL-logging the COPY.

Or you could upgrade to 9.2. The WAL-logging of bulk COPY was optimized in 9.2; it should help precisely the scenario you're facing.

- Heikki
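The pattern Heikki describes looks like this (a sketch with made-up table and file names; on 9.x it requires wal_level = minimal, and the table must be created or truncated in the same transaction as the COPY):

BEGIN;

-- Because the table is new in this transaction, PostgreSQL can skip
-- WAL-logging the COPY and instead fsync the table file at commit.
CREATE TABLE staging_chunk (
    run_id  integer NOT NULL,
    payload text
);

COPY staging_chunk FROM '/tmp/chunk_0001.csv' WITH CSV;

COMMIT;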
Re: [PERFORM] postgres 8.4, COPY, and high concurrency
On Tue, Nov 13, 2012 at 11:13 AM, Jon Nelson <jnelson+pg...@jamponi.net> wrote:
> I was working on a data warehousing project where a fair number of files could be COPY'd more or less directly into tables. I have a somewhat nice machine to work with, and I ran on 75% of the cores I have (75% of 32 is 24). Performance was pretty bad. With 24 processes going, each backend (in COPY) spent 98% of its time in semop (as identified by strace).

They are probably fighting over the right to insert records into the WAL stream. This has been improved in 9.2.

> Given that each COPY is into its own, newly-made table with no indices or foreign keys, etc., I would have expected the interaction among the backends to be minimal, but that doesn't appear to be the case.

On newer versions, if you set wal_level to minimal and archive_mode to off, then these operations would bypass WAL entirely. I can't figure out if there is a corresponding optimization in 8.4, though.

Cheers,

Jeff
Re: [PERFORM] postgres 8.4, COPY, and high concurrency
On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 13.11.2012 21:13, Jon Nelson wrote:
>> I was working on a data warehousing project where a fair number of files could be COPY'd more or less directly into tables. [...] What is the likely cause of the semops?
>
> I'd guess it's lock contention on WALInsertLock. That is, the system is experiencing lock contention on generating WAL records for the insertions. If that theory is correct, you ought to get a big gain if you have wal_level=minimal and you create or truncate the table in the same transaction as the COPY. That allows the system to skip WAL-logging the COPY.

wal_level doesn't exist for 8.4, but I have archive_mode = off and I am creating the table in the same transaction as the COPY.

> Or you could upgrade to 9.2. The WAL-logging of bulk COPY was optimized in 9.2; it should help precisely the scenario you're facing.

Unfortunately, that's what I was expecting.

--
Jon
Re: [PERFORM] postgres 8.4, COPY, and high concurrency
On Tue, Nov 13, 2012 at 12:03 PM, Jon Nelson <jnelson+pg...@jamponi.net> wrote:
> On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
>> I'd guess it's lock contention on WALInsertLock. [...] If that theory is correct, you ought to get a big gain if you have wal_level=minimal and you create or truncate the table in the same transaction as the COPY. That allows the system to skip WAL-logging the COPY.
>
> wal_level doesn't exist for 8.4, but I have archive_mode = off and I am creating the table in the same transaction as the COPY.

That should work to bypass WAL. Can you directly verify whether you are generating lots of WAL (look at the churn in pg_xlog) during those loads? Maybe your contention is someplace else. Since they must all be using different tables, I don't think it would be the relation extension lock. Maybe the buffer mapping lock or the freelist lock?

Cheers,

Jeff
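One direct way to check that on 8.4 (a sketch; the position shown is illustrative):

-- Sample the WAL insert position before the load...
SELECT pg_current_xlog_location();   -- e.g. 2B/A8361A80

-- ...run the COPY workload, then sample again. If the position has barely
-- advanced, the COPYs are bypassing WAL and the contention is elsewhere.
SELECT pg_current_xlog_location();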
Re: [PERFORM] PostgreSQL v9.2 uses a lot of memory in Windows XP
Please reply to the list, not directly to me. Comments follow in-line.

On 11/13/2012 11:37 PM, Wu Ming wrote:
> What column in Process Explorer to determine memory usage? Currently I thought "Working Set" is the correct one.

As I said, it just isn't that simple when shared memory is involved. A rough measure for PostgreSQL is the virtual size of one of the processes, plus the working sets of all the others. Alternately, you can reasonably estimate the memory consumption by adding up all the working sets and then adding the value of shared_buffers to that - this will under-estimate usage slightly because PostgreSQL also uses shared memory for other things, but not tons of it in a normal configuration.

> The 'lagging' is like when you try to alt+tab or activate/focus another application window, or change tabs in the browser: its UI loads slowly or lags.

Sure, that's what you see, but you should really be looking at the numbers: swap in and out bytes, memory usage, etc. In Windows 7 or Win2k8 Server you'd use the Performance Monitor for that; I don't remember off the top of my head where to look in XP.

> My firefox has many tabs opened (around 30 tabs) and eclipse is well known for its high memory usage.

On a 2GB machine? Yup, that'll do it. You've shown a screenshot that suggests that Pg is using relatively little RAM, and you're running two known memory pigs. I'd say your problem has nothing to do with PostgreSQL.

> Then usually I also opened opera and chrome with ~10-20 tabs opened.

Time to buy more RAM.

> I saw that chrome also spawned many processes (I had 4 tabs opened, but it shows 8 child processes). They might be the big processes that probably are the main cause of the lagging.

It's going to be everything adding up: Chrome, Eclipse, Firefox, all fighting for RAM.

BTW, Chrome uses a multi-process architecture like PostgreSQL, but unlike PostgreSQL it does not use shared memory, so you can tell how much RAM Chrome is using very easily by adding up the working sets.

--
Craig Ringer            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
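To put numbers on that estimate, the one value needed from the PostgreSQL side is the size of the shared pool (a sketch; the value shown is only an example):

-- The shared pool is mapped into every backend, so count it once:
-- estimated total = sum of all backends' working sets + shared_buffers.
SHOW shared_buffers;   -- e.g. 128MB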
Re: [PERFORM] postgres 8.4, COPY, and high concurrency
On Tue, Nov 13, 2012 at 2:43 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Nov 13, 2012 at 12:03 PM, Jon Nelson <jnelson+pg...@jamponi.net> wrote:
>> wal_level doesn't exist for 8.4, but I have archive_mode = off and I am creating the table in the same transaction as the COPY.
>
> That should work to bypass WAL. Can you directly verify whether you are generating lots of WAL (look at the churn in pg_xlog) during those loads? Maybe your contention is someplace else. Since they must all be using different tables, I don't think it would be the relation extension lock. Maybe the buffer mapping lock or the freelist lock?

I had moved on to a different approach to importing the data which does not work concurrently. However, I went back and tried to re-create the situation, and - at least on a naive attempt - failed. I'll give it a few more tries. I was creating two tables using CREATE TABLE <unique name> (LIKE <some other table> INCLUDING everything). Then I would copy the data in, add some constraints (FK constraints, but only between these two tables), and then finally (for each table) issue an ALTER TABLE <unique name> INHERIT <some other table>. To be clear, however, everything bogged down in the COPY stage, which immediately followed the table creation. I'll note that my naive test showed almost no unexpected overhead at all, so it's clearly not representative of the problem I encountered.

--
Jon
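For reference, the load pattern Jon describes would look roughly like this on 8.4 (a sketch; all table, column, file, and constraint names here are invented):

BEGIN;

-- 8.4 spells out the INCLUDING options individually (no INCLUDING ALL yet).
CREATE TABLE import_a (LIKE master_a
    INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
CREATE TABLE import_b (LIKE master_b
    INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);

-- The stage where the semop contention appeared:
COPY import_a FROM '/tmp/chunk_a.csv' WITH CSV;
COPY import_b FROM '/tmp/chunk_b.csv' WITH CSV;

-- FK constraints only between the two new tables...
ALTER TABLE import_b ADD CONSTRAINT import_b_a_fkey
    FOREIGN KEY (a_id) REFERENCES import_a (id);

-- ...then attach each table to its parent.
ALTER TABLE import_a INHERIT master_a;
ALTER TABLE import_b INHERIT master_b;

COMMIT;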
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Jeff Janes wrote:
> On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@...> wrote:
>> Still I can't understand why pg_dump has to know about all the tables?
>
> Strictly speaking, it probably doesn't need to. But it is primarily designed for dumping entire databases, and the efficient way to do that is to read it all into memory in a few queries and then sort out the dependencies, rather than tracking down every dependency individually with one or more trips back to the database. (Although it still does make plenty of trips back to the database per table/sequence, for acls, defaults, attributes.)
>
> If you were to rewrite pg_dump from the ground up to achieve your specific needs (dumping one schema, with no dependencies to other schemata), you could probably make it much more efficient. But then it wouldn't be pg_dump; it would be something else.
>
> Cheers,
>
> Jeff

Please don't think that I'm trying to nitpick here, but pg_dump has options for dumping separate tables, and that's not really consistent with the idea that pg_dump is primarily designed for dumping entire databases.
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On 11/13/2012 10:12 PM, Denis wrote:
> Please don't think that I'm trying to nitpick here, but pg_dump has options for dumping separate tables, and that's not really consistent with the idea that pg_dump is primarily designed for dumping entire databases.

Sure it is. The word "primarily" is not just a noise word here. The fact that we have options to do other things doesn't mean that its primary design goal has changed.

cheers

andrew
Re: [PERFORM] PostgreSQL v9.2 uses a lot of memory in Windows XP
Hi,

> As I said, it just isn't that simple when shared memory is involved. A rough measure for PostgreSQL is the virtual size of one of the processes, plus the working sets of all the others. Alternately, you can reasonably estimate the memory consumption by adding up all the working sets and then adding the value of shared_buffers to that - this will under-estimate usage slightly because PostgreSQL also uses shared memory for other things, but not tons of it in a normal configuration.

This is interesting. For the virtual size of one of the processes, which process should I look up? The one with the biggest virtual size?

http://i45.tinypic.com/vr4t3b.png

For example, in the above screenshot, the biggest virtual size among all the postgresql processes is 740004. So can we say the approximate total memory usage of the postgresql service is 740004 K + the total of the working sets (4844 K + 10056 K + 5408 K + 6020 K + ...)?

> Sure, that's what you see, but you should really be looking at the numbers: swap in and out bytes, memory usage, etc. In Windows 7 or Win2k8 Server you'd use the Performance Monitor for that; I don't remember off the top of my head where to look in XP.

I have a total paging file size of 3GB. There is perfmon.exe in Windows XP, but I don't know how to use it or analyze its graphs.

> I'd say your problem has nothing to do with PostgreSQL.

Maybe you're right. If I close one of the memory pigs, it gets a bit better. Maybe I was too quick to blame PostgreSQL; it's just that I can't close and restart the other applications, because they are either too important or too slow to reload, whereas the postgresql service is very quick to restart. I hope you'll understand.

On Wed, Nov 14, 2012 at 6:07 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> [...]
Re: [PERFORM] PostgreSQL v9.2 uses a lot of memory in Windows XP
On 11/14/2012 01:56 PM, Wu Ming wrote:
> This is interesting. For the virtual size of one of the processes, which process should I look up? The one with the biggest virtual size?

Thinking about this some more, I haven't checked whether Windows adds dirtied shared_buffers to the process's working set. If so, you'd still be multiply counting shared memory. In that case, since you can't use an approach like the one Depesz writes about here for Linux:

http://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/

then it's going to be pretty hard to actually work it out. pg_buffercache might be of some help (http://www.postgresql.org/docs/current/static/pgbuffercache.html), but it's not exactly friendly.

Yes, it's absurd that it's so hard to work out how much memory Pg uses. It'd be nice if Pg provided better tools for this by allowing the postmaster to interrogate backends' memory contexts, though that'd only report how much memory Pg thought it was using, not how much memory it was actually using from the OS. Really, OS-specific tools are required, and nobody's written them - at least, I'm not aware of any that've been published. Most of the problem is that operating systems make it so hard to tell where memory is going when shared memory is involved.

> http://i45.tinypic.com/vr4t3b.png
>
> For example, in the above screenshot, the biggest virtual size among all the postgresql processes is 740004. So can we say the approximate total memory usage of the postgresql service is 740004 K + the total of the working sets (4844 K + 10056 K + 5408 K + 6020 K + ...)?

*If* Windows XP doesn't add dirtied shared buffers to the working set, then that would be a reasonable approximation. If it does, then it'd be massively out, because it'd be double-counting shared memory.

Off the top of my head I'm not sure how best to test this. Maybe run a simple query like SELECT * FROM some_big_table in `psql`, dump the result to the null device (\o NUL in Windows, if I recall correctly, but again not tested) or a temp file, and see how much the backend grows. If it grows by more than a few hundred K, then I expect it's probably having the dirtied shared_buffers counted against it.

> Maybe you're right. If I close one of the memory pigs, it gets a bit better. Maybe I was too quick to blame PostgreSQL...

And, of course, because PostgreSQL looks like it uses a TON of memory, even when it's really using only a small amount. This has been an ongoing source of confusion, but it's one that isn't going to go away until OSes offer a way to easily ask "how much RAM is this group of processes using in total".

--
Craig Ringer            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
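For anyone who wants to try the pg_buffercache route, the raw numbers come out of a query like this (a sketch; it assumes the contrib module is installed and uses the 9.1+ extension syntax):

-- Install once per database.
CREATE EXTENSION pg_buffercache;

-- Count how many shared buffers hold a page at all, and how many are dirty.
-- Multiply by the 8 kB default block size to turn counts into bytes.
SELECT count(relfilenode)                       AS buffers_used,
       sum(CASE WHEN isdirty THEN 1 ELSE 0 END) AS buffers_dirty
FROM pg_buffercache;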