Re: [PERFORM] Planner sometimes doesn't use a relevant index with IN (subquery) condition

2012-11-13 Thread Gavin Flower

On 12/11/12 22:06, Rafał Rzepecki wrote:

This indeed works around the issue. Thanks!

On Mon, Nov 12, 2012 at 9:53 AM, ashutosh durugkar ashuco...@gmail.com wrote:

Hey Rafal,



SELECT * FROM (SELECT run_id, utilization FROM stats) AS s WHERE

run_id IN (SELECT run_id FROM runs WHERE server_id = 515);

could you try this:


SELECT * FROM (SELECT run_id, utilization FROM stats) AS s WHERE
run_id = ANY(ARRAY(SELECT run_id FROM runs WHERE server_id = 515));
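The rewrite presumably helps because ARRAY(SELECT ...) collects the sub-select
into an array up front, so the outer condition becomes a plain, indexable
run_id = ANY(...) filter instead of being planned as a join against the
subquery. A quick way to compare the two shapes -- table and column names are
taken from the query above, and the actual plans will depend on the data -- is:

EXPLAIN ANALYZE
SELECT run_id, utilization FROM stats
WHERE run_id IN (SELECT run_id FROM runs WHERE server_id = 515);

EXPLAIN ANALYZE
SELECT run_id, utilization FROM stats
WHERE run_id = ANY(ARRAY(SELECT run_id FROM runs WHERE server_id = 515));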

Thanks,

On Sun, Nov 11, 2012 at 8:48 AM, Rafał Rzepecki divided.m...@gmail.com
wrote:

[Please CC me on replies, as I'm not subscribed; thank you.]

I've run into a problem with the query planner and IN (subquery)
conditions which I suspect to be a bug. I'll attempt to describe the
relevant details of my database and explain which behaviour I find
unexpected. I've also tried to trigger this behaviour in a clean
database; I think I've succeeded, but the conditions are a bit
different, so perhaps it's a different problem. I'll describe this
setup in detail below.

I have a somewhat large table (~2.5M rows), stats, which is quite
often (several records a minute) INSERTed to, but never UPDATEd or
DELETEd from. (In case it's relevant, it has an attached AFTER INSERT
trigger which checks time and rebuilds an aggregate materialized view
every hour.) This is the schema:
# \d+ stats
                              Table serverwatch.stats
      Column      |            Type             |                     Modifiers
------------------+-----------------------------+----------------------------------------------------
 id               | integer                     | not null default nextval('stats_id_seq'::regclass)
 run_id           | integer                     | not null
 start_time       | timestamp without time zone | not null
 end_time         | timestamp without time zone | not null
 cpu_utilization  | double precision            |
 disk_read_ops    | bigint                      |
 disk_write_ops   | bigint                      |
 network_out      | bigint                      |
 network_in       | bigint                      |
 disk_read_bytes  | bigint                      |
 disk_write_bytes | bigint                      |
(Storage is plain for all columns; no column descriptions.)
Indexes:
    stats_pkey PRIMARY KEY, btree (id)
    stats_day_index btree (run_id, day(stats.*))
    stats_month_index btree (run_id, month(stats.*))
    stats_week_index btree (run_id, week(stats.*))
Foreign-key constraints:
    stats_runs FOREIGN KEY (run_id) REFERENCES runs(id)
Triggers:
    stats_day_refresh_trigger AFTER INSERT OR UPDATE ON stats FOR EACH
    STATEMENT EXECUTE PROCEDURE mat_view_refresh('serverwatch.stats_day')
Has OIDs: no

day(), month() and week() are just trivial date_trunc() wrappers on the
relevant timestamp field. The referenced table looks like this:
# \d+ runs
                              Table serverwatch.runs
     Column      |            Type             |                    Modifiers
-----------------+-----------------------------+---------------------------------------------------
 id              | integer                     | not null default nextval('runs_id_seq'::regclass)
 server_id       | integer                     | not null
 flavor          | flavor                      | not null
 region          | region                      | not null
 launch_time     | timestamp without time zone | not null
 stop_time       | timestamp without time zone |
 project_info_id | integer                     | not null
 owner_info_id   | integer                     | not null
(Storage is plain for all columns; no column descriptions.)
Indexes:
    runs_pkey PRIMARY KEY, btree (id)
    index_runs_on_flavor btree (flavor)
    index_runs_on_owner_info_id btree (owner_info_id)
    index_runs_on_project_info_id btree (project_info_id)
    index_runs_on_region btree (region)
    index_runs_on_server_id btree (server_id)
Foreign-key constraints:
    runs_owner_info_id_fkey FOREIGN KEY (owner_info_id) REFERENCES user_infos(id)
    runs_project_info_id_fkey FOREIGN KEY (project_info_id) REFERENCES project_infos(id)
Referenced by:
    TABLE stats_day CONSTRAINT stats_day_runs FOREIGN KEY 

[PERFORM] postgres 8.4, COPY, and high concurrency

2012-11-13 Thread Jon Nelson
I was working on a data warehousing project where a fair number of files
could be COPY'd more or less directly into tables. I have a somewhat nice
machine to work with, and I ran on 75% of the cores I have (75% of 32 is
24).

Performance was pretty bad. With 24 processes going, each backend (in COPY)
spent 98% of its time in semop (as identified by strace).  I tried larger
and smaller shared buffers, all sorts of other tweaks, until I tried
reducing the number of concurrent processes from 24 to 4.

Disk I/O went up (on average) at least 10X and strace reports that the top
system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty
reasonable IMO.

Given that each COPY is into its own, newly-made table with no indices or
foreign keys, etc, I would have expected the interaction among the backends
to be minimal, but that doesn't appear to be the case.  What is the likely
cause of the semops?

I can't really try a newer version of postgres at this time (perhaps soon).

I'm using PG 8.4.13 on ScientificLinux 6.2 (x86_64), and the CPU is a 32
core Xeon E5-2680 @ 2.7 GHz.

-- 
Jon


Re: [PERFORM] postgres 8.4, COPY, and high concurrency

2012-11-13 Thread Heikki Linnakangas

On 13.11.2012 21:13, Jon Nelson wrote:

I was working on a data warehousing project where a fair number of files
could be COPY'd more or less directly into tables. I have a somewhat nice
machine to work with, and I ran on 75% of the cores I have (75% of 32 is
24).

Performance was pretty bad. With 24 processes going, each backend (in COPY)
spent 98% of its time in semop (as identified by strace).  I tried larger
and smaller shared buffers, all sorts of other tweaks, until I tried
reducing the number of concurrent processes from 24 to 4.

Disk I/O went up (on average) at least 10X and strace reports that the top
system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty
reasonable IMO.

Given that each COPY is into its own, newly-made table with no indices or
foreign keys, etc, I would have expected the interaction among the backends
to be minimal, but that doesn't appear to be the case.  What is the likely
cause of the semops?


I'd guess it's lock contention on WALInsertLock. That means, the system 
is experiencing lock contention on generating WAL records for the 
insertions. If that theory is correct, you ought to get a big gain if 
you have wal_level=minimal, and you create or truncate the table in the 
same transaction with the COPY. That allows the system to skip 
WAL-logging the COPY.
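A minimal sketch of that pattern, with placeholder table and file names (and
assuming wal_level = minimal, or archive_mode = off on 8.4):

BEGIN;
-- Creating (or truncating) the table in the same transaction as the COPY
-- lets the COPY skip WAL-logging when WAL archiving is not in use.
CREATE TABLE load_target (LIKE load_template INCLUDING DEFAULTS);
COPY load_target FROM '/path/to/datafile.csv' WITH CSV;
COMMIT;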


Or you could upgrade to 9.2. The WAL-logging of bulk COPY was optimized
in 9.2; it should help precisely the scenario you're facing.


- Heikki




Re: [PERFORM] postgres 8.4, COPY, and high concurrency

2012-11-13 Thread Jeff Janes
On Tue, Nov 13, 2012 at 11:13 AM, Jon Nelson jnelson+pg...@jamponi.net wrote:
 I was working on a data warehousing project where a fair number of files
 could be COPY'd more or less directly into tables. I have a somewhat nice
 machine to work with, and I ran on 75% of the cores I have (75% of 32 is
 24).

 Performance was pretty bad. With 24 processes going, each backend (in COPY)
 spent 98% of its time in semop (as identified by strace).

They are probably fighting over the right to insert records into the WAL stream.

This has been improved in 9.2


 Given that each COPY is into its own, newly-made table with no indices or
 foreign keys, etc, I would have expected the interaction among the backends
 to be minimal, but that doesn't appear to be the case.

On newer versions if you set wal_level to minimal and archive_mode to
off, then these operations would bypass WAL entirely.  I can't figure
out if there is a corresponding optimization in 8.4, though.
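A quick way to check the relevant settings from SQL (wal_level only exists on
9.0 and later, so on 8.4 this simply returns the archive_mode row):

-- Settings that control whether COPY into a just-created table can skip WAL.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'archive_mode');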

Cheers,

Jeff




Re: [PERFORM] postgres 8.4, COPY, and high concurrency

2012-11-13 Thread Jon Nelson
On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 13.11.2012 21:13, Jon Nelson wrote:

 I was working on a data warehousing project where a fair number of files
 could be COPY'd more or less directly into tables. I have a somewhat nice
 machine to work with, and I ran on 75% of the cores I have (75% of 32 is
 24).

 Performance was pretty bad. With 24 processes going, each backend (in
 COPY)
 spent 98% of its time in semop (as identified by strace).  I tried larger
 and smaller shared buffers, all sorts of other tweaks, until I tried
 reducing the number of concurrent processes from 24 to 4.

 Disk I/O went up (on average) at least 10X and strace reports that the top
 system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty
 reasonable IMO.

 Given that each COPY is into its own, newly-made table with no indices or
 foreign keys, etc, I would have expected the interaction among the
 backends
 to be minimal, but that doesn't appear to be the case.  What is the likely
 cause of the semops?


 I'd guess it's lock contention on WALInsertLock. That means, the system is
 experiencing lock contention on generating WAL records for the insertions.
 If that theory is correct, you ought to get a big gain if you have
 wal_level=minimal, and you create or truncate the table in the same
 transaction with the COPY. That allows the system to skip WAL-logging the
 COPY.


wal_level doesn't exist for 8.4, but I have archive_mode = off and I am
creating the table in the same transaction as the COPY.



 Or you could upgrade to 9.2. The WAL-logging of bulk COPY was optimized in
 9.2, it should help precisely the scenario you're facing.


Unfortunately, that's what I was expecting.



-- 
Jon


Re: [PERFORM] postgres 8.4, COPY, and high concurrency

2012-11-13 Thread Jeff Janes
On Tue, Nov 13, 2012 at 12:03 PM, Jon Nelson jnelson+pg...@jamponi.net wrote:
 On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 On 13.11.2012 21:13, Jon Nelson wrote:


 I'd guess it's lock contention on WALInsertLock. That means, the system is
 experiencing lock contention on generating WAL records for the insertions.
 If that theory is correct, you ought to get a big gain if you have
 wal_level=minimal, and you create or truncate the table in the same
 transaction with the COPY. That allows the system to skip WAL-logging the
 COPY.


 wal_level doesn't exist for 8.4, but I have archive_mode = off and I am
 creating the table in the same transaction as the COPY.


That should work to bypass WAL.  Can you directly verify whether you
are generating lots of WAL (look at the churn in pg_xlog) during those
loads?
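One rough way to check that from SQL on 8.4, besides watching pg_xlog, is to
note the WAL insert position before and after a load and compare the two
values by hand (pg_xlog_location_diff() only arrived in later releases):

-- Current WAL insert position; run again after the COPY and compare.
SELECT pg_current_xlog_insert_location();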

Maybe your contention is someplace else.  Since they must all be using
different tables, I don't think it would be the relation extension
lock.  Maybe buffer mapping lock or freelist lock?

Cheers,

Jeff




Re: [PERFORM] PostgreSQL v9.2 uses a lot of memory in Windows XP

2012-11-13 Thread Craig Ringer
Please reply to the list, not directly to me. Comments follow in-line.

On 11/13/2012 11:37 PM, Wu Ming wrote:
 Hi,

 Which column in Process Explorer should I use to determine memory usage?
 Currently I think "Working Set" is the correct one.
As I said, it just isn't that simple when shared memory is involved. A
rough measure for PostgreSQL is the virtual size of one of the
processes, plus the working sets of all the others. Alternately, you can
reasonably estimate the memory consumption by adding all the working
sets and then adding the value of shared_buffers to that - this will
under-estimate usage slightly because PostgreSQL also uses shared memory
for other things, but not tons of it in a normal configuration.
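For the shared-memory part of that estimate, the configured shared_buffers
value can be read from the server itself; a small sketch, assuming the default
8 kB block size:

-- shared_buffers is reported in 8 kB pages, so multiply to get bytes,
-- then add this to the sum of the per-process working sets.
SELECT name, setting::bigint * 8192 AS bytes
FROM pg_settings
WHERE name = 'shared_buffers';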
 The 'lagging' is like when you try to alt+tab or activate/focus another
 application window, or change tabs in the browser: it goes slow or
 lags while loading its UI.
Sure, that's what you see, but you should really be looking at the
numbers. Swap in and out bytes, memory usage, etc. In Windows 7 or
Win2k8 Server you'd use the Performance Monitor for that; I don't
remember off the top of my head where to look in XP.
 My firefox has many tabs opened (around 30 tabs) and eclipse is well
 known for its high memory usage.
On a 2GB machine? Yup, that'll do it.

You've shown a screenshot that suggests that Pg is using relatively
little RAM, and you're running two known memory pigs. I'd say your
problem has nothing to do with PostgreSQL.
 Then usually I also opened opera and
 chrome with ~10-20 tabs opened.
Time to buy more RAM.
 I saw that chrome also spawned many
 processes (I had 4 tabs opened, but it shows 8 child processes). They
 might be the big processes that are probably the main cause of the
 lagging.
It's going to be everything adding up. Chrome, Eclipse, Firefox, all
fighting for RAM.

BTW, chrome uses a multi-process architecture like PostgreSQL, but
unlike PostgreSQL it does not use shared memory, so you can tell how
much RAM Chrome is using very easily by adding up the working sets.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [PERFORM] postgres 8.4, COPY, and high concurrency

2012-11-13 Thread Jon Nelson
On Tue, Nov 13, 2012 at 2:43 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Nov 13, 2012 at 12:03 PM, Jon Nelson jnelson+pg...@jamponi.net 
 wrote:
 On Tue, Nov 13, 2012 at 1:27 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 On 13.11.2012 21:13, Jon Nelson wrote:


 I'd guess it's lock contention on WALInsertLock. That means, the system is
 experiencing lock contention on generating WAL records for the insertions.
 If that theory is correct, you ought to get a big gain if you have
 wal_level=minimal, and you create or truncate the table in the same
 transaction with the COPY. That allows the system to skip WAL-logging the
 COPY.


 wal_level doesn't exist for 8.4, but I have archive_mode = off and I am
 creating the table in the same transaction as the COPY.


 That should work to bypass WAL.  Can you directly verify whether you
 are generating lots of WAL (look at the churn in pg_xlog) during those
 loads?

 Maybe your contention is someplace else.  Since they must all be using
 different tables, I don't think it would be the relation extension
 lock.  Maybe buffer mapping lock or freelist lock?

I had moved on to a different approach to importing the data which
does not work concurrently. However, I went back and tried to
re-create the situation and - at least a naive attempt failed. I'll
give it a few more tries -- I was creating two tables using CREATE
TABLE "unique name" LIKE ("some other table" INCLUDING everything).
Then I would copy the data in, add some constraints (FK constraints
but only within these two tables) and then finally (for each table)
issue an ALTER TABLE "unique name" INHERIT "some other table".  To be
clear, however, everything bogged down in the COPY stage, which
immediately followed the table creation.
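For reference, a sketch of that workflow with placeholder table and column
names (not the originals), using only LIKE options that exist on 8.4:

BEGIN;
CREATE TABLE import_batch_a
  (LIKE parent_a INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
COPY import_batch_a FROM '/path/to/batch_a.csv' WITH CSV;
-- FK only between the freshly created tables; import_batch_b is assumed
-- to have been created and loaded the same way.
ALTER TABLE import_batch_a
  ADD FOREIGN KEY (b_id) REFERENCES import_batch_b (id);
ALTER TABLE import_batch_a INHERIT parent_a;
COMMIT;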

I'll note that my naive test showed almost no unexpected overhead at
all, so it's clearly not representative of the problem I encountered.


--
Jon




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-13 Thread Denis
Jeff Janes wrote
 On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@...> wrote:

 Still I can't understand why pg_dump has to know about all the tables?
 
 Strictly speaking it probably doesn't need to.  But it is primarily
 designed for dumping entire databases, and the efficient way to do
 that is to read it all into memory in a few queries and then sort out
 the dependencies, rather than tracking down every dependency
 individually with one or more trips back to the database.  (Although
 it still does make plenty of trips back to the database per
 table/sequence, for acls, defaults, attributes.)
 
 If you were to rewrite pg_dump from the ground up to achieve your
 specific needs (dumping one schema, with no dependencies to
 other schemata) you could probably make it much more efficient.  But
 then it wouldn't be pg_dump, it would be something else.
 
 Cheers,
 
 Jeff
 
 

Please don't think that I'm trying to nitpick here, but pg_dump has options
for dumping separate tables and that's not really consistent with the idea
that pg_dump is primarily  designed for dumping entire databases.







Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-13 Thread Andrew Dunstan


On 11/13/2012 10:12 PM, Denis wrote:

Please don't think that I'm trying to nitpick here, but pg_dump has options
for dumping separate tables and that's not really consistent with the idea
that pg_dump is primarily  designed for dumping entire databases.





Sure it is. The word primarily is not just a noise word here.

The fact that we have options to do other things doesn't mean that its 
primary design goal has changed.



cheers

andrew




Re: [PERFORM] PostgreSQL v9.2 uses a lot of memory in Windows XP

2012-11-13 Thread Wu Ming
Hi,

 As I said, it just isn't that simple when shared memory is involved. A
 rough measure for PostgreSQL is the virtual size of one of the
 processes, plus the working sets of all the others. Alternately, you can
 reasonably estimate the memory consumption by adding all the working
 sets and then adding the value of shared_buffers to that - this will
 under-estimate usage slightly because PostgreSQL also uses shared memory
 for other things, but not tons of it in a normal configuration.

This is interesting. Regarding the virtual size of one of the processes:
which process should I look at? Is it the one with the biggest virtual
size?

http://i45.tinypic.com/vr4t3b.png

For example, from the above screenshot, the biggest virtual size among
all the postgresql processes is 740004 K. So can we say the approximate
total memory usage of the postgresql service is 740004 K +
total_of_working_sets (4844 K + 10056 K + 5408 K + 6020 K + ...)?


 Sure, that's what you see, but you should really be looking at the
 numbers. Swap in and out bytes, memory usage, etc. In Windows 7 or
 Win2k8 Server you'd use the Performance Monitor for that; I don't
 remember off the top of my head where to look in XP.

I had total paging file size = 3GB.

There is perfmon.exe in Windows XP, but I don't know how to use it or
analyze the graph.


 I'd say your problem has nothing to do with PostgreSQL.

Maybe you're right. If I close one of the memory hogs, it gets a bit
better. Maybe I was too quick to blame PostgreSQL; it's just that I
can't close and restart the other applications because they are either
too important or too slow to reload, whereas the postgresql service is
very quick to restart. I hope that's understandable.








Re: [PERFORM] PostgreSQL v9.2 uses a lot of memory in Windows XP

2012-11-13 Thread Craig Ringer
On 11/14/2012 01:56 PM, Wu Ming wrote:

 This is interesting. Regarding the virtual size of one of the processes:
 which process should I look at? Is it the one with the biggest virtual
 size?

Thinking about this some more, I haven't checked to see if Windows adds
dirtied shared_buffers to the process's working set. If so, you'd still
be multiply counting shared memory. In that case, since you can't use an
approach like Depesz writes about here for Linux:

http://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/

then it's going to be pretty hard to actually work it out.
pg_buffercache might be of some help (
http://www.postgresql.org/docs/current/static/pgbuffercache.html
http://www.postgresql.org/docs/9.1/static/pgbuffercache.html) but it's
not exactly friendly.
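For what it's worth, a rough example of the kind of query pg_buffercache
allows, assuming the extension is installed (CREATE EXTENSION pg_buffercache
on 9.1 and later):

-- Count shared buffers in use per database, and how many of them are dirty.
SELECT d.datname,
       count(*) AS buffers,
       sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty_buffers
FROM pg_buffercache b
LEFT JOIN pg_database d ON d.oid = b.reldatabase
GROUP BY d.datname
ORDER BY buffers DESC;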

Yes, it's absurd that it's so hard to work out how much memory Pg uses.
It'd be nice if Pg provided better tools for this by allowing the
postmaster to interrogate backends' memory contexts, though that'd only
report how much memory Pg thought it was using, not how much memory it
was actually using from the OS. Really, OS-specific tools are required,
and nobody's written them - at least, I'm not aware of any that've been
published.

Most of the problem is that operating systems make it so hard to tell
where memory is going when shared memory is involved.

 http://i45.tinypic.com/vr4t3b.png

 For example, from the above screenshot, the biggest virtual size among
 all the postgresql processes is 740004 K. So can we say the approximate
 total memory usage of the postgresql service is 740004 K +
 total_of_working_sets (4844 K + 10056 K + 5408 K + 6020 K + ...)?
*if* Windows XP doesn't add dirtied shared buffers to the working set,
then that would be a reasonable approximation.

If it does, then it'd be massively out because it'd be double-counting
shared memory.

Off the top of my head I'm not sure how best to test this. Maybe do a
simple query like `SELECT * FROM some_big_table` in `psql`, dump the
result to the null device (\o NUL in Windows if I recall correctly,
but again not tested) or a temp file, and see how much the backend grows.
If it grows more than a few hundred K then I expect it's probably having
the dirtied shared_buffers counted against it.
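A sketch of that test as a psql session; the table name is a placeholder, and
\o NUL (on Windows) discards the output so client-side buffering stays out of
the picture:

\o NUL
SELECT * FROM some_big_table;   -- touches a lot of shared buffers
\o
-- Then compare the backend's working set in Process Explorer before and after.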
 Maybe you're right. If I close one of the memory porks, it gets a bit
 better. Maybe I was too quick to blame postgreSQL, it's just that I
 can't close and restart other applications because they are either too
 important or slow to reload, where postgresql service is very quick in
 restarting. I hope it'll understand.
And, of course, because PostgreSQL looks like it uses a TON of memory,
even when it's really using only a small amount.

This has been an ongoing source of confusion, but it's one that isn't
going to go away until OSes offer a way to easily ask "how much RAM is
this group of processes using in total?".

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services