Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Jeff Janes
On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 naming multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

All of the other ones that I know of were associated with pg_dump
itself, and since it is recommended to run the newer version of
pg_dump against the older version of the server, no back patching
would be necessary to get the benefits of those particular fixes.
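
For illustration, pointing a newer pg_dump at an older server is just a
matter of which binary you run; a minimal sketch, where the install path,
host, and database name are only examples:

/usr/pgsql-9.2/bin/pg_dump -h oldhost -p 5432 mydb > mydb.sql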

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


What would be the recommendation for people trying to upgrade, but who
can't get their data out in a reasonable window?

Putting Tatsuo-san's change into a future pg_dump might be more
conservative than back-porting the server's Lock Table change to the
server version they are trying to get rid of.

Cheers,

Jeff




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Robert Haas
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 naming multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

 All of the other ones that I know of were associated with pg_dump
 itself, and since it is recommended to run the newer version of
 pg_dump against the older version of the server, no back patching
 would be necessary to get the benefits of those particular fixes.

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


 What would be the recommendation for people trying to upgrade, but who
 can't get their data out in a reasonable window?

 Putting Tatsuo-san's change into a future pg_dump might be more
 conservative than back-porting the server's Lock Table change to the
 server version they are trying to get rid of.

What he said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-08-30 Thread Bruce Momjian
On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
  Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
  management in the server.  What I fixed so far on the pg_dump side
  should be enough to let partial dumps run at reasonable speed even if
  the whole database contains many tables.  But if psql is taking
  AccessShareLock on lots of tables, there's still a problem.
  
  Ok, I modified the part of pg_dump where a tremendous number of LOCK
  TABLE statements are issued, replacing them with a single LOCK TABLE
  naming multiple tables. With 100k tables the LOCK statements took 13
  minutes in total; now they take only 3 seconds. Comments?
 
 Shall I commit to master and all supported branches?

Was this applied?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-08-30 Thread Bruce Momjian
On Thu, Aug 30, 2012 at 04:51:56PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
  Ok, I modified the part of pg_dump where a tremendous number of LOCK
  TABLE statements are issued, replacing them with a single LOCK TABLE
  naming multiple tables. With 100k tables the LOCK statements took 13
  minutes in total; now they take only 3 seconds. Comments?
 
  Shall I commit to master and all supported branches?
 
  Was this applied?
 
 No, we fixed the server side instead.

Again, thanks.  I knew we fixed the server, but it wasn't clear to me that
that made the client changes unnecessary; I think I now do remember the
discussion about that.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-08-30 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 naming multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Shall I commit to master and all supported branches?

 Was this applied?

No, we fixed the server side instead.

regards, tom lane




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-08-30 Thread Robert Haas
On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 naming multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Shall I commit to master and all supported branches?

 Was this applied?

 No, we fixed the server side instead.

But only for 9.2, right?  So people running back branches are still screwed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-08-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 naming multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
behaviors involved here, not all of which are masked by what Tatsuo-san
suggested.

Six months or a year from now, we might have enough confidence in that
batch of 9.2 fixes to back-port them en masse.  Don't want to do it
today though.

regards, tom lane




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tatsuo Ishii
 We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
 extremely specific cases that might or might not have anything to do
 with what you're seeing.  The complainant was extremely helpful about
 tracking down the problems:
 http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
 http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
 
 I'm wondering if these fixes (or today's commit) include the case of
 a database that has ~100 thousand tables and indexes. One of my customers
 has had trouble with pg_dump on such a database; it takes over 10
 hours.
 
 So I did a quick test with old PostgreSQL 9.0.2 and current (as of
 commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb'd
 database I created 100,000 tables, each with two integer
 attributes, one of them a primary key. Creating the tables was
 reasonably fast, as expected (18-20 minutes). This created a 1.4GB
 database cluster.

 pg_dump dbname > /dev/null took 188 minutes on 9.0.2, which was a pretty
 long time, as the customer complained. Now what about current? Well, it
 took 125 minutes. ps showed that most of the time was spent in the backend.
 
 Below is the script to create tables.
 
 cnt=100000
 while [ $cnt -gt 0 ]
 do
 psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
 cnt=`expr $cnt - 1`
 done
 
 P.S. You need to increase max_locks_per_transaction before running
 pg_dump (I raised it to 640 in my case).
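
For reference, a minimal sketch of that setting; 640 is just the value
from this test, and the parameter only takes effect after a server
restart:

# postgresql.conf
max_locks_per_transaction = 640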

Just for the record, I re-ran the test with my single-LOCK patch, and
now the total runtime of pg_dump is 113 minutes:
188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

So far, I'm glad to see 40% time savings at this point.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Klemme
On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Just for the record, I re-ran the test with my single-LOCK patch, and
 now the total runtime of pg_dump is 113 minutes:
 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

 So far, I'm glad to see 40% time savings at this point.

I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tatsuo Ishii
 On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Just for the record, I re-ran the test with my single-LOCK patch, and
 now the total runtime of pg_dump is 113 minutes:
 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

 So far, I'm glad to see 40% time savings at this point.
 
 I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

What I meant was (100 * (113/188 - 1)).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Klemme
On Thu, May 31, 2012 at 4:07 PM, Tatsuo Ishii is...@postgresql.org wrote:
 On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Just for the record, I re-ran the test with my single-LOCK patch, and
 now the total runtime of pg_dump is 113 minutes:
 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

 So far, I'm glad to see 40% time savings at this point.

 I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

 What I meant was (100 * (113/188 - 1)).

OK, my fault was to assume you wanted to measure only your part, while
apparently you meant overall savings.  But Tom had asked for separate
measurements, if I understood him correctly.  Also, that measurement of
your change would have to come after the O(N^2) fix.  It could actually
turn out to be much more than 9% because the overall time would be
reduced even more dramatically.  So it might actually be good for your
fix to wait a bit. ;-)

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 11:17 AM, Robert Klemme
shortcut...@googlemail.com wrote:

 OK, my fault was to assume you wanted to measure only your part, while
 apparently you meant overall savings.  But Tom had asked for separate
 measurements, if I understood him correctly.  Also, that measurement of
 your change would have to come after the O(N^2) fix.  It could actually
 turn out to be much more than 9% because the overall time would be
 reduced even more dramatically.  So it might actually be good for your
 fix to wait a bit. ;-)

It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm asking because it doesn't seem like a complicated patch;
contributors may want to get working on it if he's not. ;-)



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm not; Jeff Janes is.  But you shouldn't be holding your breath
anyway, since it's 9.3 material at this point.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 It's not clear whether Tom is already working on that O(N^2) fix in locking.

 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

I agree we can't back-patch that change, but then I think we ought to
consider back-patching some variant of Tatsuo's patch.  Maybe it's not
reasonable to thunk an arbitrary number of relation names in there on
one line, but how about 1000 relations per LOCK statement or so?  I
guess we'd need to see how much that erodes the benefit, but we've
certainly done back-branch rearrangements in pg_dump in the past to
fix various kinds of issues, and this is pretty non-invasive.
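
For illustration, a sketch of what that batching could look like on the
wire, with table names invented here and a batch size of 3 standing in
for 1000:

BEGIN;
-- each LOCK statement covers one batch of tables in a single round trip
LOCK TABLE t1, t2, t3 IN ACCESS SHARE MODE;
LOCK TABLE t4, t5, t6 IN ACCESS SHARE MODE;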

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

 I agree we can't back-patch that change, but then I think we ought to
 consider back-patching some variant of Tatsuo's patch.  Maybe it's not
 reasonable to thunk an arbitrary number of relation names in there on
 one line, but how about 1000 relations per LOCK statement or so?  I
 guess we'd need to see how much that erodes the benefit, but we've
 certainly done back-branch rearrangements in pg_dump in the past to
 fix various kinds of issues, and this is pretty non-invasive.

I am not convinced either that this patch will still be useful after
Jeff's fix goes in, or that it provides any meaningful savings when
you consider a complete pg_dump run.  Yeah, it will make the lock
acquisition phase faster, but that's not a big part of the runtime
except in very limited scenarios (--schema-only, perhaps).

The performance patches we applied to pg_dump over the past couple weeks
were meant to relieve pain in situations where the big server-side
lossage wasn't the dominant factor in runtime (ie, partial dumps).
But this one is targeting exactly that area, which is why it looks like
a band-aid and not a fix to me.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Bruce Momjian
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I'm not; Jeff Janes is.  But you shouldn't be holding your breath
  anyway, since it's 9.3 material at this point.
 
  I agree we can't back-patch that change, but then I think we ought to
  consider back-patching some variant of Tatsuo's patch.  Maybe it's not
  reasonable to thunk an arbitrary number of relation names in there on
  one line, but how about 1000 relations per LOCK statement or so?  I
  guess we'd need to see how much that erodes the benefit, but we've
  certainly done back-branch rearrangements in pg_dump in the past to
  fix various kinds of issues, and this is pretty non-invasive.
 
 I am not convinced either that this patch will still be useful after
 Jeff's fix goes in, or that it provides any meaningful savings when
 you consider a complete pg_dump run.  Yeah, it will make the lock
 acquisition phase faster, but that's not a big part of the runtime
 except in very limited scenarios (--schema-only, perhaps).

FYI, that is the pg_upgrade use-case, and pg_dump/restore time is
reportedly taking the majority of time in many cases.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The performance patches we applied to pg_dump over the past couple weeks
 were meant to relieve pain in situations where the big server-side
 lossage wasn't the dominant factor in runtime (ie, partial dumps).
 But this one is targeting exactly that area, which is why it looks like
 a band-aid and not a fix to me.

No, Tatsuo's patch attacks a phase dominated by latency in some
setups. That it's also slow at the moment because of the locking
cost is irrelevant; with locking sped up, the patch should
improve that phase even further. Imagine the current timeline:

* = locking
. = waiting

*.*.**.**.***.***...*.

Tatsuo's patch converts it to:

*.**

The locking fix would turn the timeline into:

*.*.*.*.*.*.*

Tatsuo's patch would turn that into:

***

And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade.



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

 I agree we can't back-patch that change, but then I think we ought to
 consider back-patching some variant of Tatsuo's patch.  Maybe it's not
 reasonable to thunk an arbitrary number of relation names in there on
 one line, but how about 1000 relations per LOCK statement or so?  I
 guess we'd need to see how much that erodes the benefit, but we've
 certainly done back-branch rearrangements in pg_dump in the past to
 fix various kinds of issues, and this is pretty non-invasive.

 I am not convinced either that this patch will still be useful after
 Jeff's fix goes in, ...

But people on older branches are not going to GET Jeff's fix.

 or that it provides any meaningful savings when
 you consider a complete pg_dump run.  Yeah, it will make the lock
 acquisition phase faster, but that's not a big part of the runtime
 except in very limited scenarios (--schema-only, perhaps).

That is not a borderline scenario, as others have also pointed out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Bruce Momjian
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote:
 On Thu, May 31, 2012 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I'm not; Jeff Janes is.  But you shouldn't be holding your breath
  anyway, since it's 9.3 material at this point.
 
  I agree we can't back-patch that change, but then I think we ought to
  consider back-patching some variant of Tatsuo's patch.  Maybe it's not
  reasonable to thunk an arbitrary number of relation names in there on
  one line, but how about 1000 relations per LOCK statement or so?  I
  guess we'd need to see how much that erodes the benefit, but we've
  certainly done back-branch rearrangements in pg_dump in the past to
  fix various kinds of issues, and this is pretty non-invasive.
 
  I am not convinced either that this patch will still be useful after
  Jeff's fix goes in, ...
 
 But people on older branches are not going to GET Jeff's fix.

FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2
would benefit because pg_upgrade uses the new cluster's pg_dumpall.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 On Thu, May 31, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The performance patches we applied to pg_dump over the past couple weeks
 were meant to relieve pain in situations where the big server-side
 lossage wasn't the dominant factor in runtime (ie, partial dumps).
 But this one is targeting exactly that area, which is why it looks like
 a band-aid and not a fix to me.

 No, Tatsuo's patch attacks a phase dominated by latency in some
 setups.

No, it does not.  The reason it's a win is that it avoids the O(N^2)
behavior in the server.  Whether the bandwidth savings is worth worrying
about cannot be proven one way or the other as long as that elephant
is in the room.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 12:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 No, Tatsuo's patch attacks a phase dominated by latency in some
 setups.

 No, it does not.  The reason it's a win is that it avoids the O(N^2)
 behavior in the server.  Whether the bandwidth savings is worth worrying
 about cannot be proven one way or the other as long as that elephant
 is in the room.

                        regards, tom lane

I understand that, but if the locking is fixed and made to be O(N)
(and hence each table locking O(1)), then latency suddenly becomes the
dominating factor.

I'm thinking, though, that pg_upgrade runs locally, unlike pg_dump
backups, so in that case latency would be negligible and Tatsuo's
patch inconsequential.

I'm also wondering whether the ResourceOwner patch you've proposed
would be negated by Tatsuo's patch, because suddenly a portal
(IIRC) would have far more locks than the ResourceOwner could
accommodate, forcing a reversal to O(N^2) behavior. In that case,
that patch would in fact be detrimental... huh... way to go 180



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Tatsuo Ishii
 Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
 management in the server.  What I fixed so far on the pg_dump side
 should be enough to let partial dumps run at reasonable speed even if
 the whole database contains many tables.  But if psql is taking
 AccessShareLock on lots of tables, there's still a problem.
 
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 naming multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

Shall I commit to master and all supported branches?
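
For clarity, a sketch of the change (table names invented): instead of
one statement, and one round trip, per table,

LOCK TABLE t1 IN ACCESS SHARE MODE;
LOCK TABLE t2 IN ACCESS SHARE MODE;
LOCK TABLE t3 IN ACCESS SHARE MODE;

the patch issues a single multi-table statement:

LOCK TABLE t1, t2, t3 IN ACCESS SHARE MODE;
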
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued, replacing them with a single LOCK TABLE
 naming multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now they take only 3 seconds. Comments?

 Shall I commit to master and all supported branches?

I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
the initial phase of acquiring the locks, but it does nothing for the
lock-related slowdown occurring in all pg_dump's subsequent commands.
I think we really need to get in the server-side fix that Jeff Janes is
working on, and then re-measure to see if something like this is still
worth the trouble.  I am also a tad concerned about whether we might not
have problems with parsing memory usage, or some such, with thousands of
tables being listed in a single command.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Tatsuo Ishii is...@postgresql.org writes:
  Shall I commit to master and all supported branches?
 
 I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
 the initial phase of acquiring the locks, but it does nothing for the
 lock-related slowdown occurring in all pg_dump's subsequent commands.
 I think we really need to get in the server-side fix that Jeff Janes is
 working on, and then re-measure to see if something like this is still
 worth the trouble.  I am also a tad concerned about whether we might not
 have problems with parsing memory usage, or some such, with thousands of
 tables being listed in a single command.

I can't imagine a case where it's actually better to incur the latency
penalty (which is apparently on the order of *minutes* of additional
time here..) than to worry about the potential memory usage of having to
parse such a command.

If that's really a concern, where is that threshold, and could we simply
cap pg_dump's operations based on it?  Is 1000 alright?  Doing a 'lock'
w/ 1000 tables at a time is still going to be hugely better than doing
them individually and the amount of gain between every-1000 and
all-at-once is likely to be pretty minimal anyway...
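
To put rough numbers on it: if the 13-minute figure upthread for 100,000
individual LOCK statements were pure round-trip latency, that would be
about 8 ms per statement; batches of 1000 would cut 100,000 round trips
down to 100 (under a second of latency in total), and going all-at-once
could only shave off that last fraction of a second.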

The current situation where the client-to-server latency accounts for
multiple minutes of time is just ridiculous, however, so I feel we need
some form of this patch, even if the server side is magically made much
faster.  The constant back-and-forth isn't cheap.

Thanks,

Stephen




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 The current situation where the client-to-server latency accounts for
 multiple minutes of time is just ridiculous, however, so I feel we need
 some form of this patch, even if the server side is magically made much
 faster.  The constant back-and-forth isn't cheap.

No, you're missing my point.  I don't believe that client-to-server
latency, or any other O(N) cost, has anything to do with the problem
here.  The problem, as Jeff has demonstrated, is the O(N^2) costs
associated with management of the local lock table.  It is utterly
pointless to worry about O(N) costs until that's fixed; and it's just
wrong to claim that you've created a significant speedup by eliminating
a constant factor when all you've done is staved off occurrences of the
O(N^2) problem.

Once we've gotten rid of the local lock table problem, we can re-measure
and see what the true benefit of this patch is.  I'm of the opinion
that it will be in the noise compared to the overall runtime of pg_dump.
I could be wrong, but you won't convince me of that with measurements
taken while the local lock table problem is still there.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-30 Thread Tatsuo Ishii
 I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
 the initial phase of acquiring the locks, but it does nothing for the
 lock-related slowdown occurring in all pg_dump's subsequent commands.
 I think we really need to get in the server-side fix that Jeff Janes is
 working on, and then re-measure to see if something like this is still
 worth the trouble.

Well, even with the current backend, locking 100,000 tables took only
3 seconds in my test. So even if Jeff Janes's fix succeeds, I guess it
will just save 3 seconds in my case, and if the number of tables is
smaller, the savings will be smaller. This suggests that most of the
time for processing LOCK has been spent in communication between
pg_dump and the backend. Of course this is just my guess, though.

 I am also a tad concerned about whether we might not
 have problems with parsing memory usage, or some such, with thousands of
 tables being listed in a single command.

That's easy to fix. Just divide the single LOCK statement into multiple
LOCK statements.

My big concern is that even if the locking part is fixed (either by Jeff
Janes's fix or by mine), much of the time in pg_dump is still spent on
SELECTs against the system catalogs. The fix would be to turn many
SELECTs into a single SELECT, probably using a big IN clause of table OIDs.
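
If it comes to that, a minimal sketch of the consolidation; the columns
and OIDs here are only illustrative:

-- one catalog query for a whole batch of tables, instead of one per table
SELECT c.oid, c.relname, c.relhasindex
FROM pg_class c
WHERE c.oid IN (16384, 16387, 16390);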
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
