Re: [HACKERS] extending relations more efficiently

2012-05-31 Thread Jeff Davis
On Tue, 2012-05-01 at 10:08 -0400, Robert Haas wrote:
 We've previously discussed the possible desirability of extending
 relations in larger increments, rather than one block at a time, for
 performance reasons.  I attempted to determine how much performance we
 could possibly buy this way, and found that, as far as I can see, the
 answer is, basically, none.

Another point here is that with checksums, we will want to make sure
that zero pages can be treated as corrupt. That will probably involve
using the WAL for extension operations, and we'll want to mitigate that
cost somehow. Extending in larger chunks would probably be necessary.

There are some challenges there, but I think it's worth pursuing.

Regards,
Jeff Davis




Re: [HACKERS] extending relations more efficiently

2012-05-03 Thread Stephen Frost
Robert,

* Stephen Frost (sfr...@snowman.net) wrote:
  In all seriousness, this is not a great test case unless you can
  provide some scripts to make it easy to run it in a reproducible
  fashion.  Can you?
 
 Yeah, sure, I'll do that.  The PostGIS folks have scripts, but they're
 kind of ugly, tbh..  I'll give you what I used.

Alright, I made it dirt simple.  Get a DB set up, get PostGIS installed
into it (or not, if you don't really want to..) and then download this:

http://tamriel.snowman.net/~sfrost/testcase.tar.gz
(Note: it's ~3.7G)

It'll untar into a 'testcase' directory.  If you have PostGIS
installed already, just run testcase/create_linearwater.sql first, then
run all the other .sql files in there in parallel by passing them to
psql, eg:

for file in tl*.sql; do
psql -d mydb -f $file &
done

It shouldn't take long to see stuff start blocking on that extension
lock, since the table starts out empty.

If you don't want to install PostGIS, just change the
create_linearwater.sql script to have a column at the end named
'the_geom' and nuke the 'AddGeometryColumn' call.  If you run into
trouble getting PG to exhibit the expected behavior, you might try
combining sets of files, like so:

for state in `ls tl_2011_* | cut -f3 -d_ | cut -c1,2 | sort -u`; do
cat tl_2011_${state}*.sql | psql -d mydb &
done

Which will reduce the number of processes to just the number of states
and territories (eg: 56), while increasing the amount of work each is
doing.

Thanks,

Stephen




Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Jeroen Vermeulen

On 2012-05-01 22:06, Robert Haas wrote:


It might also be interesting to provide a mechanism to pre-extend a
relation to a certain number of blocks, though if we did that we'd
have to make sure that autovac got the memo not to truncate those
pages away again.


Good point.  And just to check before skipping over it, do we know that 
autovacuum not leaving enough slack space is not a significant cause of 
the bottlenecks in the first place?



Jeroen



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
 On 2012-05-01 22:06, Robert Haas wrote:
 It might also be interesting to provide a mechanism to pre-extend a
 relation to a certain number of blocks, though if we did that we'd
 have to make sure that autovac got the memo not to truncate those
 pages away again.

 Good point.  And just to check before skipping over it, do we know that
 autovacuum not leaving enough slack space is not a significant cause of the
 bottlenecks in the first place?

I'm not sure exactly what you mean by this: autovacuum doesn't need
any slack space.  Regular DML operations can certainly benefit from
slack space, both within each page and overall within the relation.
But there's no evidence that vacuum is doing too good a job cleaning
up the mess, forcing the relation to be re-extended.  Rather, the
usual (and frequent) complaint is that vacuum is leaving way too much
slack space - i.e. bloat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Wed May 02 08:14:36 -0400 2012:
 On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
  On 2012-05-01 22:06, Robert Haas wrote:
  It might also be interesting to provide a mechanism to pre-extend a
  relation to a certain number of blocks, though if we did that we'd
  have to make sure that autovac got the memo not to truncate those
  pages away again.
 
  Good point.  And just to check before skipping over it, do we know that
  autovacuum not leaving enough slack space is not a significant cause of the
  bottlenecks in the first place?
 
 I'm not sure exactly what you mean by this: autovacuum doesn't need
 any slack space.  Regular DML operations can certainly benefit from
 slack space, both within each page and overall within the relation.
 But there's no evidence that vacuum is doing too good a job cleaning
 up the mess, forcing the relation to be re-extended.  Rather, the
 usual (and frequent) complaint is that vacuum is leaving way too much
 slack space - i.e. bloat.

Hm.  I see those two things as different -- to me, bloat is unremoved
dead tuples, whereas slack space would be free space that can be reused
by new tuples.  Slack space is useful as it avoids relation extension;
bloat is not.

I wonder, though, if we should set a less-than-100 fillfactor for heap
relations.  Just like default_statistics_target, it seems that the
default value should be a conservative tradeoff between two extremes.
This doesn't help extension for bulk insertions a lot, of course, but
it'd be useful for tables where HOT updates happen with some regularity.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 12:26 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of Wed May 02 08:14:36 -0400 2012:
 On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
  On 2012-05-01 22:06, Robert Haas wrote:
  It might also be interesting to provide a mechanism to pre-extend a
  relation to a certain number of blocks, though if we did that we'd
  have to make sure that autovac got the memo not to truncate those
  pages away again.
 
  Good point.  And just to check before skipping over it, do we know that
  autovacuum not leaving enough slack space is not a significant cause of the
  bottlenecks in the first place?

 I'm not sure exactly what you mean by this: autovacuum doesn't need
 any slack space.  Regular DML operations can certainly benefit from
 slack space, both within each page and overall within the relation.
 But there's no evidence that vacuum is doing too good a job cleaning
 up the mess, forcing the relation to be re-extended.  Rather, the
 usual (and frequent) complaint is that vacuum is leaving way too much
 slack space - i.e. bloat.

 Hm.  I see those two things as different -- to me, bloat is unremoved
 dead tuples, whereas slack space would be free space that can be reused
 by new tuples.  Slack space is useful as it avoids relation extension;
 bloat is not.

I guess I think of bloat as including both unremoved dead tuples and
unwanted internal free space.  If you create a giant table, delete 9
out of every 10 tuples, and vacuum, the table is still bloated, IMV.

 I wonder, though, if we should set a less-than-100 fillfactor for heap
 relations.  Just like default_statistics_target, it seems that the
 default value should be a conservative tradeoff between two extremes.
 This doesn't help extension for bulk insertions a lot, of course, but
 it'd be useful for tables where HOT updates happen with some regularity.

Perhaps, but in theory that should be self-correcting: the data should
spread itself onto the number of pages where HOT pruning is able to
prevent further relation extension.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, May 2, 2012 at 12:26 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Hm.  I see those two things as different -- to me, bloat is unremoved
 dead tuples, whereas slack space would be free space that can be reused
 by new tuples.  Slack space is useful as it avoids relation extension;
 bloat is not.

 I guess I think of bloat as including both unremoved dead tuples and
 unwanted internal free space.  If you create a giant table, delete 9
 out of every 10 tuples, and vacuum, the table is still bloated, IMV.

The difficulty is to tell the difference between useless free space and
useful free space.  If there's a reasonable probability of putting new
data into a given chunk of free space in the near future, it's not
bloat.

regards, tom lane



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Wed May 02 12:37:35 -0400 2012:
 
 On Wed, May 2, 2012 at 12:26 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of Wed May 02 08:14:36 -0400 2012:
  On Wed, May 2, 2012 at 7:16 AM, Jeroen Vermeulen j...@xs4all.nl wrote:
   On 2012-05-01 22:06, Robert Haas wrote:
   It might also be interesting to provide a mechanism to pre-extend a
   relation to a certain number of blocks, though if we did that we'd
   have to make sure that autovac got the memo not to truncate those
   pages away again.
  
   Good point.  And just to check before skipping over it, do we know that
   autovacuum not leaving enough slack space is not a significant cause of the
   bottlenecks in the first place?
 
  I'm not sure exactly what you mean by this: autovacuum doesn't need
  any slack space.  Regular DML operations can certainly benefit from
  slack space, both within each page and overall within the relation.
  But there's no evidence that vacuum is doing too good a job cleaning
  up the mess, forcing the relation to be re-extended.  Rather, the
  usual (and frequent) complaint is that vacuum is leaving way too much
  slack space - i.e. bloat.
 
  Hm.  I see those two things as different -- to me, bloat is unremoved
  dead tuples, whereas slack space would be free space that can be reused
  by new tuples.  Slack space is useful as it avoids relation extension;
  bloat is not.
 
 I guess I think of bloat as including both unremoved dead tuples and
 unwanted internal free space.  If you create a giant table, delete 9
 out of every 10 tuples, and vacuum, the table is still bloated, IMV.

Agreed.  Perhaps to solve this issue what we need is a way to migrate
tuples from later pages into earlier ones.  (This was one of the points,
never resolved, that we discussed during the VACUUM FULL rework.)

  I wonder, though, if we should set a less-than-100 fillfactor for heap
  relations.  Just like default_statistics_target, it seems that the
  default value should be a conservative tradeoff between two extremes.
  This doesn't help extension for bulk insertions a lot, of course, but
  it'd be useful for tables where HOT updates happen with some regularity.
 
 Perhaps, but in theory that should be self-correcting: the data should
 spread itself onto the number of pages where HOT pruning is able to
 prevent further relation extension.

True.  I gather you consider that the cases where it doesn't happen due
to particular conditions are the ones that need manual tweaking.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 12:46 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Agreed.  Perhaps to solve this issue what we need is a way to migrate
 tuples from later pages into earlier ones.  (This was one of the points,
 never resolved, that we discussed during the VACUUM FULL rework.)

Yeah, I agree.  And frankly, we need to find a way to make it work
without taking AccessExclusiveLock on the relation.  Having to run
VACUUM FULL is killing actual users and scaring off potential ones.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Wed May 02 12:55:17 -0400 2012:
 
 On Wed, May 2, 2012 at 12:46 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Agreed.  Perhaps to solve this issue what we need is a way to migrate
  tuples from later pages into earlier ones.  (This was one of the points,
  never resolved, that we discussed during the VACUUM FULL rework.)
 
 Yeah, I agree.  And frankly, we need to find a way to make it work
 without taking AccessExclusiveLock on the relation.  Having to run
 VACUUM FULL is killing actual users and scaring off potential ones.

And ideally without bloating the indexes while at it.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Simon Riggs
On Tue, May 1, 2012 at 3:08 PM, Robert Haas robertmh...@gmail.com wrote:

 We've previously discussed the possible desirability of extending
 relations in larger increments, rather than one block at a time, for
 performance reasons.  I attempted to determine how much performance we
 could possibly buy this way, and found that, as far as I can see, the
 answer is, basically, none.

Fair enough, but my understanding was that tests showed that the
extension lock was a bottleneck, so doing extensions in larger chunks
should reduce the time we spend waiting for a lock and thus improve
performance. So while your results here show no gain, there is gain to
be had elsewhere as a result.

The algorithm needs to be adaptive, so we don't waste space on smaller
tables or lower loads.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Andres Freund
On Tuesday, May 01, 2012 04:08:27 PM Robert Haas wrote:
 We've previously discussed the possible desirability of extending
 relations in larger increments, rather than one block at a time, for
 performance reasons.  I attempted to determine how much performance we
 could possibly buy this way, and found that, as far as I can see, the
 answer is, basically, none.  I wrote a test program which does writes
 until it reaches 1GB, and times how long the writes take in aggregate.
  Then it performs a single fdatasync at the end and times that as
 well.  On some of the machines it is slightly faster in the aggregate
 to extend in larger chunks, but the magnitude of the change is little
 enough that, at least to me, it seems entirely not worth bothering
 with.  Some results are below.  Now, one thing that this test doesn't
 help much with is the theory that it's better to extend a file in
 larger chunks because the file will become less fragmented on disk.  I
 don't really know how to factor that effect into the test - any ideas?
I think that test disregards the fact that we're holding an exclusive lock
during the file extension, which happens rather frequently if you have multiple
COPYs or similar running at the same time. Extending in bigger chunks reduces 
the frequency of taking that lock.

I really would *love* to see improvements in that kind of workload.

 I also considered two other methods of extending a file.  First, there
 is ftruncate().  It's really fast.  Unfortunately, it's unsuitable for
 our purposes because it will cheerfully leave holes in the file, and
 part of the reason for our current implementation is to make sure that
 there are no holes, so that later writes to the file can't fail for
 lack of disk space.  So that's no good.  Second, and more
 interestingly, there is a function called posix_fallocate().  It is
 present on Linux but not on MacOS X; I haven't checked any other
 platforms.  It claims that it will extend a file out to a particular
 size, forcing disk blocks to be allocated so that later writes won't
 fail.  Testing (more details below) shows that posix_fallocate() is
 quite efficient for large chunks.  For example, extending a file to
 1GB in size 64 blocks at a time (that is, 256kB at a time) took only
 ~60 ms and the subsequent fdatasync took almost no time at all,
 whereas zero-filling the file out to 1GB using write() took 600-700 ms
 and the subsequent fdatasync took another 4-5 seconds.  That seems
 like a pretty sizable win, and it's not too hard to imagine that it
 could be even better when the I/O subsystem is busy.  Unfortunately,
 using posix_fallocate() for 8kB chunks seems to be significantly less
 efficient than our current method - I'm guessing that it actually
 writes the updated metadata back to disk, where write() does not (this
 makes one wonder how safe it is to count on write to have the behavior
 we need here in the first place).
Currently the write() doesn't need to be crashsafe because it will be repeated 
on crash-recovery and a checkpoint will fsync the file.

I don't really see why it would need to compare in the 8kb case. What reason 
would there be to further extend in such small increments?

There is the question whether this should be done in the background though, so 
the relation extension lock is never hit in anything time-critical...

Andres
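
[For illustration only: below is a minimal sketch of the kind of comparison
quoted above -- timing extension of a file to 1GB either by zero-filled 8kB
write()s or by 256kB posix_fallocate() calls, followed by an fdatasync().
It is not the actual test program referenced in the thread, and it assumes
Linux, where posix_fallocate() is available.]

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLCKSZ      8192
#define CHUNK       (64L * BLCKSZ)              /* 256kB, as in the test above */
#define TARGET_SIZE (1024L * 1024L * 1024L)     /* 1GB */

static double
elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int
main(int argc, char **argv)
{
    int     use_fallocate = (argc > 1 && strcmp(argv[1], "fallocate") == 0);
    char    zeroes[BLCKSZ] = {0};
    int     fd = open("extend_test.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
    struct timespec t0, t1, t2;

    if (fd < 0) { perror("open"); return 1; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (use_fallocate)
    {
        /* extend 64 blocks (256kB) at a time via posix_fallocate() */
        for (long off = 0; off < TARGET_SIZE; off += CHUNK)
        {
            int rc = posix_fallocate(fd, off, CHUNK);

            if (rc != 0)
            {
                fprintf(stderr, "posix_fallocate: %s\n", strerror(rc));
                return 1;
            }
        }
    }
    else
    {
        /* extend the way the backend does today: one zero-filled 8kB write at a time */
        for (long off = 0; off < TARGET_SIZE; off += BLCKSZ)
        {
            if (write(fd, zeroes, BLCKSZ) != BLCKSZ)
            {
                perror("write");
                return 1;
            }
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (fdatasync(fd) != 0) { perror("fdatasync"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("extend: %.1f ms, fdatasync: %.1f ms\n",
           elapsed_ms(t0, t1), elapsed_ms(t1, t2));
    close(fd);
    return 0;
}

[Compile with e.g. cc -O2 -o extend_test extend_test.c and run it with or
without the "fallocate" argument to compare the two paths on a given
filesystem.]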



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Robert Haas
On Tue, May 1, 2012 at 10:22 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Fair enough, but my understanding was that tests showed that the
 extension lock was a bottleneck, so doing extensions in larger chunks
 should reduce the time we spend waiting for a lock and thus improve
 performance. So while your results here show no gain, there is gain to
 be had elsewhere as a result.

Maybe, but I'm skeptical.  There are a few cases - such as the problem
I fixed with SInvalReadLock - where the actual overhead of taking and
releasing the lock is the problem.  ProcArrayLock has some
as-yet-unfixed issues in this area as well.  But most of our locking
bottlenecks are caused by doing too much work while holding the lock,
not by acquiring and releasing it too frequently.  A single lock
manager partition can cope with upwards of 30k acquire/release cycles
per second, which would amount to 240MB/sec of file extension.  It
seems very unlikely that we're hitting our head against that ceiling,
although maybe you'd like to suggest a test case.   Rather, I suspect
that it's just plain taking too long to perform the actual extension.
If we could extend by 8 blocks in only 4 times the time it takes to
extend by 1 block, then obviously there would be a win available, but
the test results suggest that isn't the case.

...thinks...

Maybe the solution here isn't extending in larger chunks, but allowing
several extensions to proceed in parallel.  It seems likely that a big
chunk of the system call time is being used up waiting for write() to
copy data from user space to kernel space, and there's no reason
several backends couldn't do that in parallel.  I think it would be
sufficient to ensure that nobody starts using block N until the
initial writes of all blocks < N have completed.
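
[Purely as an illustration of that rule -- not code from this thread -- the
coordination could look something like the sketch below, using C11 atomics
and a fixed-size flag array for simplicity; block numbers are counted from
the point where extension begins.]

#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BLOCKS 1024                 /* illustrative bound only */

static _Atomic long next_block;         /* next block number to hand out */
static _Atomic bool done[MAX_BLOCKS];   /* initial zero-fill finished? */
static _Atomic long safe_limit;         /* blocks < safe_limit are usable */

/* Reserve the next block; the caller then zero-fills it without holding a lock. */
long
reserve_block(void)
{
    return atomic_fetch_add(&next_block, 1);
}

/* Call after the initial write of 'blk' completes. */
void
mark_block_initialized(long blk)
{
    atomic_store(&done[blk], true);

    /* Advance the watermark over the contiguous prefix of finished blocks. */
    long limit = atomic_load(&safe_limit);
    while (limit < MAX_BLOCKS && atomic_load(&done[limit]))
    {
        if (atomic_compare_exchange_weak(&safe_limit, &limit, limit + 1))
            limit++;
        /* on failure the CAS reloads 'limit' and we re-check done[limit] */
    }
}

/* Anyone may start putting tuples into 'blk' once this returns true. */
bool
block_usable(long blk)
{
    return blk < atomic_load(&safe_limit);
}

[A real implementation would live in shared memory and sleep rather than
spin, but the invariant is the same: the usable range of the relation only
ever grows over a fully zero-filled prefix.]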

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Robert Haas
On Tue, May 1, 2012 at 10:31 AM, Andres Freund and...@anarazel.de wrote:
 efficient than our current method - I'm guessing that it actually
 writes the updated metadata back to disk, where write() does not (this
 makes one wonder how safe it is to count on write to have the behavior
 we need here in the first place).
 Currently the write() doesn't need to be crashsafe because it will be repeated
 on crash-recovery and a checkpoint will fsync the file.

That's not what I'm worried about.  If the write() succeeds and then a
subsequent close() on the filehandle reports an ENOSPC condition that
means the write didn't really write after all, I am concerned that we
might not handle that cleanly.

 I don't really see why it would need to compare in the 8kb case. What reason
 would there be to further extend in such small increments?

In previous discussions, the concern has been that holding the
relation extension lock across a multi-block extension would cause
latency spikes for both the process doing the extensions and any other
concurrent processes that need the lock.  Obviously if it were
possible to extend by 64kB in the same time it takes to extend by 8kB
that would be awesome, but if it takes eight times longer then things
don't look so good.

 There is the question whether this should be done in the background though, so
 the relation extension lock is never hit in anything time-critical...

Yeah, although I'm fuzzy on how and whether that can be made to work,
which is not to say that it can't.

It might also be interesting to provide a mechanism to pre-extend a
relation to a certain number of blocks, though if we did that we'd
have to make sure that autovac got the memo not to truncate those
pages away again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Andres Freund
On Tuesday, May 01, 2012 05:06:11 PM Robert Haas wrote:
 On Tue, May 1, 2012 at 10:31 AM, Andres Freund and...@anarazel.de wrote:
  efficient than our current method - I'm guessing that it actually
  writes the updated metadata back to disk, where write() does not (this
  makes one wonder how safe it is to count on write to have the behavior
  we need here in the first place).
  
  Currently the write() doesn't need to be crashsafe because it will be
  repeated on crash-recovery and a checkpoint will fsync the file.
 
 That's not what I'm worried about.  If the write() succeeds and then a
 subsequent close() on the filehandle reports an ENOSPC condition that
 means the write didn't really write after all, I am concerned that we
 might not handle that cleanly.
Hm. While write() might not write its state to disk, I don't think that can
imply that the *in memory* state is inconsistent.
Posix doesn't allow ENOSPC for close() as far as I can see.

  I don't really see why it would need to compare in the 8kb case. What
  reason would there be to further extend in such small increments?
 In previous discussions, the concern has been that holding the
 relation extension lock across a multi-block extension would cause
 latency spikes for both the process doing the extensions and any other
 concurrent processes that need the lock.  Obviously if it were
 possible to extend by 64kB in the same time it takes to extend by 8kB
 that would be awesome, but if it takes eight times longer then things
 don't look so good.
Yes, sure.

  There is the question whether this should be done in the background
  though, so the relation extension lock is never hit in anything
  time-critical...
 Yeah, although I'm fuzzy on how and whether that can be made to work,
 which is not to say that it can't.
The biggest problem I see is knowing when to trigger the extension of which 
file without scanning files all the time.

Using some limited size shm-queue of {reltblspc, relfilenode} of to-be-
extended files + a latch is the first thing I can think of. Every time a 
backend initializes a page with offset % EXTEND_SIZE == 0 it adds that table 
to the queue. The background writer extends the file by EXTEND_SIZE * 2 if 
necessary. If the queue overflows, all files are checked. Or the backends 
extend themselves again...
EXTEND_SIZE should probably scale with the table size up to 64MB or so...
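
[As a rough illustration of that shape -- the struct, field names, and queue
size below are made up, not taken from any patch -- such a shared request
queue might look like this:]

#include <stdbool.h>
#include <stdint.h>

#define EXTEND_QUEUE_SIZE 64            /* illustrative size only */

typedef struct ExtendRequest
{
    uint32_t    spcNode;                /* tablespace */
    uint32_t    relNode;                /* relfilenode */
} ExtendRequest;

typedef struct ExtendQueue
{
    /* some lightweight lock (e.g. a spinlock) protects the fields below */
    int             head;
    int             tail;
    bool            overflowed;         /* if set, rescan everything instead */
    ExtendRequest   requests[EXTEND_QUEUE_SIZE];
} ExtendQueue;

static ExtendQueue *queue;              /* would live in shared memory */

/* Called by a backend right after it initializes page number 'blkno'. */
void
maybe_request_extension(uint32_t spcNode, uint32_t relNode,
                        long blkno, long extend_size)
{
    if (blkno % extend_size != 0)
        return;

    /* lock the queue here */
    int next = (queue->tail + 1) % EXTEND_QUEUE_SIZE;

    if (next == queue->head)
        queue->overflowed = true;       /* full: background process rescans */
    else
    {
        queue->requests[queue->tail].spcNode = spcNode;
        queue->requests[queue->tail].relNode = relNode;
        queue->tail = next;
    }
    /* unlock, then wake the background process (e.g. set its latch) */
}

[The background process would pop entries -- or rescan everything when
'overflowed' is set -- check each file's current size, and pre-extend it by
the next increment before backends run out of initialized pages.]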

 It might also be interesting to provide a mechanism to pre-extend a
 relation to a certain number of blocks, though if we did that we'd
 have to make sure that autovac got the memo not to truncate those
 pages away again.
Hm. I have to say I don't really see a big need to do this if the size of 
preallocation is adaptive to the file size. Sounds like it would add too much 
complication for little benefit.

Andres



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Robert Haas
On Tue, May 1, 2012 at 11:42 AM, Andres Freund and...@anarazel.de wrote:
  There is the question whether this should be done in the background
  though, so the relation extension lock is never hit in anything
  time-critical...
 Yeah, although I'm fuzzy on how and whether that can be made to work,
 which is not to say that it can't.
 The biggest problem I see is knowing when to trigger the extension of which
 file without scanning files all the time.

 Using some limited size shm-queue of {reltblspc, relfilenode} of to-be-
 extended files + a latch is the first thing I can think of.

Perhaps.  An in-memory cache of file sizes would also let us eliminate
a pile of lseek system calls, but the trick is that the
synchronization can't be anything simple like a spinlock - Linux has
done it that way in the kernel in versions <= 3.2 and it's
catastrophic on short read-only transactions.

I think the first thing we need here is a good test case, so we're
clear on what we're trying to solve.  I was just hoping to make file
extension *faster* and what you and Simon are talking about is making
it scale better in the face of heavy parallelism; obviously it would
be nice to do both things, but they are different problems.  Any old
bulk-loading test will benefit from a raw performance improvement, but
to test a scalability improvement we would need some kind of test case
involving parallel bulk loads, or some other kind of parallel activity
that causes rapid table growth.  That's not something I've frequently
run into, but I'd be willing to put a bit of time into it if we can
nail down what we're talking about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Christopher Browne
On Tue, May 1, 2012 at 10:22 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, May 1, 2012 at 3:08 PM, Robert Haas robertmh...@gmail.com wrote:

 We've previously discussed the possible desirability of extending
 relations in larger increments, rather than one block at a time, for
 performance reasons.  I attempted to determine how much performance we
 could possibly buy this way, and found that, as far as I can see, the
 answer is, basically, none.

 Fair enough, but my understanding was that tests showed that the
 extension lock was a bottleneck, so doing extensions in larger chunks
 should reduce the time we spend waiting for a lock and thus improve
 performance. So while your results here show no gain, there is gain to
 be had elsewhere as a result.

Try to make sure that one of the scenarios involves there being
multiple writers.

One of the relevant cases is where there are multiple writers, where
they may wind up fighting over the last page in the table.  If free
space is in that one page, they might serialize on the request for
access to that page.

Extending by several pages at a time may help *one* of the writers,
but if that adds some I/O work that takes place while other
connections are blocked, waiting, then this might worsen things
somewhat for the other writers.  If this reduces the frequency of that
sort of collision, it might nonetheless be a win.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Tom Lane
Christopher Browne cbbro...@gmail.com writes:
 Try to make sure that one of the scenarios involves there being
 multiple writers.

 One of the relevant cases is where there are multiple writers, where
 they may wind up fighting over the last page in the table.  If free
 space is in that one page, they might serialize on the request for
 access to that page.

The system is supposed to be designed to avoid that; we try to hand out
pages with free space to different backends.  One of the advantages of
the current page-at-a-time setup is that when there is *no* free space
according to the FSM, each such backend will create and fill its own
page independently.  They do serialize on the extension lock, but the
intervening tuple additions are parallelized.  We have to be careful to
not make that case worse in a multi-page-extension scheme.

regards, tom lane



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Tue May 01 13:19:48 -0400 2012:
 Christopher Browne cbbro...@gmail.com writes:
  Try to make sure that one of the scenarios involves there being
  multiple writers.
 
  One of the relevant cases is where there are multiple writers, where
  they may wind up fighting over the last page in the table.  If free
  space is in that one page, they might serialize on the request for
  access to that page.
 
 The system is supposed to be designed to avoid that; we try to hand out
 pages with free space to different backends.  One of the advantages of
 the current page-at-a-time setup is that when there is *no* free space
 according to the FSM, each such backend will create and fill its own
 page independently.  They do serialize on the extension lock, but the
 intervening tuple additions are parallelized.  We have to be careful to
 not make that case worse in a multi-page-extension scheme.

This didn't work all that well for toast tables that have lots of very
large tuples inserted concurrently, at least with older releases.  Each
backend would acquire the extension lock many times per tuple inserted.
The result was really high contention on the extension lock.

I haven't tested it recently to see if things have already improved,
though.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Robert Haas
On Tue, May 1, 2012 at 3:01 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 The system is supposed to be designed to avoid that; we try to hand out
 pages with free space to different backends.  One of the advantages of
 the current page-at-a-time setup is that when there is *no* free space
 according to the FSM, each such backend will create and fill its own
 page independently.  They do serialize on the extension lock, but the
 intervening tuple additions are parallelized.  We have to be careful to
 not make that case worse in a multi-page-extension scheme.

 This didn't work all that well for toast tables that have lots of very
 large tuples inserted concurrently, at least with older releases.  Each
 backend would acquire the extension lock many times per tuple inserted.
 The result was really high contention on the extension lock.

Hmm, this sounds like a good test case - a small number of SQL
statements triggering a large amount of relation extension work.
Using an unlogged table would probably make it easier to see the
relation-extension contention, too, since you'd get WALInsertLock
mostly out of the way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 I think the first thing we need here is a good test case, so we're
 clear on what we're trying to solve.  I was just hoping to make file
 extension *faster* and what you and Simon are talking about is making
 it scale better in the face of heavy parallelism; obviously it would
 be nice to do both things, but they are different problems.  Any old
 bulk-loading test will benefit from a raw performance improvement, but
 to test a scalability improvement we would need some kind of test case
 involving parallel bulk loads, or some other kind of parallel activity
 that causes rapid table growth.  That's not something I've frequently
 run into, but I'd be willing to put a bit of time into it if we can
 nail down what we're talking about.

Try loading the TIGER 2011 data into a single table, where you load each
county (or perhaps state) in a separate, parallel, process.  That's what
I was doing when I hit this lock full-force and bitched about it on this
list.

I'd be happy to help construct that case, as well as test any changes in
this area which might help address it (on a 10G SSD-backed SAN..).

Thanks,

Stephen




Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Robert Haas
On Tue, May 1, 2012 at 3:48 PM, Stephen Frost sfr...@snowman.net wrote:
 Try loading the TIGER 2011 data into a single table, where you load each
 county (or perhaps state) in a separate, parallel, process.  That's what
 I was doing when I hit this lock full-force and bitched about it on this
 list.

 I'd be happy to help construct that case, as well as test any changes in
 this area which might help address it (on a 10G SSD-backed SAN..).

Hmm, can you ship me one of those 10G SSD-backed SANs?  That'd be awesome.

In all seriousness, this is not a great test case unless you can
provide some scripts to make it easy to run it in a reproducible
fashion.  Can you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] extending relations more efficiently

2012-05-01 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, May 1, 2012 at 3:48 PM, Stephen Frost sfr...@snowman.net wrote:
  I'd be happy to help construct that case, as well as test any changes in
  this area which might help address it (on a 10G SSD-backed SAN..).
 
 Hmm, can you ship me one of those 10G SSD-backed SANs?  That'd be awesome.

Yeah...  Not cheap. :)

 In all seriousness, this is not a great test case unless you can
 provide some scripts to make it easy to run it in a reproducible
 fashion.  Can you?

Yeah, sure, I'll do that.  The PostGIS folks have scripts, but they're
kind of ugly, tbh..  I'll give you what I used.

Thanks,

Stephen

