Re: [PERFORM] Building multiple indexes concurrently

2010-03-22 Thread Justin Pitts
It seems to me that a separate partition / tablespace would be a much simpler 
approach.
On Mar 17, 2010, at 5:18 PM, Hannu Krosing wrote:

 On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
 Alvaro Herrera wrote:
 Andres Freund escribió:
 
 
 I find it way much easier to believe such issues exist on a tables in 
 constrast to indexes. The likelihood to get sequential accesses on an 
 index is 
 small enough on a big table to make it unlikely to matter much.
 
 
 Vacuum walks indexes sequentially, for one.
 
 
 That and index-based range scans were the main two use-cases I was 
 concerned would be degraded by interleaving index builds, compared with 
 doing them in succession. 
 
 I guess that tweaking file systems to allocate in bigger chunks help
 here ? I know that xfs can be tuned in that regard, but how about other
 common file systems like ext3 ?
 
 - 
 Hannu Krosing   http://www.2ndQuadrant.com
 PostgreSQL Scalability and Availability 
   Services, Consulting and Training
 
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-22 Thread Hannu Krosing
On Thu, 2010-03-18 at 16:12 -0400, Justin Pitts wrote:
 It seems to me that a separate partition / tablespace would be a much simpler 
 approach.

Do you mean a separate partition/ tablespace for _each_ index built
concurrently ?

 On Mar 17, 2010, at 5:18 PM, Hannu Krosing wrote:
 
  On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
  Alvaro Herrera wrote:
  Andres Freund escribió:
  
  
  I find it way much easier to believe such issues exist on a tables in 
  constrast to indexes. The likelihood to get sequential accesses on an 
  index is 
  small enough on a big table to make it unlikely to matter much.
  
  
  Vacuum walks indexes sequentially, for one.
  
  
  That and index-based range scans were the main two use-cases I was 
  concerned would be degraded by interleaving index builds, compared with 
  doing them in succession. 
  
  I guess that tweaking file systems to allocate in bigger chunks help
  here ? I know that xfs can be tuned in that regard, but how about other
  common file systems like ext3 ?
  
  - 
  Hannu Krosing   http://www.2ndQuadrant.com
  PostgreSQL Scalability and Availability 
Services, Consulting and Training
  
  
  
  -- 
  Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-22 Thread Justin Pitts
Yes.
On Mar 18, 2010, at 5:20 PM, Hannu Krosing wrote:

 On Thu, 2010-03-18 at 16:12 -0400, Justin Pitts wrote:
 It seems to me that a separate partition / tablespace would be a much 
 simpler approach.
 
 Do you mean a separate partition/ tablespace for _each_ index built
 concurrently ?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-19 Thread Andres Freund
On Wednesday 17 March 2010 22:18:47 Hannu Krosing wrote:
 On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
  Alvaro Herrera wrote:
   Andres Freund escribió:
   I find it way much easier to believe such issues exist on a tables in
   constrast to indexes. The likelihood to get sequential accesses on an
   index is small enough on a big table to make it unlikely to matter
   much.
   
   Vacuum walks indexes sequentially, for one.
  
  That and index-based range scans were the main two use-cases I was
  concerned would be degraded by interleaving index builds, compared with
  doing them in succession.
 
 I guess that tweaking file systems to allocate in bigger chunks help
 here ? I know that xfs can be tuned in that regard, but how about other
 common file systems like ext3 ?
ext4 should do that now by allocating the space for the files only after some 
time or uppon things like fsync (xfs does the same).
ext3 has, as far as I know, neither the ability to change allocation size nor 
can do delayed allocation.

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-18 Thread Hannu Krosing
On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
 Alvaro Herrera wrote:
  Andres Freund escribió:
 

  I find it way much easier to believe such issues exist on a tables in 
  constrast to indexes. The likelihood to get sequential accesses on an 
  index is 
  small enough on a big table to make it unlikely to matter much.
  
 
  Vacuum walks indexes sequentially, for one.

 
 That and index-based range scans were the main two use-cases I was 
 concerned would be degraded by interleaving index builds, compared with 
 doing them in succession. 

I guess that tweaking file systems to allocate in bigger chunks help
here ? I know that xfs can be tuned in that regard, but how about other
common file systems like ext3 ?

- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Greg Smith

Rob Wultsch wrote:

Are there any particular performance optimizations that would be in
play in such a scenario?
  


You'd want to increase maintenance_work_mem significantly, just for the 
sessions that are running these.  Something like this:


|SET maintenance_work_mem = '1GB';|

I don't know if that's a huge or tiny number relative to total RAM in 
your server, you get the idea though.


Also, you should have a larger than default value for 
checkpoint_segments in advance of this.  That you can't set per session, 
but you can adjust the value in the postgresql.conf and request a 
configuration reload--don't actually need to disrupt server operation by 
restarting to do it.  This will work for that:


pg_ctl reload



At a minimum I assume that if both of the commands were started at
about the same time they would each scan the table in the same
direction and whichever creation was slower would benefit from most of
the table data it needed being prepopulated in shared buffers. Is this
the case?
  


This might be optimistic; whether it will be the case depends a lot on 
how large your shared_buffers and OS buffer cache are relative to the 
table involved.  To pick an extreme example to demonstrate what I mean, 
if shared_buffers is the common default of 32MB, your table is 1TB, and 
you have a giant disk array that reads fast, it's not very likely that 
the second scan is going to find anything of interest left behind by the 
first one.  You could try and make some rough estimates of how long it 
will take to fill your RAM with table data at the expected I/O rate and 
guess how likely overlap is.


There's a trade-off here, which is that in return for making it possible 
the data you need to rebuild the index is more likely to be in RAM when 
you need it by building two at once, the resulting indexes are likely to 
end up interleaved on disk as they are written out.  If you're doing a 
lot of index scans, the increased seek penalties for that may ultimately 
make you regret having combined the two.  Really impossible to predict 
which approach is going to be better long term without gathering so much 
data that you might as well try and benchmark it on a test system 
instead if you can instead.  I am not a big fan of presuming one can 
predict performance instead of measuring it for complicated cases.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Rob Wultsch wrote:
 At a minimum I assume that if both of the commands were started at
 about the same time they would each scan the table in the same
 direction and whichever creation was slower would benefit from most of
 the table data it needed being prepopulated in shared buffers. Is this
 the case?

 This might be optimistic;

No, it's not optimistic in the least, at least not since we implemented
synchronized seqscans (in 8.3 or thereabouts).

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Rob Wultsch
On Wed, Mar 17, 2010 at 7:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Smith g...@2ndquadrant.com writes:
 Rob Wultsch wrote:
 At a minimum I assume that if both of the commands were started at
 about the same time they would each scan the table in the same
 direction and whichever creation was slower would benefit from most of
 the table data it needed being prepopulated in shared buffers. Is this
 the case?

 This might be optimistic;

 No, it's not optimistic in the least, at least not since we implemented
 synchronized seqscans (in 8.3 or thereabouts).

                        regards, tom lane


Where can I find details about this in the documentation?

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Andres Freund
On Wednesday 17 March 2010 19:44:56 Greg Smith wrote:
 Rob Wultsch wrote:
  On Wed, Mar 17, 2010 at 7:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  No, it's not optimistic in the least, at least not since we implemented
  synchronized seqscans (in 8.3 or thereabouts).
  
  Where can I find details about this in the documentation?
 
 It's a behind the scenes optimization so it's not really documented on
 the user side very well as far as I know; easy to forget it's even there
 as I did this morning.
 http://j-davis.com/postgresql/syncscan/syncscan.pdf is a presentation
 covering it, and http://j-davis.com/postgresql/83v82_scans.html is also
 helpful.
 
 While my pessimism on this part may have been overwrought, note the
 message interleaved on the list today with this discussion from Bob
 Lunney discussing the other issue I brought up:  When using 8-way
 parallel restore against a six-disk RAID 10 group I found that table and
 index scan performance dropped by about 10x.  I/O performance was
 restored by either clustering the tables one at a time, or by dropping
 and restoring them one at a time.  The only reason I can come up with
 for this behavior is file fragmentation and increased seek times.  Now,
 Bob's situation may very well involve a heavy dose of table
 fragmentation from multiple active loading processes rather than index
 fragmentation, but this class of problem is common when trying to do too
 many things at the same time.  I'd hate to see you chase a short-term
 optimization (reduce total index built time) at the expense of long-term
 overhead (resulting indexes are not as efficient to scan).
I find it way much easier to believe such issues exist on a tables in 
constrast to indexes. The likelihood to get sequential accesses on an index is 
small enough on a big table to make it unlikely to matter much.

Whats your theory to make it matter much?

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Alvaro Herrera
Andres Freund escribió:

 I find it way much easier to believe such issues exist on a tables in 
 constrast to indexes. The likelihood to get sequential accesses on an index 
 is 
 small enough on a big table to make it unlikely to matter much.

Vacuum walks indexes sequentially, for one.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Greg Smith

Alvaro Herrera wrote:

Andres Freund escribió:

  
I find it way much easier to believe such issues exist on a tables in 
constrast to indexes. The likelihood to get sequential accesses on an index is 
small enough on a big table to make it unlikely to matter much.



Vacuum walks indexes sequentially, for one.
  


That and index-based range scans were the main two use-cases I was 
concerned would be degraded by interleaving index builds, compared with 
doing them in succession.  I work often with time-oriented apps that 
have heavy give me every record between a and b components to 
them, and good sequential index performance can be an important 
requirement for that kind of application.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Building multiple indexes concurrently

2010-03-16 Thread Rob Wultsch
Lets say I have a large table bigTable to which I would like to add
two btree indexes. Is there a more efficient way to create indexes
than:
CREATE INDEX idx_foo on bigTable (foo);
CREATE INDEX idx_baz on bigTable (baz);
Or
CREATE INDEX CONCURRENTLY idx_foo on bigTable (foo);
CREATE INDEX CONCURRENTLY idx_baz on bigTable (baz);

Are there any particular performance optimizations that would be in
play in such a scenario?

At a minimum I assume that if both of the commands were started at
about the same time they would each scan the table in the same
direction and whichever creation was slower would benefit from most of
the table data it needed being prepopulated in shared buffers. Is this
the case?

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-16 Thread Ben Chobot
On Mar 16, 2010, at 6:04 PM, Rob Wultsch wrote:

 Lets say I have a large table bigTable to which I would like to add
 two btree indexes. Is there a more efficient way to create indexes
 than:
 CREATE INDEX idx_foo on bigTable (foo);
 CREATE INDEX idx_baz on bigTable (baz);
 Or
 CREATE INDEX CONCURRENTLY idx_foo on bigTable (foo);
 CREATE INDEX CONCURRENTLY idx_baz on bigTable (baz);
 
 Are there any particular performance optimizations that would be in
 play in such a scenario?
 
 At a minimum I assume that if both of the commands were started at
 about the same time they would each scan the table in the same
 direction and whichever creation was slower would benefit from most of
 the table data it needed being prepopulated in shared buffers. Is this
 the case?

That sounds reasonable to me. You might also look at upping your 
maintenance_work_mem for your session, as well.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance