Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-11 Thread Scott Marlowe
On Sun, Apr 10, 2011 at 8:29 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 04/09/2011 01:23 PM, Chris Ruprecht wrote:

 Maybe, in a future release, somebody will develop something that can
 create indexes as inactive and have a build tool build and activate them at
 the same time. Food for thought?


 Well, the most common case where this sort of thing happens is when people
 are using pg_restore to load a dump of an entire database.  In that case,
 you can use -j to run more than one loader job in parallel, which can
 easily end up doing a bunch of index builds at once, particularly at the
 end.  That already works about as well as it can because of the synchronized
 scan feature Tom mentioned.

FYI, in 8.3.13 I get this for all but one index:

ERROR:  deadlock detected
DETAIL:  Process 24488 waits for ShareLock on virtual transaction
64/825033; blocked by process 27505.
Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
of database 278059474; blocked by process 24488.

I'll try it on a big server running 8.4 and see what happens.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-11 Thread Scott Marlowe
On Sun, Apr 10, 2011 at 11:35 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Sun, Apr 10, 2011 at 8:29 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 04/09/2011 01:23 PM, Chris Ruprecht wrote:

 Maybe, in a future release, somebody will develop something that can
 create indexes as inactive and have a build tool build and activate them at
 the same time. Food for thought?


 Well, the most common case where this sort of thing happens is when people
 are using pg_restore to load a dump of an entire database.  In that case,
 you can use -j to run more than one loader job in parallel, which can
 easily end up doing a bunch of index builds at once, particularly at the
 end.  That already works about as well as it can because of the synchronized
 scan feature Tom mentioned.

 FYI, in 8.3.13 I get this for all but one index:

 ERROR:  deadlock detected
 DETAIL:  Process 24488 waits for ShareLock on virtual transaction
 64/825033; blocked by process 27505.
 Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
 of database 278059474; blocked by process 24488.

 I'll try it on a big server running 8.4 and see what happens.

Same error on pg 8.4.6

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-11 Thread Greg Smith

Scott Marlowe wrote:

FYI, in 8.3.13 I get this for all but one index:

ERROR:  deadlock detected
DETAIL:  Process 24488 waits for ShareLock on virtual transaction
64/825033; blocked by process 27505.
Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
of database 278059474; blocked by process 24488.
  


Is that trying to build them by hand?  The upthread request here is 
actually already on the TODO list at 
http://wiki.postgresql.org/wiki/Todo and it talks a bit about what works 
and what doesn't right now:


Allow multiple indexes to be created concurrently, ideally via a single 
heap scan
-pg_restore allows parallel index builds, but it is done via 
subprocesses, and there is no SQL interface for this.


This whole idea was all the rage on these lists circa early 2008, but 
parallel restore seems to have satisfied enough of the demand in this 
general area that it doesn't come up quite as much now.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Marlowe wrote:

 FYI, in 8.3.13 I get this for all but one index:

 ERROR:  deadlock detected
 DETAIL:  Process 24488 waits for ShareLock on virtual transaction
 64/825033; blocked by process 27505.
 Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
 of database 278059474; blocked by process 24488.


 Is that trying to build them by hand?  The upthread request here is actually
 already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it
 talks a bit about what works and what doesn't right now:

Yes, by hand.  It creates an entry for the index but lists but marks
it as INVALID

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-11 Thread Shaun Thomas

On 04/09/2011 11:28 AM, Chris Ruprecht wrote:


I'm wondering if there is a way to build these indexes in parallel
while reading the table only once for all indexes and building them
all at the same time. Is there an index build tool that I missed
somehow, that can do this?


I threw together a very crude duo of shell scripts to do this. I've 
attached them for you. To use them, you make a file named tablist.txt 
which contains the names of all the tables you want to reindex, and then 
you run them like this:


bash generate_rebuild_scripts.sh my_database 8
bash launch_rebuild_scripts.sh my_database

The first one in the above example would connect to my_database and 
create eight scripts that would run in parallel, with indexes ordered 
smallest to largest to prevent one script from getting stuck with 
several large indexes while the rest got small ones. The second script 
just launches them and makes a log directory so you can watch the progress.


I've run this with up to 16 concurrent threads without major issue. It 
comes in handy.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email


generate_rebuild_scripts.sh
Description: Bourne shell script


launch_rebuild_scripts.sh
Description: Bourne shell script

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-11 Thread Tom Lane
Scott Marlowe scott.marl...@gmail.com writes:
 On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Marlowe wrote:
 FYI, in 8.3.13 I get this for all but one index:
 ERROR:  deadlock detected

 Is that trying to build them by hand?  The upthread request here is actually
 already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it
 talks a bit about what works and what doesn't right now:

 Yes, by hand.  It creates an entry for the index but lists but marks
 it as INVALID

Are you trying to use CREATE INDEX CONCURRENTLY?  AFAIR that doesn't
support multiple index creations at the same time.  Usually you wouldn't
want that combination anyway, since the point of CREATE INDEX
CONCURRENTLY is to not prevent foreground use of the table while you're
making the index --- and multiple index creations are probably going to
eat enough I/O that you shouldn't be doing them during normal operations
anyhow.

Just use plain CREATE INDEX.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-11 Thread Scott Marlowe
On Mon, Apr 11, 2011 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Scott Marlowe scott.marl...@gmail.com writes:
 On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Marlowe wrote:
 FYI, in 8.3.13 I get this for all but one index:
 ERROR:  deadlock detected

 Is that trying to build them by hand?  The upthread request here is actually
 already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it
 talks a bit about what works and what doesn't right now:

 Yes, by hand.  It creates an entry for the index but lists but marks
 it as INVALID

 Are you trying to use CREATE INDEX CONCURRENTLY?  AFAIR that doesn't
 support multiple index creations at the same time.  Usually you wouldn't
 want that combination anyway, since the point of CREATE INDEX
 CONCURRENTLY is to not prevent foreground use of the table while you're
 making the index --- and multiple index creations are probably going to
 eat enough I/O that you shouldn't be doing them during normal operations
 anyhow.

 Just use plain CREATE INDEX.

I thought they'd stand in line waiting on each other.  I'll give it a try.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Chris Ruprecht
I have a table that I need to rebuild indexes on from time to time (records get 
loaded before indexes get build).

To build the indexes, I use 'create index ...', which reads the entire table 
and builds the index, one at a time.
I'm wondering if there is a way to build these indexes in parallel while 
reading the table only once for all indexes and building them all at the same 
time. Is there an index build tool that I missed somehow, that can do this?

Thanks,
Chris. 



best regards,
chris
-- 
chris ruprecht
database grunt and bit pusher extraordinaíre


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Tom Lane
Chris Ruprecht ch...@ruprecht.org writes:
 I have a table that I need to rebuild indexes on from time to time (records 
 get loaded before indexes get build).
 To build the indexes, I use 'create index ...', which reads the entire table 
 and builds the index, one at a time.
 I'm wondering if there is a way to build these indexes in parallel while 
 reading the table only once for all indexes and building them all at the same 
 time. Is there an index build tool that I missed somehow, that can do this?

I don't know of any automated tool, but if you launch several CREATE
INDEX operations on the same table at approximately the same time (in
separate sessions), they should share the I/O required to read the
table.  (The synchronized scans feature guarantees this in recent
PG releases, even if you're not very careful about starting them at
the same time.)

The downside of that is that you need N times the working memory and
you will have N times the subsidiary I/O for sort temp files and writes
to the finished indexes.  Depending on the characteristics of your I/O
system it's not hard to imagine this being a net loss ... but it'd be
interesting to experiment.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multiple index builds on same table - in one sweep?

2011-04-09 Thread Chris Ruprecht
I'm running 2 tests now, one, where I'm doing the traditional indexing, in 
sequence. The server isn't doing anything else, so I should get pretty accurate 
results.
Test 2 will win all the create index sessions in separate sessions in parallel 
(echo create index ...|psql ...  ) once the 'serial build' test is done.

Maybe, in a future release, somebody will develop something that can create 
indexes as inactive and have a build tool build and activate them at the same 
time. Food for thought?
 
On Apr 9, 2011, at 13:10 , Tom Lane wrote:

 Chris Ruprecht ch...@ruprecht.org writes:
 I have a table that I need to rebuild indexes on from time to time (records 
 get loaded before indexes get build).
 To build the indexes, I use 'create index ...', which reads the entire table 
 and builds the index, one at a time.
 I'm wondering if there is a way to build these indexes in parallel while 
 reading the table only once for all indexes and building them all at the 
 same time. Is there an index build tool that I missed somehow, that can do 
 this?
 
 I don't know of any automated tool, but if you launch several CREATE
 INDEX operations on the same table at approximately the same time (in
 separate sessions), they should share the I/O required to read the
 table.  (The synchronized scans feature guarantees this in recent
 PG releases, even if you're not very careful about starting them at
 the same time.)
 
 The downside of that is that you need N times the working memory and
 you will have N times the subsidiary I/O for sort temp files and writes
 to the finished indexes.  Depending on the characteristics of your I/O
 system it's not hard to imagine this being a net loss ... but it'd be
 interesting to experiment.
 
   regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance