Re: [PERFORM] Multiple index builds on same table - in one sweep?
On Sun, Apr 10, 2011 at 8:29 PM, Greg Smith g...@2ndquadrant.com wrote: On 04/09/2011 01:23 PM, Chris Ruprecht wrote: Maybe, in a future release, somebody will develop something that can create indexes as inactive and have a build tool build and activate them at the same time. Food for thought? Well, the most common case where this sort of thing happens is when people are using pg_restore to load a dump of an entire database. In that case, you can use -j to run more than one loader job in parallel, which can easily end up doing a bunch of index builds at once, particularly at the end. That already works about as well as it can because of the synchronized scan feature Tom mentioned. FYI, in 8.3.13 I get this for all but one index: ERROR: deadlock detected DETAIL: Process 24488 waits for ShareLock on virtual transaction 64/825033; blocked by process 27505. Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165 of database 278059474; blocked by process 24488. I'll try it on a big server running 8.4 and see what happens. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
On Sun, Apr 10, 2011 at 11:35 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Sun, Apr 10, 2011 at 8:29 PM, Greg Smith g...@2ndquadrant.com wrote: On 04/09/2011 01:23 PM, Chris Ruprecht wrote: Maybe, in a future release, somebody will develop something that can create indexes as inactive and have a build tool build and activate them at the same time. Food for thought? Well, the most common case where this sort of thing happens is when people are using pg_restore to load a dump of an entire database. In that case, you can use -j to run more than one loader job in parallel, which can easily end up doing a bunch of index builds at once, particularly at the end. That already works about as well as it can because of the synchronized scan feature Tom mentioned. FYI, in 8.3.13 I get this for all but one index: ERROR: deadlock detected DETAIL: Process 24488 waits for ShareLock on virtual transaction 64/825033; blocked by process 27505. Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165 of database 278059474; blocked by process 24488. I'll try it on a big server running 8.4 and see what happens. Same error on pg 8.4.6 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
Scott Marlowe wrote: FYI, in 8.3.13 I get this for all but one index: ERROR: deadlock detected DETAIL: Process 24488 waits for ShareLock on virtual transaction 64/825033; blocked by process 27505. Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165 of database 278059474; blocked by process 24488. Is that trying to build them by hand? The upthread request here is actually already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it talks a bit about what works and what doesn't right now: Allow multiple indexes to be created concurrently, ideally via a single heap scan -pg_restore allows parallel index builds, but it is done via subprocesses, and there is no SQL interface for this. This whole idea was all the rage on these lists circa early 2008, but parallel restore seems to have satisfied enough of the demand in this general area that it doesn't come up quite as much now. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith g...@2ndquadrant.com wrote: Scott Marlowe wrote: FYI, in 8.3.13 I get this for all but one index: ERROR: deadlock detected DETAIL: Process 24488 waits for ShareLock on virtual transaction 64/825033; blocked by process 27505. Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165 of database 278059474; blocked by process 24488. Is that trying to build them by hand? The upthread request here is actually already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it talks a bit about what works and what doesn't right now: Yes, by hand. It creates an entry for the index but lists but marks it as INVALID -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
On 04/09/2011 11:28 AM, Chris Ruprecht wrote: I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this? I threw together a very crude duo of shell scripts to do this. I've attached them for you. To use them, you make a file named tablist.txt which contains the names of all the tables you want to reindex, and then you run them like this: bash generate_rebuild_scripts.sh my_database 8 bash launch_rebuild_scripts.sh my_database The first one in the above example would connect to my_database and create eight scripts that would run in parallel, with indexes ordered smallest to largest to prevent one script from getting stuck with several large indexes while the rest got small ones. The second script just launches them and makes a log directory so you can watch the progress. I've run this with up to 16 concurrent threads without major issue. It comes in handy. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email generate_rebuild_scripts.sh Description: Bourne shell script launch_rebuild_scripts.sh Description: Bourne shell script -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
Scott Marlowe scott.marl...@gmail.com writes: On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith g...@2ndquadrant.com wrote: Scott Marlowe wrote: FYI, in 8.3.13 I get this for all but one index: ERROR: deadlock detected Is that trying to build them by hand? The upthread request here is actually already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it talks a bit about what works and what doesn't right now: Yes, by hand. It creates an entry for the index but lists but marks it as INVALID Are you trying to use CREATE INDEX CONCURRENTLY? AFAIR that doesn't support multiple index creations at the same time. Usually you wouldn't want that combination anyway, since the point of CREATE INDEX CONCURRENTLY is to not prevent foreground use of the table while you're making the index --- and multiple index creations are probably going to eat enough I/O that you shouldn't be doing them during normal operations anyhow. Just use plain CREATE INDEX. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
On Mon, Apr 11, 2011 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Scott Marlowe scott.marl...@gmail.com writes: On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith g...@2ndquadrant.com wrote: Scott Marlowe wrote: FYI, in 8.3.13 I get this for all but one index: ERROR: deadlock detected Is that trying to build them by hand? The upthread request here is actually already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it talks a bit about what works and what doesn't right now: Yes, by hand. It creates an entry for the index but lists but marks it as INVALID Are you trying to use CREATE INDEX CONCURRENTLY? AFAIR that doesn't support multiple index creations at the same time. Usually you wouldn't want that combination anyway, since the point of CREATE INDEX CONCURRENTLY is to not prevent foreground use of the table while you're making the index --- and multiple index creations are probably going to eat enough I/O that you shouldn't be doing them during normal operations anyhow. Just use plain CREATE INDEX. I thought they'd stand in line waiting on each other. I'll give it a try. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Multiple index builds on same table - in one sweep?
I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build). To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time. I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this? Thanks, Chris. best regards, chris -- chris ruprecht database grunt and bit pusher extraordinaíre -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
Chris Ruprecht ch...@ruprecht.org writes: I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build). To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time. I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this? I don't know of any automated tool, but if you launch several CREATE INDEX operations on the same table at approximately the same time (in separate sessions), they should share the I/O required to read the table. (The synchronized scans feature guarantees this in recent PG releases, even if you're not very careful about starting them at the same time.) The downside of that is that you need N times the working memory and you will have N times the subsidiary I/O for sort temp files and writes to the finished indexes. Depending on the characteristics of your I/O system it's not hard to imagine this being a net loss ... but it'd be interesting to experiment. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Multiple index builds on same table - in one sweep?
I'm running 2 tests now, one, where I'm doing the traditional indexing, in sequence. The server isn't doing anything else, so I should get pretty accurate results. Test 2 will win all the create index sessions in separate sessions in parallel (echo create index ...|psql ... ) once the 'serial build' test is done. Maybe, in a future release, somebody will develop something that can create indexes as inactive and have a build tool build and activate them at the same time. Food for thought? On Apr 9, 2011, at 13:10 , Tom Lane wrote: Chris Ruprecht ch...@ruprecht.org writes: I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build). To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time. I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this? I don't know of any automated tool, but if you launch several CREATE INDEX operations on the same table at approximately the same time (in separate sessions), they should share the I/O required to read the table. (The synchronized scans feature guarantees this in recent PG releases, even if you're not very careful about starting them at the same time.) The downside of that is that you need N times the working memory and you will have N times the subsidiary I/O for sort temp files and writes to the finished indexes. Depending on the characteristics of your I/O system it's not hard to imagine this being a net loss ... but it'd be interesting to experiment. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance