Re: [HACKERS] Online index builds

2006-08-01 Thread Bruce Momjian

Consindering the syntax for this, we currently allow read access during
index creation, just not write access, so I think the new syntax should
be:

CREATE [ UNIQUE ] INDEX name  ON table 
[ USING method ] [ [ENABLE] WRITE [ACCESS] ]
( { column | ( expression ) } [ opclass ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace ]
[ WHERE predicate ]

This is clear, and adds no new keywords.

---

Greg Stark wrote:
 
 I just sent in the patch for online index builds to -patches.
 
 . The work to combine the two phases into a single non-transactional command
   is done. I'm not sure how long to wait between lock checks or how verbose to
   be about why it's taking so long. I do think we have to print something or
   else the DBA won't know if it's hung waiting for something external.
   Currently it prints a notice the first time it sleeps. 
 
 . Also it prints out how many tuples it found which normal index doesn't.
   Probably that message should go away. On the other hand the index stats
   probably need to be filled in.
 
 . I need to check what locks I'm taking. I think I still have some old code
   with the wrong locks in it.
 
 . this includes the tid btree opclass sent earlier (with a warning I didn't
   notice before fixed up). opr_sanity now fails but I think that's due to the
   gin commits not this opclass.
 
 . In case of an error during phase2 the invalid index is left in place. It can
   be dropped with DROP INDEX. The footwork to get it dropped in case of an
   error would be quite tricky but there's a sketch of how to do it in the 
 source.
 
 . no documentation yet, there's not much to write though.
 
 . no regression tests yet. I don't see any way to test this reasonably in the
   regression tests. I've done some testing myself by building indexes while
   pgbench is running. Then I have to do index scans to see how many records
   are returned with index scans. It wouldn't be easy to automate and even if
   it were done it wouldn't really be all that great a test. The corner cases
   found during the development are pretty narrow and will be hard to reliably
   test.
 
 -- 
 greg
 
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Online index builds

2006-08-01 Thread Andrew Dunstan

Bruce Momjian wrote:

Consindering the syntax for this, we currently allow read access during
index creation, just not write access, so I think the new syntax should
be:

	CREATE [ UNIQUE ] INDEX name  ON table 
	[ USING method ] [ [ENABLE] WRITE [ACCESS] ]

( { column | ( expression ) } [ opclass ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace ]
[ WHERE predicate ]

This is clear, and adds no new keywords.

  


It's not particularly clear to me. On its face this seems to me to imply 
something about how the index will be able to be used, not about how it 
is to be built.


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Online index builds

2006-08-01 Thread Bruce Momjian
Andrew Dunstan wrote:
 Bruce Momjian wrote:
  Consindering the syntax for this, we currently allow read access during
  index creation, just not write access, so I think the new syntax should
  be:
 
  CREATE [ UNIQUE ] INDEX name  ON table 
  [ USING method ] [ [ENABLE] WRITE [ACCESS] ]
  ( { column | ( expression ) } [ opclass ] [, ...] )
  [ WITH ( storage_parameter = value [, ... ] ) ]
  [ TABLESPACE tablespace ]
  [ WHERE predicate ]
 
  This is clear, and adds no new keywords.
 

 
 It's not particularly clear to me. On its face this seems to me to imply 
 something about how the index will be able to be used, not about how it 
 is to be built.

Yea, that was always a confusion.  CREATE CONCURRENT INDEX has the same
problem.  We need something that talks about the build-time behavior. 

How about NOWAIT?

  CREATE [ UNIQUE ] INDEX name  ON table 
  [ USING method ] [ NOWAIT ]
  ( { column | ( expression ) } [ opclass ] [, ...] )
  [ WITH ( storage_parameter = value [, ... ] ) ]
  [ TABLESPACE tablespace ]
  [ WHERE predicate ]



-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Online index builds

2006-08-01 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
   CREATE [ UNIQUE ] INDEX name  ON table 
   [ USING method ] [ [ENABLE] WRITE [ACCESS] ]
   ( { column | ( expression ) } [ opclass ] [, ...] )
   [ WITH ( storage_parameter = value [, ... ] ) ]
   [ TABLESPACE tablespace ]
   [ WHERE predicate ]

 This is clear, and adds no new keywords.

... and is very very painful for psql to parse.  If you want this,
*you* write the patch for command_no_begin().  Be sure to handle
quoted and schema-qualified identifiers properly.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Online index builds

2006-08-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  CREATE [ UNIQUE ] INDEX name  ON table 
  [ USING method ] [ [ENABLE] WRITE [ACCESS] ]
  ( { column | ( expression ) } [ opclass ] [, ...] )
  [ WITH ( storage_parameter = value [, ... ] ) ]
  [ TABLESPACE tablespace ]
  [ WHERE predicate ]
 
  This is clear, and adds no new keywords.
 
 ... and is very very painful for psql to parse.  If you want this,
 *you* write the patch for command_no_begin().  Be sure to handle
 quoted and schema-qualified identifiers properly.

I was going to use our tab-completion code, which I think handles all
these.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Online index builds

2006-08-01 Thread Alvaro Herrera
Bruce Momjian wrote:
 Andrew Dunstan wrote:

  It's not particularly clear to me. On its face this seems to me to imply 
  something about how the index will be able to be used, not about how it 
  is to be built.
 
 Yea, that was always a confusion.  CREATE CONCURRENT INDEX has the same
 problem.  We need something that talks about the build-time behavior. 

How about

CREATE INDEX CONCURRENTLY?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Online index builds

2006-08-01 Thread Bruce Momjian
Alvaro Herrera wrote:
 Bruce Momjian wrote:
  Andrew Dunstan wrote:
 
   It's not particularly clear to me. On its face this seems to me to imply 
   something about how the index will be able to be used, not about how it 
   is to be built.
  
  Yea, that was always a confusion.  CREATE CONCURRENT INDEX has the same
  problem.  We need something that talks about the build-time behavior. 
 
 How about
 
 CREATE INDEX CONCURRENTLY?

Yea, that actually sounds pretty good.  Though a new keyword, it might
be the best solution.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Online index builds

2006-07-19 Thread Jim C. Nasby
On Sat, Jul 15, 2006 at 09:10:46PM -0400, Greg Stark wrote:
 
 Hannu Krosing [EMAIL PROTECTED] writes:
 
  Maybe we can show progress indicators in status line (either
  pg_stat_activity.current_query or commandline shown in ps), like 
  
  WAITING TO START PHASE 1 - WAITING FOR TRANSACTION XXX TO COMPLETE 
  
  or 
  
  INSERTING INDEX ENTRY N OF M
  
  changing every few seconds.
 
 Hm. That would be very interesting. I'll say that one of the things that
 impressed me very much with Postgres moving from Oracle was the focus on
 usability. Progress indicators would be excellent for a lot of operations.
 
 That said I'm not sure how much I can do here. For a substantial index we
 should expect most of the time will be spent in the tuplesort. It's hard to
 see how to get any sort of progress indicator out of there and as long as we
 can't it's hard to see the point of getting one during the heap scan or any of
 the other i/o operations.

I'd love to have any kind of progress indication for any sorts that
spill to disk, and there's any number of other long-running operations
where progress info would be very welcome. I certainly wouldn't let lack
of a progress indicator for sorts prevent you from adding one. I like
the idea of periodically updating both current_query and the commandline
that ps shows.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Online index builds

2006-07-16 Thread Simon Riggs
On Sat, 2006-07-15 at 21:10 -0400, Greg Stark wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:

  Another related thing - throttling
  --
  
  Did you do any work on using vacuum_cost_* GUC vars to throttle the
  build process if desired ? 
 
 Actually no. While there is consensus that will be necessary I'm not sure I
 can do it with this patch. The problem is that most of the real heavy lifting
 here is done inside tuplesort. Even aside from that most of what's left is
 inside bulkdelete(*) and the code that handles regular index builds.
 
 So I think we'll need some global thinking about what options Postgres needs
 to control throttling in general. And probably someone needs to write a
 separate patch that adds all the hooks to the various places in a single go.
 Trying to throttle just one operation at a time when a lot of the code that
 implements these operations is shared will have us running in circles.
 
 (*) Hm. Come to think of it I wonder if the vacuum_cost parameters are already
 kicking in for this phase. That would be a bit strange since it's the fastest
 of the three scans.

Separate patches sounds more sensible. Using statement_cost_* in many
places sounds useful to me and not too hard to get into 8.2

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Online index builds

2006-07-16 Thread Hannu Krosing
Ühel kenal päeval, L, 2006-07-15 kell 21:10, kirjutas Greg Stark:
 Hannu Krosing [EMAIL PROTECTED] writes:
 
  Maybe we can show progress indicators in status line (either
  pg_stat_activity.current_query or commandline shown in ps), like 
  
  WAITING TO START PHASE 1 - WAITING FOR TRANSACTION XXX TO COMPLETE 
  
  or 
  
  INSERTING INDEX ENTRY N OF M
  
  changing every few seconds.
 
 Hm. That would be very interesting. I'll say that one of the things that
 impressed me very much with Postgres moving from Oracle was the focus on
 usability. Progress indicators would be excellent for a lot of operations.
 
 That said I'm not sure how much I can do here. For a substantial index we
 should expect most of the time will be spent in the tuplesort.

If index is substantially bigger than shared memory/available RAM then
tuple by tuple index insertion part can also be quite slow and i/o
consuming even if the number of entries to insert is low compared to
total index size. That is where INSERTING INDEX ENTRY N OF M would be
a thing which helps the DBA to estimate time to completion.

 It's hard to
 see how to get any sort of progress indicator out of there and as long as we
 can't it's hard to see the point of getting one during the heap scan or any of
 the other i/o operations.

It would be useful to know, in what step we are in general , that is are
we doing some useful work or just waiting for something.


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Online index builds

2006-07-16 Thread Robert Treat
On Saturday 15 July 2006 21:37, Joshua D. Drake wrote:
  That said I'm not sure how much I can do here. For a substantial index we
  should expect most of the time will be spent in the tuplesort. It's hard
  to see how to get any sort of progress indicator out of there and as long
  as we can't it's hard to see the point of getting one during the heap
  scan or any of the other i/o operations.

 Well from a DBA perspective, just knowing that something productive is
 happening is useful.

 When using vacuum I almost always use vacuum verbose, just so I have an
 idea of what is going on.

+1

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Online index builds

2006-07-15 Thread Hannu Krosing
Ühel kenal päeval, N, 2006-07-13 kell 01:07, kirjutas Greg Stark:
 Simon Riggs [EMAIL PROTECTED] writes:
 
  On Wed, 2006-07-12 at 12:09 -0400, Greg Stark wrote:
   no regression tests yet.
  
  We'll need some performance tests that show that lock-hold time is
  *actually* reduced, given the shenanigans needed to get there.
 
 I'm not sure what you mean by lock-hold time. Online index builds
 effectively take *no* locks in the user-visible sense that regular index
 builds do. Other transactions can insert, update, delete continuously
 throughout the entire process.
 
 The only locks that are taken are
 
 1) a ShareUpdateExclusiveLock which blocks vacuum from running on the table
being indexed. This is taken by both phase 1 and phase 2. (Actually I had
the wrong lock in the patch I emailed in one place. Fixed in my source tree
here)
 
 2) An ExclusiveLock that is taken momentarily and immediately released. Even
if that can never be acquired due to a busy system it can eventually
proceed anyways as long as there are no long-running transactions that are
refusing to commit.
 
 That said we do need some performance tests to get an idea how long phase 2
 takes for large tables. The additional index and heap scan and tid sort could
 take a substantial amount of time though never as long as the original index
 build done in phase 1.

Maybe we can show progress indicators in status line (either
pg_stat_activity.current_query or commandline shown in ps), like 

WAITING TO START PHASE 1 - WAITING FOR TRANSACTION XXX TO COMPLETE 

or 

INSERTING INDEX ENTRY N OF M

changing every few seconds.

that could give the sysadmin some idea of what is going on without too
much verbosity on console.

there could of course be a VERBOSE mode, which acts similar to VACUUM
VERBOSE.

And why not make t possible to add a verbosity level there as well:
'CREATE INDEX ... VEBOSE 3'. And to VACUUM VERBOSE as well 

At level 3 all status changes could also be sent to client as well.

 What's worse is that in some cases the merge could potentially be doing a lot
 of retail index inserts. I have no good intuition for how long those will take
 relative to the wholesale index build method, especially since for some index
 methods like GIN retail inserts are extremely expensive.
 
 So for indexes that don't have a lot of records that need to be inserted
 individually what I expect -- and what I put in the docs -- is something under
 100% time penalty for an online index build. In fact I expect it to be more
 like 50% though it depends on how wide the original index. For ones that do
 have lots of records mutated for phase 2 all bets are off.

the only quarantee seems to be, that if there are still some resources
left, the index build will eventualy complete.

but showing progress will let the DBA to make the decision to abort the
build if he sees that it takes too long.


Another related thing - throttling
--

Did you do any work on using vacuum_cost_* GUC vars to throttle the
build process if desired ? 

I guess that the initial seqscans are probably cheap enough ( or at
least they are no worse than if someone just did select * ...
concurrently), but the index merge can probably still not be as light on
OLTP queries as desirable.

In reality on OLTP dbs even SELECT COUNT(*) can sometimes be a
mainenance operation and thus the goal may not be to complete as fast
as possible, but to be as light as possible on concurrent OLTP queries.

Eventually it would nice to have special optimiser rules for any
maintenance queries and DDL ops as defined by DBA, but making use of
vacuum_cost_* vars for generic throttling would be a good first cut ;


 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Online index builds

2006-07-15 Thread Greg Stark

Hannu Krosing [EMAIL PROTECTED] writes:

 Maybe we can show progress indicators in status line (either
 pg_stat_activity.current_query or commandline shown in ps), like 
 
 WAITING TO START PHASE 1 - WAITING FOR TRANSACTION XXX TO COMPLETE 
 
 or 
 
 INSERTING INDEX ENTRY N OF M
 
 changing every few seconds.

Hm. That would be very interesting. I'll say that one of the things that
impressed me very much with Postgres moving from Oracle was the focus on
usability. Progress indicators would be excellent for a lot of operations.

That said I'm not sure how much I can do here. For a substantial index we
should expect most of the time will be spent in the tuplesort. It's hard to
see how to get any sort of progress indicator out of there and as long as we
can't it's hard to see the point of getting one during the heap scan or any of
the other i/o operations.

I think it does make sense to put something in current_query indicating when
it's waiting for transactions to end and when it's past that point. That's
something the DBA should be aware of.

 And why not make t possible to add a verbosity level there as well:
 'CREATE INDEX ... VEBOSE 3'. And to VACUUM VERBOSE as well 
 
 At level 3 all status changes could also be sent to client as well.

Wouldn't you just control this with log_min_messages? It seems unnecessary to
clutter the grammar for every command with verbose options.

 Another related thing - throttling
 --
 
 Did you do any work on using vacuum_cost_* GUC vars to throttle the
 build process if desired ? 

Actually no. While there is consensus that will be necessary I'm not sure I
can do it with this patch. The problem is that most of the real heavy lifting
here is done inside tuplesort. Even aside from that most of what's left is
inside bulkdelete(*) and the code that handles regular index builds.

So I think we'll need some global thinking about what options Postgres needs
to control throttling in general. And probably someone needs to write a
separate patch that adds all the hooks to the various places in a single go.
Trying to throttle just one operation at a time when a lot of the code that
implements these operations is shared will have us running in circles.

(*) Hm. Come to think of it I wonder if the vacuum_cost parameters are already
kicking in for this phase. That would be a bit strange since it's the fastest
of the three scans.

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Online index builds

2006-07-15 Thread Joshua D. Drake



That said I'm not sure how much I can do here. For a substantial index we
should expect most of the time will be spent in the tuplesort. It's hard to
see how to get any sort of progress indicator out of there and as long as we
can't it's hard to see the point of getting one during the heap scan or any of
the other i/o operations.


Well from a DBA perspective, just knowing that something productive is 
happening is useful.


When using vacuum I almost always use vacuum verbose, just so I have an 
idea of what is going on.


Joshua D. Drake
--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Online index builds

2006-07-15 Thread Agent M
A great first step would be to add elog(INFO,...) in some standardized 
format over the wire so that clients can tell what's going on. It could 
be triggered by a GUC which is off by default.


-M

On Jul 15, 2006, at 9:10 PM, Greg Stark wrote:



Hannu Krosing [EMAIL PROTECTED] writes:


Maybe we can show progress indicators in status line (either
pg_stat_activity.current_query or commandline shown in ps), like

WAITING TO START PHASE 1 - WAITING FOR TRANSACTION XXX TO COMPLETE

or

INSERTING INDEX ENTRY N OF M

changing every few seconds.


Hm. That would be very interesting. I'll say that one of the things 
that
impressed me very much with Postgres moving from Oracle was the focus 
on
usability. Progress indicators would be excellent for a lot of 
operations.


That said I'm not sure how much I can do here. For a substantial index 
we
should expect most of the time will be spent in the tuplesort. It's 
hard to
see how to get any sort of progress indicator out of there and as long 
as we
can't it's hard to see the point of getting one during the heap scan 
or any of

the other i/o operations.

I think it does make sense to put something in current_query 
indicating when
it's waiting for transactions to end and when it's past that point. 
That's

something the DBA should be aware of.


¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬
AgentM
[EMAIL PROTECTED]
¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Online index builds

2006-07-12 Thread Greg Stark

I just sent in the patch for online index builds to -patches.

. The work to combine the two phases into a single non-transactional command
  is done. I'm not sure how long to wait between lock checks or how verbose to
  be about why it's taking so long. I do think we have to print something or
  else the DBA won't know if it's hung waiting for something external.
  Currently it prints a notice the first time it sleeps. 

. Also it prints out how many tuples it found which normal index doesn't.
  Probably that message should go away. On the other hand the index stats
  probably need to be filled in.

. I need to check what locks I'm taking. I think I still have some old code
  with the wrong locks in it.

. this includes the tid btree opclass sent earlier (with a warning I didn't
  notice before fixed up). opr_sanity now fails but I think that's due to the
  gin commits not this opclass.

. In case of an error during phase2 the invalid index is left in place. It can
  be dropped with DROP INDEX. The footwork to get it dropped in case of an
  error would be quite tricky but there's a sketch of how to do it in the 
source.

. no documentation yet, there's not much to write though.

. no regression tests yet. I don't see any way to test this reasonably in the
  regression tests. I've done some testing myself by building indexes while
  pgbench is running. Then I have to do index scans to see how many records
  are returned with index scans. It wouldn't be easy to automate and even if
  it were done it wouldn't really be all that great a test. The corner cases
  found during the development are pretty narrow and will be hard to reliably
  test.

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Online index builds

2006-07-12 Thread Simon Riggs
On Wed, 2006-07-12 at 12:09 -0400, Greg Stark wrote:
 no regression tests yet.

We'll need some performance tests that show that lock-hold time is
*actually* reduced, given the shenanigans needed to get there.

We may need to have usage recommendations in the docs.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Online index builds

2006-07-12 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Wed, 2006-07-12 at 12:09 -0400, Greg Stark wrote:
 no regression tests yet.

 We'll need some performance tests that show that lock-hold time is
 *actually* reduced, given the shenanigans needed to get there.

Reducing lock hold time is not the point ... reducing the strength of
the lock at the cost of increased elapsed time is the point.

 We may need to have usage recommendations in the docs.

Agreed.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Online index builds

2006-07-12 Thread Greg Stark
Simon Riggs [EMAIL PROTECTED] writes:

 On Wed, 2006-07-12 at 12:09 -0400, Greg Stark wrote:
  no regression tests yet.
 
 We'll need some performance tests that show that lock-hold time is
 *actually* reduced, given the shenanigans needed to get there.

I'm not sure what you mean by lock-hold time. Online index builds
effectively take *no* locks in the user-visible sense that regular index
builds do. Other transactions can insert, update, delete continuously
throughout the entire process.

The only locks that are taken are

1) a ShareUpdateExclusiveLock which blocks vacuum from running on the table
   being indexed. This is taken by both phase 1 and phase 2. (Actually I had
   the wrong lock in the patch I emailed in one place. Fixed in my source tree
   here)

2) An ExclusiveLock that is taken momentarily and immediately released. Even
   if that can never be acquired due to a busy system it can eventually
   proceed anyways as long as there are no long-running transactions that are
   refusing to commit.

That said we do need some performance tests to get an idea how long phase 2
takes for large tables. The additional index and heap scan and tid sort could
take a substantial amount of time though never as long as the original index
build done in phase 1.

What's worse is that in some cases the merge could potentially be doing a lot
of retail index inserts. I have no good intuition for how long those will take
relative to the wholesale index build method, especially since for some index
methods like GIN retail inserts are extremely expensive.

So for indexes that don't have a lot of records that need to be inserted
individually what I expect -- and what I put in the docs -- is something under
100% time penalty for an online index build. In fact I expect it to be more
like 50% though it depends on how wide the original index. For ones that do
have lots of records mutated for phase 2 all bets are off.

 We may need to have usage recommendations in the docs.

I'm writing docs now. I'm trying to find a happy medium between explaining all
the issues and spamming the docs with lots of discussion. Right now what I
have is a single paragraph in the create_index man page that refers to the
Postgres manual where I list the issues in more depth.

I also still have to get some kind of regression tests. I don't think we have
any concurrent regression tests currently, do we? To thoroughly test it will
be quite hard. Some of the corner cases are extremely narrow or require very
particular types of transactions running with very specific timing.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend