Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-08 Thread Csaba Nagy
On Wed, 2005-12-07 at 19:36, Greg Stark wrote:
[snip]
 We periodically ran into problems with load spikes or other performance
 problems causing things to get very slow and stay slow for a while. Letting
 things settle out usually worked but occasionally we had to restart the whole
 system to clear out the queue of requests.

Just as a personal opinion: I would love a REINDEX which does not block
reads/writes, even if writes become more expensive while it's running.
There's always a time slot when I can schedule the REINDEX so that write
activity is very low, but it is impossible to find a slot when there's
NO write activity... and I think most real-world applications are like
this. It's very rare that an application is constantly under high load,
but most applications constantly get SOME important activity, which
makes downtime hard to schedule. Now if the write slowdown stays within
the acceptable service level, then scheduling the REINDEX in a not-so-busy
time slot is a very workable solution.

Cheers,
Csaba.





Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-07 Thread Jochem van Dieten
On 12/6/05, Hannu Krosing wrote:

 1) run a transaction repeatedly, trying to hit a point of no concurrent
 transactions, enhance the odds by locking out starting other
 transactions for a few (tenths or hundredths of) seconds; if it
 succeeds, record SNAP1, commit and continue, else rollback, then
 sleep a little and retry.

Which locks can be released by committing here?


 2) build index on all rows inserted before SNAP1

 3) run a transaction repeatedly, trying to hit a point of no concurrent
 transactions by locking out other transactions for a few (tenths or
 hundredths of) seconds; if it succeeds, record SNAP2, mark the index as
 visible for inserts, commit. Now all new transactions see the index and
 use it when inserting new tuples.

 4) scan over table, add all tuples between SNAP1 and SNAP2 to index

You cannot guarantee that every tuple inserted into the table will be
visible to SNAP2 if you take SNAP2 before the commit of the
insert-only index has dropped below the global XMIN horizon.


 5) mark index as usable for query plans

How about:

- begin transaction X1
   - insert all visible tuples in an index
   - mark index incomplete
- commit

- wait for X1 to become visible to all running transactions (X1 is
known from the XMIN in pg_class / pg_index)

- begin transaction X2
   - insert all missing tuples in index
   - mark index complete
- commit
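
For illustration, here is a runnable toy (Python) of the wait between
X1 and X2; the xids and the visibility rule are deliberate
simplifications, not PostgreSQL APIs:

x1 = 100                               # xid that committed "index incomplete"
live_snapshot_xmins = [98, 99, 103]    # oldest xid each running txn might still see

def x1_visible_to_all(x1, xmins):
    # X1 is visible to a snapshot iff the snapshot's xmin is past x1,
    # i.e. X1 had already committed when that snapshot was taken
    return all(xmin > x1 for xmin in xmins)

print(x1_visible_to_all(x1, live_snapshot_xmins))  # False: 98 and 99 predate X1

live_snapshot_xmins = [103, 105]       # the overlapping transactions finished
print(x1_visible_to_all(x1, live_snapshot_xmins))  # True: safe to start X2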

Jochem



Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-07 Thread Greg Stark

Hannu Krosing [EMAIL PROTECTED] writes:

  But that said, realistically *any* solution has to obtain a lock at some time
  to make the schema change. I would say pretty much any O(1) (constant time)
  outage is at least somewhat acceptable as contrasted with the normal index
  build which locks out other writers for at least O(n lg n) time. Anything on
  the order of 100ms is probably as good as it gets here.
 
 For me any delay less than the client timeout is acceptable and anything
 more than that is not. N sec is ok, N+1 is not. It's as simple as that.

I don't think the client timeout is directly relevant here. If your client
timeout is 20s and you take 19s, how many requests have queued up behind you?
If you normally process requests in under 200ms and receive 10 requests per
second (handling at least 2 simultaneously) then you now have 190 requests
queued up. Those requests take resources and will slow down your server. If
they slow things down too much then you will start failing to meet your 200ms
deadline.

It's more likely that your system is engineered to use queueing and
simultaneous dispatch to deal with spikes in load up to a certain margin. Say
you know it can deal with spikes in load of up to 2x the regular rate. Then
you can deal with a service outage of up to the 200ms deadline. If you can deal
with spikes of up to 4x the regular rate then you can deal with an outage of
up to 600ms. 
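
To make that arithmetic concrete, here is a rough back-of-the-envelope
model in Python, under the simplifying assumption that the queued
backlog must drain within the same 200ms deadline:

rate = 10.0                # requests per second, steady state
deadline = 0.200           # seconds, the service target

print(rate * 19.0)         # a 19s outage queues up 190 requests

for headroom in (2, 4):    # capacity as a multiple of the steady rate
    # a backlog of rate*T drains at (headroom - 1) * rate, i.e. in
    # T / (headroom - 1) seconds; requiring that to fit the deadline:
    max_outage = deadline * (headroom - 1)
    print(f"{headroom}x capacity -> tolerable outage {max_outage * 1000:.0f} ms")

which prints 200 ms for 2x capacity and 600 ms for 4x, matching the
figures above.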

Moreover even if you had the extra resources available to handle a 19s backlog
of requests, how long would it take you to clear that backlog? If you have a
narrow headroom on meeting the deadline in the first place, and now you have
even less headroom because of the resources dedicated to the queue, it'll take
you a long time to clear the backlog.

We periodically ran into problems with load spikes or other performance
problems causing things to get very slow and stay slow for a while. Letting
things settle out usually worked but occasionally we had to restart the whole
system to clear out the queue of requests.

-- 
greg




Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-07 Thread Hannu Krosing
On Wed, 2005-12-07 at 13:36, Greg Stark wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
 
   But that said, realistically *any* solution has to obtain a lock at some time
   to make the schema change. I would say pretty much any O(1) (constant time)
   outage is at least somewhat acceptable as contrasted with the normal index
   build which locks out other writers for at least O(n lg n) time. Anything on
   the order of 100ms is probably as good as it gets here.
  
  For me any delay less than the client timeout is acceptable and anything
  more than that is not. N sec is ok, N+1 is not. It's as simple as that.
 
 I don't think the client timeout is directly relevant here. 

It is relevant. It is the ultimate check of success or failure :)

 If your client
 timeout is 20s and you take 19s, how many requests have queued up behind you?
 If you normally process requests in under 200ms and receive 10 requests per
 second (handling at least 2 simultaneously) then you now have 190 requests
 queued up.

Again, I'm handling 20 to 200 simultaneously quite nicely.

 Those requests take resources and will slow down your server. If
 they slow things down too much then you will start failing to meet your 200ms
 deadline.

If I can't meet the deadline, I've got a problem. The rest is
implementation detail.

 It's more likely that your system is engineered to use queueing and
 simultaneous dispatch to deal with spikes in load up to a certain margin. Say
 you know it can deal with spikes in load of up to 2x the regular rate.

I know it can, just that the 3x spike lasts for 6 hours :P

 Then
 you can deal with service outage of up to the 200ms deadline. If you can deal
 with spikes of up to 4x the regular rate then you can deal with an outage of
 up to 600ms. 

Small local fluctuations happen all the time. As a rule of thumb, I
want to stay below 50% resource usage on average for any noticeable
period, and will start looking for code optimisations or additional
hardware if this is crossed.

 Moreover even if you had the extra resources available to handle a 19s backlog
 of requests, how long would it take you to clear that backlog? If you have a
 narrow headroom on meeting the deadline in the first place, and now you have
 even less headroom because of the resources dedicated to the queue, it'll take
 you a long time to clear the backlog.

While it feels heroic to run at 90% capacity, it is not usually a good
policy. All kinds of unforeseen stuff happens all the time -
checkpoints, backups, vacuums, unexpected growth, system cronjobs, ...
With too little headroom you are screwed anyway.

What I am aiming at with this CONCURRENT CREATE INDEX proposal is being
no more disruptive than other stuff that keeps happening anyway. That
would be the baseline. Anything better is definitely desirable, but
should not be a stopper for implementing the baseline functionality.

-
Hannu


Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Jochem van Dieten
On 12/5/05, Hannu Krosing wrote:

 Concurrent CREATE INDEX
 

 Concurrent index NDX1 on table TAB1 is created like this:

 1) start transaction. take a snapshot SNAP1

 1.1) optionally, remove pages for TAB1 from FSM to force (?) all newer
 inserts/updates to happen at end of table (won't work for in-page
 updates without code changes)

 2) create the index as we do now, but only for pages which are visible
 in SNAP1

 3) record the index in pg_class, but mark it as "do not use for lookups"
 in a new field. Take snapshot SNAP2. Commit transaction.

What happens if another transaction takes a snapshot between SNAP2 and
the commit? Wouldn't you need a lock to guard against that? (Not that
I know whether that is possible or desirable.)

Jochem



Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Tom Lane
Jochem van Dieten [EMAIL PROTECTED] writes:
 On 12/5/05, Hannu Krosing wrote:
 3) record the index in pg_class, but mark it as "do not use for lookups"
 in a new field. Take snapshot SNAP2. Commit transaction.

 What happens if another transaction takes a snapshot between SNAP2 and
 the commit? Wouldn't you need a lock to guard against that? (Not that
 I know whether that is possible or desirable.)

It's worse than that, because an updating command that is already
running has already made its list of which indexes to update.  You can't
say "commit" and expect transactions already in flight to react
magically to the presence of the new index.  If you take a lock that
excludes writes, and then release that lock with your commit (lock
release actually happens after commit, btw), then you can be sure that
subsequent write transactions will see your new index, because they take
their writer's lock before they inspect pg_index to see what indexes
they need to update.

Short of taking such a lock, you have a race condition.
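
A toy illustration of the race (Python; purely schematic, not backend
code):

catalog_indexes = ["ndx_old"]          # pg_index as the writer sees it
writer_targets = list(catalog_indexes) # in-flight writer lists its indexes

catalog_indexes.append("ndx_new")      # our "commit" lands afterwards

print("ndx_new" in writer_targets)     # False: this writer never updates it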

There's another little problem: it's not clear that "present in SNAP2
but not in SNAP1" has anything to do with the condition you need.  This
would exclude rows made by transactions still in progress as of SNAP2,
but you can't know whether such rows were made before or after your
commit of the index.  It doesn't do the right thing for deleted rows
either (deleted rows may still need to be entered into the index),
though perhaps you could fix that with a creative reinterpretation of
what "present in a snap" means.

regards, tom lane



Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Hannu Krosing
On Tue, 2005-12-06 at 20:50, Jochem van Dieten wrote:
 On 12/5/05, Hannu Krosing wrote:
 
  Concurrent CREATE INDEX
  
 
  Concurrent index NDX1 on table TAB1 is created like this:
 
  1) start transaction. take a snapshot SNAP1
 
  1.1) optionally, remove pages for TAB1 from FSM to force (?) all newer
  inserts/updates to happen at end of table (won't work for in-page
  updates without code changes)
 
  2) create the index as we do now, but only for pages which are visible
  in SNAP1
 
  3) record the index in pg_class, but mark it as do not use for lookups
  in a new field. Take snapshot SNAP2. commit transaction.
 
 What happens if another transaction takes a snapshot between SNAP2 and
 the commit? 

I'm hoping there is some clever way to circumvent (the effects of)
it. But I can't see it yet.

 Wouldn't you need a lock to guard against that? (Not that
 I know whether that is possible or desirable.)

That may be needed. At least I hope it is possible in a way that can
guarantee avoiding deadlocks.

What I have in mind would be something like this, to get both SNAP2 and
the commit in between any other transactions:

LOOP:
    LOCK AGAINST STARTING NEW TRANSACTIONS
    LOOP UP TO N SEC:
        IF NO OTHER TRANSACTIONS: BREAK
        ELSE: CONTINUE
    IF NO OTHER TRANSACTIONS: BREAK
    ELSE:
        UNLOCK AGAINST STARTING NEW TRANSACTIONS
        SLEEP N SEC
TAKE SNAP2
COMMIT (AND UNLOCK)


This will eventually succeed (given the right values for N) and will
guarantee that SNAP2 and COMMIT are atomic wrt other backends.
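
As a runnable toy model (Python; the lock, snapshot, and proc-array
checks are stand-ins, not PostgreSQL APIs):

import random, time

def other_transactions_running():
    # toy stand-in for inspecting the proc array
    return random.random() < 0.3

def try_quiet_point(max_wait=0.1, poll=0.005):
    # LOCK AGAINST STARTING NEW TRANSACTIONS would be taken here
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if not other_transactions_running():
            return "SNAP2"          # TAKE SNAP2, COMMIT (AND UNLOCK)
        time.sleep(poll)
    return None                     # UNLOCK, let the waiters through

snap2 = None
while snap2 is None:
    snap2 = try_quiet_point()
    if snap2 is None:
        time.sleep(0.05)            # SLEEP N SEC, then retry
print("got", snap2)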


--
Hannu





Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Hannu Krosing
On Tue, 2005-12-06 at 15:12, Tom Lane wrote:
 Jochem van Dieten [EMAIL PROTECTED] writes:
  On 12/5/05, Hannu Krosing wrote:
  3) record the index in pg_class, but mark it as do not use for lookups
  in a new field. Take snapshot SNAP2. commit transaction.
 
  What happens if another transaction takes a snapshot between SNAP2 and
  the commit? Wouldn't you need a lock to guard against that? (Not that
  I don't know if that is possible or desirable.)
 
 It's worse than that, because an updating command that is already
 running has already made its list of which indexes to update.  You can't
 say commit and expect transactions already in flight to react
 magically to the presence of the new index.  If you take a lock that
 excludes writes, and then release that lock with your commit (lock
 release actually happens after commit btw),

Is it possible to release a lock without commit ?

  then you can be sure that
 subsequent write transactions will see your new index, because they take
 their writer's lock before they inspect pg_index to see what indexes
 they need to update.
 
 Short of taking such a lock, you have a race condition.
 
 There's another little problem: it's not clear that present in SNAP2
 but not in SNAP1 has anything to do with the condition you need.  This
 would exclude rows made by transactions still in progress as of SNAP2,
 but you can't know whether such rows were made before or after your
 commit of the index.  It doesn't do the right thing for deleted rows
 either (deleted rows may still need to be entered into the index),
 though perhaps you could fix that with a creative reinterpretation of
 what present in a snap means.

It seems that taking SNAP1 also needs to be fitted between any other
transactions (i.e. no transaction can be running at the time), which can
hopefully be done as I outlined in my other answer to the grandparent.

-
Hannu





Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 What I have in mind would be something like this to get both SNAP2 and
 commit between any transactions:

 LOOP:
 LOCK AGAINST STARTING NEW TRANSACTIONS

I can hardly credit that "let's block startup of ALL new transactions"
is a more desirable restriction than "let's block writers to the table
we wish to reindex".

regards, tom lane



Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 Is it possible to release a lock without commit ?

Yes, but I don't see where that helps you here.

(To do any of this, you'd need to use the same kluge VACUUM does to hold
selected locks across a series of transactions.  So in reality you'd
probably be thinking of committing a startup transaction and letting
some of the locks be released by that.)

regards, tom lane



Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Hannu Krosing
On Tue, 2005-12-06 at 15:38, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  What I have in mind would be something like this to get both SNAP2 and
  commit between any transactions:
 
  LOOP:
  LOCK AGAINST STARTING NEW TRANSACTIONS
 
 I can hardly credit that "let's block startup of ALL new transactions"
 is a more desirable restriction than "let's block writers to the table
 we wish to reindex".

If the block is short (it will be removed one way or another in a few
(tenths of) seconds), then this is much more desirable than blocking
writers for a few hours.

The scenario where a concurrent CREATE INDEX command is needed is 24/7
OLTP databases, which can't be taken down for maintenance. Usually they
can be arranged to tolerate postponing a few transactions for one
second.

--
Hannu





Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Hannu Krosing
On Tue, 2005-12-06 at 15:41, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Is it possible to release a lock without commit ?
 
 Yes, but I don't see where that helps you here.
 
 (To do any of this, you'd need to use the same kluge VACUUM does to hold
 selected locks across a series of transactions.  So in reality you'd
 probably be thinking of committing a startup transaction and letting
 some of the locks be released by that.)

Hmm, that sounds like a plan:

1) run a transaction repeatedly, trying to hit a point of no concurrent
transactions, enhance the odds by locking out starting other
transactions for a few (tenths or hundredths of) seconds; if it
succeeds, record SNAP1, commit and continue, else rollback, then
sleep a little and retry.

2) build index on all rows inserted before SNAP1

3) run a transaction repeatedly, trying to hit a point of no concurrent
transactions by locking out other transactions for a few (tenths or
hundredths of) seconds; if it succeeds, record SNAP2, mark the index as
visible for inserts, commit. Now all new transactions see the index and
use it when inserting new tuples.

4) scan over table, add all tuples between SNAP1 and SNAP2 to index 

5) mark index as usable for query plans
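
A runnable toy of the whole sequence (Python; a list stands in for the
heap, a dict for the index, and the quiet points are taken instantly --
the retry logic is sketched upthread):

heap = []                      # list of (row_id, value), our toy heap
index = {}                     # value -> row_id, our toy index
index_visible_for_inserts = False

def insert(row_id, value):
    heap.append((row_id, value))
    if index_visible_for_inserts:       # from step 3 on, writers maintain it
        index[value] = row_id

insert(1, "a"); insert(2, "b")
snap1 = len(heap)              # step 1: SNAP1 at a quiet point

insert(3, "c")                 # a concurrent write during the build

for row_id, value in heap[:snap1]:      # step 2: bulk build up to SNAP1
    index[value] = row_id

snap2 = len(heap)              # step 3: SNAP2; index now visible for inserts
index_visible_for_inserts = True
insert(4, "d")                 # maintained by insert() itself

for row_id, value in heap[snap1:snap2]:  # step 4: catch-up scan
    index[value] = row_id

index_usable_for_plans = True  # step 5: planner may now use it
print(sorted(index.items()))   # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]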


-
Hannu





Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 1) run a transaction repeatedly, trying to hit a point of no concurrent
 transactions,

In the sort of 24x7 environment that people are arguing this is needed
for, it's entirely possible that that will *never* succeed.

regards, tom lane



Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Hannu Krosing
On Tue, 2005-12-06 at 16:01, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  1) run a transaction repeatedly, trying to hit a point of no concurrent
  transactions,
 
 In the sort of 24x7 environment that people are arguing this is needed
 for, it's entirely possible that that will *never* succeed.

My OLTP transactions are usually 5-50ms in length. Common sense tells
me that if I disallow new transactions for 100ms, I am more than likely
to have waited out all existing ones, and can do my 1 ms of take
snapshot + commit and let all the waiting transactions commence.

If the database is running longer transactions, there can be a GUC to
adjust the time CONCURRENT CREATE INDEX will wait. For example, for
transactions mostly in the 0.5-2 sec range, the wait could be set to 3 sec.

-
Hannu




Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Greg Stark
Hannu Krosing [EMAIL PROTECTED] writes:

 The scenario where a concurrent CREATE INDEX command is needed is 24/7
 OLTP databases, which can't be taken down for maintenance. Usually they
 can be arranged to tolerate postponing a few transactions for one
 second.

Well, the dominant defining characteristic of OLTP is precisely that you
do *not* have the timing requirements under your control and can't make
such arrangements. That is, you have to process requests as fast as they
come in, whatever that might be.

But that said, realistically *any* solution has to obtain a lock at some time
to make the schema change. I would say pretty much any O(1) (constant time)
outage is at least somewhat acceptable as contrasted with the normal index
build which locks out other writers for at least O(n lg n) time. Anything on
the order of 100ms is probably as good as it gets here.

-- 
greg




Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-06 Thread Hannu Krosing
On Tue, 2005-12-06 at 19:32, Greg Stark wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
 
  The scenario where a concurrent CREATE INDEX command is needed is 24/7
  OLTP databases, which can't be taken down for maintenance. Usually they
  can be arranged to tolerate postponing a few transactions for one
  second.
 
 Well, the dominant defining characteristic of OLTP is precisely that you
 do *not* have the timing requirements under your control and can't make
 such arrangements. That is, you have to process requests as fast as they
 come in, whatever that might be.

While "as fast as possible" is a good goal when designing and optimising
a DB engine proper, you never need to design a real system to a spec of
"as fast as possible", but rather to some given expected performance.

For me a 24/7 OLTP system is more like a Real Time system, where all
queries have to be processed in no more than a certain time, versus "as
fast as possible". There, "as fast as possible" is a secondary goal, a
lot less important than meeting the deadlines.

For example, one real db usually processes requests in 50-200ms, but the
maximum the client is prepared to wait is set to 20 sec. Anything longer
than that and the bells start ringing.

 But that said, realistically *any* solution has to obtain a lock at some time
 to make the schema change. I would say pretty much any O(1) (constant time)
 outage is at least somewhat acceptable as contrasted with the normal index
 build which locks out other writers for at least O(n lg n) time. Anything on
 the order of 100ms is probably as good as it gets here.

For me any delay less than the client timeout is acceptable and anything
more than that is not. N sec is ok, N+1 is not. It's as simple as that.

And if the CREATE INDEX takes 2 weeks in order to let other OLTP
processing go on uninterrupted then it is completely OK. I can afford to
set the deadline for it accordingly. 

Thinking of it, maybe concurrent CREATE INDEX should also honour the
vacuum_cost_* GUCs and throttle its progress accordingly, in order not
to starve others of IO/CPU.
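
A sketch of what that throttling could look like (Python; the constants
mirror vacuum_cost_limit / vacuum_cost_delay but the numbers are
illustrative, and the page loop stands in for the build's heap scan):

import time

COST_LIMIT = 200        # cost units to accumulate before napping
COST_DELAY = 0.010      # seconds to nap (10 ms)
PAGE_COST = 10          # assumed cost per heap page processed

cost = 0
for page in range(10_000):       # stand-in for the index build's heap scan
    # ... read the page, index its tuples ...
    cost += PAGE_COST
    if cost >= COST_LIMIT:
        time.sleep(COST_DELAY)   # yield IO/CPU to the OLTP load
        cost = 0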


Hannu





Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

2005-12-05 Thread Hannu Krosing
On Fri, 2005-12-02 at 02:14, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  It was a *major* new feature that many people were waiting for when Oracle
  finally implemented live CREATE INDEX and REINDEX. The ability to create
  an index without blocking any operations on a table, even updates, was
  absolutely critical for 24x7 operation.
 
 Well, we're still not in *that* ballpark and I haven't seen any serious
 proposals to make us so.  How absolutely critical is it really?
 Is REINDEX-in-parallel-with-reads-but-not-writes, which is what we
 actually have at the moment, an absolutely critical facility?

I don't think REINDEX is very critical (except if for some reason you
have failed to vacuum a high-traffic table for some time and need to get
index sizes down).

OTOH, being able to add indexes to a live database is sometimes required,
and neither of the current ways (accept a few hours of downtime, or use a
Slony replica and do a switchover) is always acceptable. This capability
is reportedly present in MSSQL and available for Oracle if you get the
more expensive Enterprise Edition.


So, after more thinking, I have come up with a proposal for fully
concurrent (read+write) create index, which should need minimal amount
of locking.

Concurrent CREATE INDEX 


Concurrent index NDX1 on table TAB1 is created like this:

1) start transaction. take a snapshot SNAP1

1.1) optionally, remove pages for TAB1 from FSM to force (?) all newer
inserts/updates to happen at end of table (won't work for in-page
updates without code changes) 

2) create the index as we do now, but only for pages which are visible
in SNAP1

3) record the index in pg_class, but mark it as "do not use for lookups"
in a new field. Take snapshot SNAP2. Commit transaction.

-- at this point all new inserts and updates will be recorded in NDX1

4) Run a full scan over TAB1 and add all rows that are visible in SNAP2
but not in SNAP1 to NDX1. (If there is some way (like p. 1.1) to restrict
or record the area in the heap that new tuples go to, then this can be
done more efficiently than a full scan.)

5) record the status of index as ready for use.

-- now the index is fully created and usable
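
To illustrate p. 4, a runnable toy (Python) of the "visible in SNAP2
but not in SNAP1" delta, with snapshots modelled as plain sets of
committed xids (a deliberate simplification):

snap1 = {1, 2, 3}          # xids visible when the build started
snap2 = {1, 2, 3, 5, 6}    # xids visible when the index became insert-visible

heap = [("t1", 2), ("t2", 3), ("t3", 5), ("t4", 6), ("t5", 9)]  # (tuple, xid)

# pass 1 indexed the SNAP1 tuples; pass 2 picks up exactly the delta
delta = [t for t, xid in heap if xid in snap2 and xid not in snap1]
print(delta)               # ['t3', 't4']; t5 (xid 9) began after SNAP2 and
                           # is maintained by its inserter via the new index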


This is in no way a final proposal, but rather a starting point for
discussion of how things might be doable. For example, p. 3 is probably
tricky to do in a way that all backends pick up at the right time. This
will need most places that do table updates to be reviewed, to make sure
that they check for new indexes.

Any comments are appreciated.

-
Hannu Krosing