Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Wed, 2005-12-07 at 19:36, Greg Stark wrote:
> [snip] We periodically ran into problems with load spikes or other
> performance problems causing things to get very slow and stay slow for
> a while. Letting things settle out usually worked but occasionally we
> had to restart the whole system to clear out the queue of requests.

Just as a personal opinion: I would love a REINDEX which does not block
reads/writes, even if writes will be more expensive while it's running.

There's always a period of time I can schedule the REINDEX for when
there's very low write activity, but it is impossible to find a time slot
when there's NO write activity... and I think most real-world
applications are like this. I think it's very rare for an application to
be under constant high load, but most of them constantly get SOME
important activity, which makes downtime hard to schedule.

Now if the slowdown of writes stays within the acceptable service level,
then scheduling the REINDEX in a not-so-busy time slot is a very workable
solution.

Cheers,
Csaba.
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On 12/6/05, Hannu Krosing wrote:
> 1) run a transaction repeatedly, trying to hit a point of no concurrent
>    transactions, enhance the odds by locking out starting other
>    transactions for a few (tenths or hundredths of) seconds; if it
>    succeeds, record SNAP1, commit and continue, else rollback, then
>    sleep a little and retry.

Which locks can be released by committing here?

> 2) build index on all rows inserted before SNAP1
>
> 3) run a transaction repeatedly, trying to hit a point of no concurrent
>    transactions by locking out other transactions for a few (tenths or
>    hundredths of) seconds; if it succeeds, record SNAP2, mark index as
>    visible for inserts, commit. Now all new transactions see the index
>    and use it when inserting new tuples.
>
> 4) scan over table, add all tuples between SNAP1 and SNAP2 to index

You can not guarantee that every tuple inserted in the table will be
visible to SNAP2 if you take SNAP2 before the commit of the insert-only
index has dropped below the global XMIN-horizon.

> 5) mark index as usable for query plans

How about:
- begin transaction X1
- insert all visible tuples in an index
- mark index incomplete
- commit
- wait for X1 to become visible to all running transactions
  (X1 is known from the XMIN in pg_class / pg_index)
- begin transaction X2
- insert all missing tuples in index
- mark index complete
- commit

Jochem
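[Editor's sketch: for concreteness, Jochem's two-transaction scheme can be
rendered in Python-flavored pseudocode. Every helper here
(begin_transaction, scan_visible, global_xmin, index.set_state) is a
hypothetical stand-in for backend machinery, not a real PostgreSQL API;
this is an illustration of the control flow, not an implementation.]

    import time

    def concurrent_create_index(table, index):
        # Transaction X1: index everything visible now, then publish the
        # index as "incomplete" so new writers start maintaining it.
        with begin_transaction() as x1:
            for tup in scan_visible(table, snapshot=x1.snapshot):
                index.insert(tup)
            index.set_state('incomplete')  # maintained by inserts, unused by plans
        # Wait until X1 has dropped below the global xmin horizon: from
        # then on, every writer still running either saw the incomplete
        # index or started after X1 committed.
        while global_xmin() <= x1.xid:
            time.sleep(0.1)
        # Transaction X2: pick up tuples committed while X1 was in flight.
        with begin_transaction() as x2:
            for tup in scan_visible(table, snapshot=x2.snapshot):
                if tup not in index:       # a cheaper membership test in reality
                    index.insert(tup)
            index.set_state('complete')    # now usable for query plans

The key design point is that no global quiet point is needed: the wait on
the xmin horizon replaces the "lock out new transactions" step.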
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

Hannu Krosing [EMAIL PROTECTED] writes:
>> But that said, realistically *any* solution has to obtain a lock at
>> some time to make the schema change. I would say pretty much any O(1)
>> (constant time) outage is at least somewhat acceptable as contrasted
>> with the normal index build which locks out other writers for at least
>> O(n lg n) time. Anything on the order of 100ms is probably as good as
>> it gets here.
>
> For me any delay less than the client timeout is acceptable and
> anything more than that is not. N sec is ok, N+1 is not. It's as simple
> as that.

I don't think the client timeout is directly relevant here. If your
client timeout is 20s and you take 19s, how many requests have queued up
behind you? If you normally process requests in under 200ms and receive
10 requests per second (handling at least 2 simultaneously) then you now
have 190 requests queued up.

Those requests take resources and will slow down your server. If they
slow things down too much then you will start failing to meet your 200ms
deadline.

It's more likely that your system is engineered to use queueing and
simultaneous dispatch to deal with spikes in load up to a certain margin.
Say you know it can deal with spikes in load of up to 2x the regular
rate. Then you can deal with a service outage of up to the 200ms
deadline. If you can deal with spikes of up to 4x the regular rate then
you can deal with an outage of up to 600ms.

Moreover, even if you had the extra resources available to handle a 19s
backlog of requests, how long would it take you to clear that backlog?
If you have narrow headroom on meeting the deadline in the first place,
and now you have even less headroom because of the resources dedicated to
the queue, it'll take you a long time to clear the backlog.

We periodically ran into problems with load spikes or other performance
problems causing things to get very slow and stay slow for a while.
Letting things settle out usually worked but occasionally we had to
restart the whole system to clear out the queue of requests.

-- 
greg
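[Editor's note: Greg's arithmetic can be made explicit with a few lines,
using the illustrative numbers from his own example.]

    # Backlog left behind by a service outage at a steady arrival rate.
    arrival_rate = 10            # requests per second (Greg's example)
    outage = 19.0                # seconds the index build holds its lock
    backlog = arrival_rate * outage
    print(f"queued requests: {backlog:.0f}")          # 190

    # Time to drain that backlog with k times the regular capacity:
    # arrivals continue at arrival_rate while service runs at
    # k * arrival_rate, so the queue shrinks at (k - 1) * arrival_rate.
    for k in (2, 4):
        drain_seconds = backlog / ((k - 1) * arrival_rate)
        print(f"{k}x capacity clears the queue in {drain_seconds:.1f}s")

So even with double capacity it takes another 19 seconds of degraded
service to drain a 19-second outage, which is Greg's point about the
backlog outliving the lock itself.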
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Wed, 2005-12-07 at 13:36, Greg Stark wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
>>> But that said, realistically *any* solution has to obtain a lock at
>>> some time to make the schema change. I would say pretty much any O(1)
>>> (constant time) outage is at least somewhat acceptable as contrasted
>>> with the normal index build which locks out other writers for at
>>> least O(n lg n) time. Anything on the order of 100ms is probably as
>>> good as it gets here.
>>
>> For me any delay less than the client timeout is acceptable and
>> anything more than that is not. N sec is ok, N+1 is not. It's as
>> simple as that.
>
> I don't think the client timeout is directly relevant here.

It is relevant. It is the ultimate check of success or failure :)

> If your client timeout is 20s and you take 19s, how many requests have
> queued up behind you? If you normally process requests in under 200ms
> and receive 10 requests per second (handling at least 2 simultaneously)
> then you now have 190 requests queued up.

Again, I'm handling 20 to 200 simultaneously quite nicely.

> Those requests take resources and will slow down your server. If they
> slow things down too much then you will start failing to meet your
> 200ms deadline.

If I can't meet the deadline, I've got a problem. The rest is
implementation detail.

> It's more likely that your system is engineered to use queueing and
> simultaneous dispatch to deal with spikes in load up to a certain
> margin. Say you know it can deal with spikes in load of up to 2x the
> regular rate.

I know it can, just that the 3x spike lasts for 6 hours :P

> Then you can deal with a service outage of up to the 200ms deadline. If
> you can deal with spikes of up to 4x the regular rate then you can deal
> with an outage of up to 600ms.

Small local fluctuations happen all the time. As a rule of thumb I want
to stay below 50% of resource usage on average for any noticeable period,
and will start looking for code optimisations or additional hardware if
this is crossed.

> Moreover even if you had the extra resources available to handle a 19s
> backlog of requests, how long would it take you to clear that backlog?
> If you have narrow headroom on meeting the deadline in the first place,
> and now you have even less headroom because of the resources dedicated
> to the queue, it'll take you a long time to clear the backlog.

While it feels heroic to run at 90% capacity, it is not usually a good
policy. All kinds of unforeseen stuff happens all the time - checkpoints,
backups, vacuums, unexpected growth, system cronjobs, ... With too little
headroom you are screwed anyway.

What I am aiming at with this CONCURRENT CREATE INDEX proposal is being
no more disruptive than other stuff that keeps happening anyway. That
would be the baseline. Anything better is definitely desirable, but
should not be a stopper for implementing the baseline functionality.

- Hannu
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)
On 12/5/05, Hannu Krosing wrote:
> Concurrent CREATE INDEX
>
> Concurrent index NDX1 on table TAB1 is created like this:
>
> 1) start transaction. take a snapshot SNAP1
>
> 1.1) optionally, remove pages for TAB1 from FSM to force (?) all newer
>      inserts/updates to happen at end of table (won't work for in-page
>      updates without code changes)
>
> 2) create the index as we do now, but only for pages which are visible
>    in SNAP1
>
> 3) record the index in pg_class, but mark it as "do not use for
>    lookups" in a new field. Take snapshot SNAP2. commit transaction.

What happens if another transaction takes a snapshot between SNAP2 and
the commit? Wouldn't you need a lock to guard against that? (Not that I
know whether that is possible or desirable.)

Jochem
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)
Jochem van Dieten [EMAIL PROTECTED] writes:
> On 12/5/05, Hannu Krosing wrote:
>> 3) record the index in pg_class, but mark it as "do not use for
>>    lookups" in a new field. Take snapshot SNAP2. commit transaction.
>
> What happens if another transaction takes a snapshot between SNAP2 and
> the commit? Wouldn't you need a lock to guard against that?

It's worse than that, because an updating command that is already running
has already made its list of which indexes to update. You can't just say
"commit" and expect transactions already in flight to react magically to
the presence of the new index. If you take a lock that excludes writes,
and then release that lock with your commit (lock release actually
happens after commit, btw), then you can be sure that subsequent write
transactions will see your new index, because they take their writer's
lock before they inspect pg_index to see what indexes they need to
update. Short of taking such a lock, you have a race condition.

There's another little problem: it's not clear that "present in SNAP2 but
not in SNAP1" has anything to do with the condition you need. This would
exclude rows made by transactions still in progress as of SNAP2, but you
can't know whether such rows were made before or after your commit of the
index. It doesn't do the right thing for deleted rows either (deleted
rows may still need to be entered into the index), though perhaps you
could fix that with a creative reinterpretation of what "present in a
snap" means.

			regards, tom lane
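[Editor's sketch: Tom's visibility point can be shown with a toy MVCC
model. This is a deliberate simplification - a snapshot here is just a
next-xid cutoff plus the set of xids in progress - not the real tuple
visibility rules.]

    # A tuple is visible to a snapshot if its inserting xid is below the
    # snapshot's cutoff and was not in progress when the snapshot was taken.
    class Snap:
        def __init__(self, xmax, xip):
            self.xmax = xmax   # next xid at snapshot time
            self.xip = xip     # xids still in progress at snapshot time

    def visible(tuple_xmin, snap):
        return tuple_xmin < snap.xmax and tuple_xmin not in snap.xip

    # Writer xid 100 starts before SNAP2, inserts a row, and commits only
    # after the index's commit. The row is invisible to both snapshots,
    # so "in SNAP2 but not in SNAP1" never finds it - yet the writer also
    # never saw the new index, so nobody indexes that row.
    snap1 = Snap(xmax=100, xip=set())
    snap2 = Snap(xmax=101, xip={100})     # xid 100 still in progress
    row_xmin = 100
    caught_up = visible(row_xmin, snap2) and not visible(row_xmin, snap1)
    print(caught_up)   # False: the catch-up scan skips this row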
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Tue, 2005-12-06 at 20:50, Jochem van Dieten wrote:
> On 12/5/05, Hannu Krosing wrote:
>> Concurrent CREATE INDEX
>>
>> Concurrent index NDX1 on table TAB1 is created like this:
>>
>> 1) start transaction. take a snapshot SNAP1
>>
>> 1.1) optionally, remove pages for TAB1 from FSM to force (?) all newer
>>      inserts/updates to happen at end of table (won't work for in-page
>>      updates without code changes)
>>
>> 2) create the index as we do now, but only for pages which are visible
>>    in SNAP1
>>
>> 3) record the index in pg_class, but mark it as "do not use for
>>    lookups" in a new field. Take snapshot SNAP2. commit transaction.
>
> What happens if another transaction takes a snapshot between SNAP2 and
> the commit?

I'm hoping there to be some clever way to circumvent (the effects of) it.
But I can't see it yet.

> Wouldn't you need a lock to guard against that? (Not that I know
> whether that is possible or desirable.)

That may be needed. At least I hope it to be possible in a way that can
guarantee avoiding deadlocks.

What I have in mind would be something like this to get both SNAP2 and
commit between any transactions:

LOOP:
    LOCK AGAINST STARTING NEW TRANSACTIONS
    LOOP UP TO N SEC:
        IF NO OTHER TRANSACTIONS: BREAK
        ELSE: CONTINUE
    IF NO OTHER TRANSACTIONS: BREAK
    ELSE:
        UNLOCK AGAINST STARTING NEW TRANSACTIONS
        SLEEP N SEC
TAKE SNAP2
COMMIT (AND UNLOCK)

This will eventually succeed (given right values for N) and will
guarantee that SNAP2 and COMMIT are atomic wrt other backends.

-- 
Hannu
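[Editor's sketch: a runnable rendering of that pseudocode in Python.
block_new_transactions, active_transaction_count, take_snapshot and
commit_and_unblock are hypothetical hooks into the backend, not real
PostgreSQL APIs.]

    import time

    def snapshot_at_quiet_point(n_sec, poll=0.01):
        """Retry until a snapshot and commit fit between any two transactions."""
        while True:
            block_new_transactions()          # short global gate, not a table lock
            deadline = time.monotonic() + n_sec
            while time.monotonic() < deadline:
                if active_transaction_count() == 0:
                    break
                time.sleep(poll)              # wait for in-flight work to drain
            if active_transaction_count() == 0:
                snap = take_snapshot()        # nobody else is running, so the
                commit_and_unblock()          # snapshot + commit look atomic
                return snap
            unblock_new_transactions()        # give up this round,
            time.sleep(n_sec)                 # let the queue drain, then retry

Note how failure releases the gate and sleeps before retrying, so the
cost to other backends of an unlucky attempt is bounded by n_sec.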
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Tue, 2005-12-06 at 15:12, Tom Lane wrote:
> Jochem van Dieten [EMAIL PROTECTED] writes:
>> On 12/5/05, Hannu Krosing wrote:
>>> 3) record the index in pg_class, but mark it as "do not use for
>>>    lookups" in a new field. Take snapshot SNAP2. commit transaction.
>>
>> What happens if another transaction takes a snapshot between SNAP2 and
>> the commit? Wouldn't you need a lock to guard against that?
>
> It's worse than that, because an updating command that is already
> running has already made its list of which indexes to update. You can't
> just say "commit" and expect transactions already in flight to react
> magically to the presence of the new index. If you take a lock that
> excludes writes, and then release that lock with your commit (lock
> release actually happens after commit, btw),

Is it possible to release a lock without commit?

> then you can be sure that subsequent write transactions will see your
> new index, because they take their writer's lock before they inspect
> pg_index to see what indexes they need to update. Short of taking such
> a lock, you have a race condition.
>
> There's another little problem: it's not clear that "present in SNAP2
> but not in SNAP1" has anything to do with the condition you need. This
> would exclude rows made by transactions still in progress as of SNAP2,
> but you can't know whether such rows were made before or after your
> commit of the index. It doesn't do the right thing for deleted rows
> either (deleted rows may still need to be entered into the index),
> though perhaps you could fix that with a creative reinterpretation of
> what "present in a snap" means.

It seems that taking SNAP1 also needs to be fitted between any other
transactions (i.e. no transaction can be running at the time), which can
hopefully be done as I outlined in my other answer to the grandparent.

- Hannu
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)
Hannu Krosing [EMAIL PROTECTED] writes:
> What I have in mind would be something like this to get both SNAP2 and
> commit between any transactions:
>
> LOOP:
>     LOCK AGAINST STARTING NEW TRANSACTIONS

I can hardly credit that "let's block startup of ALL new transactions"
is a more desirable restriction than "let's block writers to the table
we wish to reindex".

			regards, tom lane
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)
Hannu Krosing [EMAIL PROTECTED] writes:
> Is it possible to release a lock without commit?

Yes, but I don't see where that helps you here. (To do any of this,
you'd need to use the same kluge VACUUM does to hold selected locks
across a series of transactions. So in reality you'd probably be
thinking of committing a startup transaction and letting some of the
locks be released by that.)

			regards, tom lane
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Tue, 2005-12-06 at 15:38, Tom Lane wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
>> What I have in mind would be something like this to get both SNAP2 and
>> commit between any transactions:
>>
>> LOOP:
>>     LOCK AGAINST STARTING NEW TRANSACTIONS
>
> I can hardly credit that "let's block startup of ALL new transactions"
> is a more desirable restriction than "let's block writers to the table
> we wish to reindex".

If the block is short-lived (will be removed one way or another in a few
(tenths of) seconds), then this is much more desirable than blocking
writers for a few hours.

The scenario where a concurrent CREATE INDEX command is needed is 24/7
OLTP databases, which can't be taken down for maintenance. Usually they
can be arranged to tolerate postponing a few transactions for one second.

-- 
Hannu
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Tue, 2005-12-06 at 15:41, Tom Lane wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
>> Is it possible to release a lock without commit?
>
> Yes, but I don't see where that helps you here. (To do any of this,
> you'd need to use the same kluge VACUUM does to hold selected locks
> across a series of transactions. So in reality you'd probably be
> thinking of committing a startup transaction and letting some of the
> locks be released by that.)

Hmm, that sounds like a plan:

1) run a transaction repeatedly, trying to hit a point of no concurrent
   transactions, enhance the odds by locking out starting other
   transactions for a few (tenths or hundredths of) seconds; if it
   succeeds, record SNAP1, commit and continue, else rollback, then
   sleep a little and retry.

2) build index on all rows inserted before SNAP1

3) run a transaction repeatedly, trying to hit a point of no concurrent
   transactions by locking out other transactions for a few (tenths or
   hundredths of) seconds; if it succeeds, record SNAP2, mark index as
   visible for inserts, commit. Now all new transactions see the index
   and use it when inserting new tuples.

4) scan over table, add all tuples between SNAP1 and SNAP2 to index

5) mark index as usable for query plans

- Hannu
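[Editor's sketch: put together, the five steps amount to the control flow
below. It reuses the hypothetical snapshot_at_quiet_point helper from the
earlier sketch, extended with a before_commit callback so the "mark index
visible for inserts" happens inside the same blocked window as SNAP2;
scan_visible, visible_in and index.set_state are likewise invented names.]

    def concurrent_create_index(table, index, n_sec=0.1):
        # Steps 1-2: record SNAP1 at a quiet point, then build the bulk
        # of the index from tuples already visible in SNAP1; no write
        # lock is held during this long phase.
        snap1 = snapshot_at_quiet_point(n_sec)
        for tup in scan_visible(table, snapshot=snap1):
            index.insert(tup)

        # Step 3: at a second quiet point, record SNAP2 and, in the same
        # blocked window, mark the index as maintained by inserts. Every
        # transaction starting afterwards updates the index, but the
        # planner still ignores it.
        snap2 = snapshot_at_quiet_point(
            n_sec,
            before_commit=lambda: index.set_state('visible_for_inserts'))

        # Step 4: catch up on tuples committed between SNAP1 and SNAP2.
        for tup in scan_visible(table, snapshot=snap2):
            if not visible_in(tup, snap1):
                index.insert(tup)

        # Step 5: the index is complete; let query plans use it.
        index.set_state('usable_for_plans')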
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

Hannu Krosing [EMAIL PROTECTED] writes:
> 1) run a transaction repeatedly, trying to hit a point of no concurrent
>    transactions,

In the sort of 24x7 environment that people are arguing this is needed
for, it's entirely possible that that will *never* succeed.

			regards, tom lane
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Tue, 2005-12-06 at 16:01, Tom Lane wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
>> 1) run a transaction repeatedly, trying to hit a point of no
>>    concurrent transactions,
>
> In the sort of 24x7 environment that people are arguing this is needed
> for, it's entirely possible that that will *never* succeed.

My OLTP transactions are usually 5-50ms in length. Common sense tells me
that if I disallow new transactions for 100ms, I am more than likely to
have waited for all existing ones to finish, and can then do my 1ms of
"take snapshot + commit" and let all the waiting transactions commence.

If the database is running longer transactions, there can be a GUC to
adjust the time CONCURRENT CREATE INDEX will wait. For example, for
transactions mostly in the 0.5-2 sec range the wait could be set to
3 sec.

- Hannu
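[Editor's sketch: Hannu's "more than likely" can be checked with a quick
simulation. The distributions are illustrative assumptions only - uniform
transaction lengths and a fixed number in flight - since the thread does
not specify a workload model.]

    import random

    def quiet_point_reached(gate_ms, durations_ms):
        # Remaining runtimes of transactions in flight when the gate
        # closes; for a uniformly random start point the remainder is
        # uniform over the transaction's length.
        remaining = [random.uniform(0, d) for d in durations_ms]
        # New transactions are blocked, so success only requires the
        # longest in-flight transaction to finish within the gate window.
        return max(remaining, default=0) <= gate_ms

    def success_rate(gate_ms, pool, in_flight=20, trials=10_000):
        hits = sum(quiet_point_reached(gate_ms,
                                       random.choices(pool, k=in_flight))
                   for _ in range(trials))
        return hits / trials

    short_oltp = [random.uniform(5, 50) for _ in range(1000)]    # ms
    print(success_rate(100, short_oltp))   # 1.0: a 100 ms gate always wins
    longer = [random.uniform(500, 2000) for _ in range(1000)]    # ms
    print(success_rate(3000, longer))      # 1.0: matches the 3 sec GUC example
    print(success_rate(100, longer))       # ~0.0: gate too short, must retry

This also illustrates Tom's objection: one long-running transaction (a
pg_dump, say) in the pool drives the success rate to zero no matter how
often the gate retries.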
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)
Hannu Krosing [EMAIL PROTECTED] writes:
> The scenario where a concurrent CREATE INDEX command is needed is 24/7
> OLTP databases, which can't be taken down for maintenance. Usually they
> can be arranged to tolerate postponing a few transactions for one
> second.

Well, the dominant defining characteristic of OLTP is precisely that you
do *not* have the timing requirements under your control and can't make
such arrangements. That is, you have to process requests as fast as they
come in, whatever that might be.

But that said, realistically *any* solution has to obtain a lock at some
time to make the schema change. I would say pretty much any O(1)
(constant time) outage is at least somewhat acceptable as contrasted with
the normal index build which locks out other writers for at least
O(n lg n) time. Anything on the order of 100ms is probably as good as it
gets here.

-- 
greg
Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)

On Tue, 2005-12-06 at 19:32, Greg Stark wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
>> The scenario where a concurrent CREATE INDEX command is needed is 24/7
>> OLTP databases, which can't be taken down for maintenance. Usually
>> they can be arranged to tolerate postponing a few transactions for one
>> second.
>
> Well, the dominant defining characteristic of OLTP is precisely that
> you do *not* have the timing requirements under your control and can't
> make such arrangements. That is, you have to process requests as fast
> as they come in, whatever that might be.

While "as fast as possible" is a good goal when designing and optimising
a DB engine proper, you never need to design a real system to a spec of
"as fast as possible", but rather to some given expected performance.

For me a 24/7 OLTP system is more like a Real Time system, where all
queries have to be processed in no more than a certain time, versus "as
fast as possible". There, "as fast as possible" is a secondary goal, a
lot less important than meeting the deadlines. For example, one real db
processes requests usually in 50-200ms, but the maximum the client is
prepared to wait is set to 20 sec. Anything longer than that and the
bells start ringing.

> But that said, realistically *any* solution has to obtain a lock at
> some time to make the schema change. I would say pretty much any O(1)
> (constant time) outage is at least somewhat acceptable as contrasted
> with the normal index build which locks out other writers for at least
> O(n lg n) time. Anything on the order of 100ms is probably as good as
> it gets here.

For me any delay less than the client timeout is acceptable and anything
more than that is not. N sec is ok, N+1 is not. It's as simple as that.
And if the CREATE INDEX takes 2 weeks in order to let other OLTP
processing go on uninterrupted, then it is completely OK. I can afford to
set the deadline for it accordingly.

Thinking of it, maybe concurrent CREATE INDEX should also honour the
vacuum_cost_* GUCs and throttle its progress accordingly, in order not to
starve others of IO/CPU.

Hannu
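[Editor's sketch: the throttling Hannu suggests would presumably work
like the existing cost-based vacuum delay - accumulate a cost per page
touched and sleep once a limit is hit. The parameter names below are
modeled on, not identical to, the vacuum_cost_* GUCs, and the class is an
illustration rather than backend code.]

    import time

    class CostThrottle:
        """Cost-based delay, in the spirit of vacuum_cost_limit/_delay."""
        def __init__(self, cost_limit=200, delay_ms=10,
                     page_hit=1, page_miss=10, page_dirty=20):
            self.cost_limit = cost_limit   # sleep once this much work is done
            self.delay_ms = delay_ms
            self.costs = {'hit': page_hit, 'miss': page_miss,
                          'dirty': page_dirty}
            self.balance = 0

        def charge(self, kind):
            # Called once per page the index build touches.
            self.balance += self.costs[kind]
            if self.balance >= self.cost_limit:
                time.sleep(self.delay_ms / 1000.0)  # yield IO/CPU to others
                self.balance = 0

    # The build loop would charge the throttle as it reads heap pages:
    throttle = CostThrottle()
    for kind in ['hit', 'miss', 'dirty'] * 100:     # stand-in for a heap scan
        throttle.charge(kind)

The trade-off is exactly the one Hannu accepts above: the build may take
far longer, but foreground transactions keep their deadlines.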
Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing relation locking overhead)
On Fri, 2005-12-02 at 02:14, Tom Lane wrote:
> Greg Stark [EMAIL PROTECTED] writes:
>> It was a *major* new feature that many people were waiting for when
>> Oracle finally implemented live CREATE INDEX and REINDEX. The ability
>> to create an index without blocking any operations on a table, even
>> updates, was absolutely critical for 24x7 operation.
>
> Well, we're still not in *that* ballpark and I haven't seen any serious
> proposals to make us so. How absolutely critical is it really? Is
> REINDEX-in-parallel-with-reads-but-not-writes, which is what we
> actually have at the moment, an absolutely critical facility?

I don't think REINDEX is very critical (except if for some reason you
have failed to vacuum a high-traffic table for some time and need to get
index sizes down).

OTOH, being able to add indexes on a live database is sometimes required,
and neither of the current ways (accept a few hours of downtime, or use a
Slony replica and do a switchover) is always acceptable.

This capability is reportedly present in MSSQL and available for Oracle
if you get the more expensive Enterprise Edition.

So, after more thinking, I have come up with a proposal for fully
concurrent (read+write) CREATE INDEX, which should need a minimal amount
of locking.

Concurrent CREATE INDEX

Concurrent index NDX1 on table TAB1 is created like this:

1) start transaction. take a snapshot SNAP1

1.1) optionally, remove pages for TAB1 from FSM to force (?) all newer
     inserts/updates to happen at end of table (won't work for in-page
     updates without code changes)

2) create the index as we do now, but only for pages which are visible
   in SNAP1

3) record the index in pg_class, but mark it as "do not use for lookups"
   in a new field. Take snapshot SNAP2. commit transaction.

-- at this point all new inserts and updates will be recorded in NDX1

4) Run a full scan over TAB1 and add all rows that are visible in SNAP2
   but not in SNAP1 to NDX1. (If there is some way (like p. 1.1) to
   restrict or record the area in heap that new tuples go to, then this
   can be done more efficiently than a full scan.)

5) record the status of the index as "ready for use".

-- now the index is fully created and usable

This is in no way a final proposal, but rather a starting point for
discussion of how things might be doable. For example, p. 3 is probably
tricky to do in a way that all backends pick up at the right time. This
will need most places that do table updates to be reviewed to make sure
that they check for new indexes.

Any comments are appreciated.

- Hannu Krosing
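[Editor's sketch: step 4 is the novel part, and under the proposal's own
assumptions it reduces to a visibility test per heap tuple. A
hypothetical rendering, where full_scan and visible_in are invented
helpers and visible_in mirrors the toy snapshot model shown earlier in
the thread:]

    def catch_up_index(table, index, snap1, snap2):
        # Step 4: add to NDX1 every row visible in SNAP2 but not SNAP1.
        # Rows indexed in step 2 are exactly those visible in SNAP1, so
        # this pass covers the window between the two snapshots.
        added = 0
        for tup in full_scan(table):         # or just the recorded tail area
            if visible_in(tup, snap2) and not visible_in(tup, snap1):
                index.insert(tup)
                added += 1
        # NB: as Tom Lane points out downthread, this test is only sound
        # if SNAP2 is taken at a point with no transactions in progress;
        # otherwise rows from writers in flight at SNAP2 fall through
        # both snapshots and never reach the index.
        return added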