Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-27 Thread Tom Lane
I wrote:
 I'm toying with the idea of adding a lock manager call defined as
 "give me a list of XIDs that currently hold locks conflicting with
 lockmode X on object Y" --- but not including XIDs merely waiting
 for such a lock.  Then we could get the list of XIDs currently blocking
 ExclusiveLock, and do XactLockTableWait on each one.

I've committed a patch along these lines.  I also reduced the strength
of the lock we check for from ExclusiveLock to ShareLock, which seems
sufficient --- did you have a reason for selecting ExclusiveLock in the
original coding?

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-26 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Barring objections, I'm off to program this.

A few concerns:

a) The use of ShareUpdateExclusiveLock is supposed to lock out concurrent
   vacuums. I just tried it and vacuum seemed to be unaffected. I'm going to
   retry it with a clean cvs checkout to be sure it isn't something in my
   local tree that's broken.

   Do we still need to block concurrent vacuums if we're using snapshots?
   Obviously we have to block them during phase 1 because vacuum won't have a
   chance of removing the tuples from our private collection of index tuples
   that haven't been pushed live yet. But if phase 2 is ignoring tuples too
   new to be visible in its snapshot then it shouldn't care if dead tuples are
   deleted even if those slots are later reused.


b) You introduced a LockRelationIdForSession() call (I didn't even realize we
   had this capability when I wrote the patch). Does this introduce the
   possibility of a deadlock though? If one of the transactions we're waiting
   to finish has a shared lock on the relation and is waiting for an exclusive
   lock on the relation then it seems we'll wait forever for it to finish and
   never see either of our conditions for continuing. That would be fine,
   except that because we're waiting manually the deadlock detection code
   doesn't have a chance of firing.

   To solve that we would have to replace the pg_sleep call with an
   XactLockTableWait. But I'm not clear how to find a transaction id to wait
   on. What we would want to find is any transaction id that has an xmin older
   than our xmin. Even that isn't ideal since it wouldn't give us a chance to
   test our other way out, so if we choose a transaction to wait on that
   doesn't hold even a share lock on our table we could end up stuck longer
   than necessary (ie when we would have been able to momentarily acquire the
   exclusive lock on the table earlier).


c) It's a shame we don't support multiple simultaneous concurrent index builds.
   We could create a ShareUpdateShareLock that conflicts with the same list of
   locks that ShareUpdateExclusiveLock conflicts with but not itself. From a
   UI point of view there's no excuse for not doing this, but from an
   implementation point of view there's a limit of 10 lock types and this
   would get up to 9. This is my first time looking at this code so I'm not
   sure how hard that limit is.
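
   For concreteness, a conflict-table entry for such a lock mode might look
   like the sketch below: the same conflict set as ShareUpdateExclusiveLock,
   minus the self-conflict bit. Purely illustrative -- no ShareUpdateShareLock
   exists in lock.c's table:

   /*
    * Hypothetical ShareUpdateShareLock entry for the lock conflict table:
    * conflicts with everything ShareUpdateExclusiveLock conflicts with,
    * except itself.  Sketch only -- this lock mode does not exist.
    */
   LOCKBIT_ON(ShareUpdateExclusiveLock) |
       LOCKBIT_ON(ShareLock) | LOCKBIT_ON(ShareRowExclusiveLock) |
       LOCKBIT_ON(ExclusiveLock) | LOCKBIT_ON(AccessExclusiveLock),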

   One caveat is that the two jobs would see each other and that would make it
   hard for them to proceed to phase 2. I think what would happen is that the
   first one to finish phase 1 would be able to continue as soon as the other
   finishes phase 1. The second one would have to wait until the first one's
   phase 2 finished.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-26 Thread David Fetter
On Fri, Aug 25, 2006 at 08:02:21PM -0500, Jim C. Nasby wrote:
 On Fri, Aug 25, 2006 at 06:57:58PM +0100, Gregory Stark wrote:
  I'll use this opportunity to plug that feature again. I think most
  people should use autocommit off with on_error_rollback on for most
  of their daily use. Being able to double check the results of my
  ad-hoc updates before committing them saved me more headaches than
  I can count with Oracle.  Autocommit off only became practical for
  interactive use with postgres when savepoints showed up.
 
 +1

+1 here, too :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778    AIM: dfetter666
  Skype: davidfetter

Remember to vote!



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-26 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 A few concerns

 a) The use of ShareUpdateExclusiveLock is supposed to lock out concurrent
vacuums. I just tried it and vacuum seemed to be unaffected.

Can't reproduce such a problem here.

Do we still need to block concurrent vacuums if we're using snapshots?

That's an interesting question.  I don't think we need to lock out
vacuum in either pass as far as tuple removal goes: we are only
interested in tuples that are visible to a live transaction (ie ours)
and vacuum shouldn't remove them.  However one of the properties of
ShareUpdateExclusive is supposed to be that it locks out schema changes
... such as creation of a new index.  I wouldn't want to bet that vacuum
works correctly if a new index appears partway through its collection of
info about the relation.  I think we need to keep that lock so that
vacuum is safe against create index, rather than vice versa.

 b) You introduced a LockRelationIdForSession() call (I didn't even realize we
had this capability when I wrote the patch). Does this introduce the
possibility of a deadlock though?

Indeed, I had noticed this while testing your point (a).  I thought that
ConditionalLockRelation had enough smarts to grant itself a lock if
there would be a deadlock possibility otherwise, but it seems not to be
noticing.  We'll need to look into that.

 c) It's a shame we don't support multiple simultaneous concurrent index builds.

I can't get excited about that.  The point of this facility is to build
indexes with minimal impact on production queries, and the I/O load
imposed by even one build is going to make that a bit questionable, let
alone multiple ones.  If you want to do some useful improvement on the
patch, look into making it throttle its I/O rate like vacuum can.
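
(For reference, vacuum's throttling amounts to something like the sketch
below: charge a cost for each page processed and nap once the budget is
exhausted.  The parameter names echo the vacuum_cost_* GUCs, but the code
itself is illustrative, not the actual vacuum implementation.)

/*
 * Sketch of vacuum-style cost throttling as it might be applied to the
 * concurrent build's heap and index scans.  Illustrative only; the
 * parameters mirror the vacuum_cost_* settings.
 */
static int	cost_balance = 0;

static void
build_cost_delay_point(int cost_limit, int cost_delay_ms, int page_cost)
{
	cost_balance += page_cost;
	if (cost_delay_ms > 0 && cost_balance >= cost_limit)
	{
		pg_usleep(cost_delay_ms * 1000L);	/* nap, then reset the budget */
		cost_balance = 0;
	}
}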

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-26 Thread Tom Lane
I wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
 b) You introduced a LockRelationIdForSession() call (I didn't even realize we
 had this capability when I wrote the patch). Does this introduce the
 possibility of a deadlock though?

 Indeed, I had noticed this while testing your point (a).  I thought that
 ConditionalLockRelation had enough smarts to grant itself a lock if
 there would be a deadlock possibility otherwise, but it seems not to be
 noticing.  We'll need to look into that.

OK, the code I was remembering is the bit in ProcSleep to grant our proc
the lock if we can prove that deadlock detection would do so anyway.
It is probably possible to refactor things so that that's done before
we fail in the ConditionalLockAcquire case, but it would involve
uglifying ProcSleep's API even more.  And it doesn't sound like a
complete solution anyway --- the ProcSleep test is not capable of
detecting deadlocks involving more than one lock, and I'm not sure it
even intends to find all cases involving just one lock.  It was only
ever meant as a simple optimization to avoid a sleep in easy cases.

 To solve that we would have to replace the pg_sleep call with an
 XactLockTableWait. But I'm not clear how to find a transaction id to wait
 on.

I'm toying with the idea of adding a lock manager call defined as
"give me a list of XIDs that currently hold locks conflicting with
lockmode X on object Y" --- but not including XIDs merely waiting
for such a lock.  Then we could get the list of XIDs currently blocking
ExclusiveLock, and do XactLockTableWait on each one.  Once they are
all gone we are safe; we don't care about later acquirers of locks
because they must see the index.  This avoids both the arbitrary
pg_sleep and the need for a check on "am I the oldest remaining XID";
and if we are in a deadlock situation it will be detected.
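
In code, that amounts to something like the sketch below, where
GetConflictingXids() stands in for the proposed lock manager call (the
name and signature are hypothetical, not an existing API):

/*
 * Sketch: wait out every transaction currently holding a lock that
 * conflicts with the given mode on the relation.  Later acquirers can
 * be ignored -- they must already see the index.  Because we sleep via
 * the xact lock table, deadlock detection applies normally.
 */
static void
WaitForConflictingXacts(Oid relid, LOCKMODE lockmode)
{
	int			i,
				nxids;
	TransactionId *xids = GetConflictingXids(relid, lockmode, &nxids);

	for (i = 0; i < nxids; i++)
		XactLockTableWait(xids[i]);
}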

This looks like it should be easy enough to code (though I haven't
actually tried yet) ... do you see any logical flaws?

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread stark
Tom Lane [EMAIL PROTECTED] writes:

 Greg Stark [EMAIL PROTECTED] writes:
 Hmmm. Or is that true? The problem may be somewhat easier since at least you
 can be sure every tuple in the heap is in the index. So if you see a
 DELETE_IN_PROGRESS either it *was* a constraint violation prior to the delete
 and failing is reasonable or it's an update in which case maybe it's possible
 to detect that they're part of the same chain?

 Unless we are willing to lock every single tuple while we insert it,
 this seems unfixable to me.  Without a lock, the tuple could become
 DELETE_IN_PROGRESS immediately after we look at it.

I think there's some confusion here. The paragraph above was taken from some
thoughts about Hannu's suggestion of having a separate ALTER INDEX SET UNIQUE
command. That command might have an advantage over CREATE INDEX CONCURRENTLY
because it knows the index is already complete; it doesn't have to worry about
potential conflicts with tuples that it will only find later in the scan.

Effectively this is equivalent to making CREATE UNIQUE INDEX CONCURRENTLY a
three-phase operation. The first two phases would be a regular CREATE INDEX
CONCURRENTLY and the third phase would be what ALTER INDEX SET UNIQUE does,
which is to scan the index and verify that it's unique.

ALTER INDEX SET UNIQUE would have to perform a similar two-transaction dance
though. It would have to set the index unique, then wait until everyone has
seen the new constraint, then verify that the data is indeed unique, possibly
rolling back the constraint creation if it's not.

That would make the whole process of creating a unique index quite long. On
the plus side it would be a useful command in itself. Doing an index scan
might be pretty slow but if the table is mostly clean of dead and recently
dead tuples it won't have to visit the heap much and should still be much
quicker than building a new index. And it would itself be a concurrent
command.

 Actually it's worse than that.  We could examine a tuple, see that
 it's good, include it in the uniqueness check.  Then someone updates
 the tuple and puts the new version near the end of the table.  By
 the time we reach that version, it could be committed good.  There
 is absolutely no way that we could notice an issue without applying
 extremely expensive tests to *every* apparently-good tuple.

I think ALTER INDEX SET UNIQUE would not have this problem. It would only have
to look at tuples using its own snapshot and see if there's a violation. If
there isn't a violation as of its own snapshot then it can be sure later
transactions will preserve this property since the index was always complete
and it waited after creating the constraint.

 [ thinks for a bit... ]  At least, it seems hopeless if we use
 SnapshotNow.  Does it help if we use a real snapshot?  I'm thinking
 pass 1 inserts exactly those tuples that are good according to a
 snap taken at its beginning, and then pass 2 considers only tuples
 that are good according to a snap taken at *its* beginning.  But
 having consumed no caffeine yet this morning, I'm not sure I can
 spot any flaws that might exist in this idea.

What about tuples that are inserted and committed in the window between the
two phases? Ie, they're RECENTLY_DEAD but not in phase 2's snapshot.

Or do you mean we use SatisfiesVacuum to determine what to insert but
SatisfiesSnapshot to determine whether to check uniqueness?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Because of the way the AM API works changing how the initial heap scan works
 is a bit of a pain. It would require either having some global state or
 passing the concurrent flag through the AM methods or alternatively having a
 whole new AM method.

Yeah, I dealt with that by adding a 'concurrent build' flag to IndexInfo.
A bit grotty but it beats changing a lot of function signatures.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Gregory Stark

Do we want something like this? I just made this error myself so unless I'm
special (pauses for jokes) I imagine others would be prone to it as well.

I would normally be pretty leery of code like this but it seems unlikely
anyone would actually want an index named concurrently and the consequences
if you get it wrong in a production environment are pretty dire. We might even
consider making it an outright error.


--- gram.y  25 Aug 2006 10:14:17 +0100  2.558
+++ gram.y  25 Aug 2006 14:04:54 +0100
@@ -56,6 +56,7 @@
 #include "commands/defrem.h"
 #include "nodes/makefuncs.h"
 #include "parser/gramparse.h"
+#include "parser/scansup.h"
 #include "storage/lmgr.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
@@ -3653,6 +3654,12 @@
 				opt_definition OptTableSpace where_clause
 				{
 					IndexStmt *n = makeNode(IndexStmt);
+
+					if (!strcmp(downcase_truncate_identifier($4, 20, false), "concurrently"))
+						ereport(WARNING,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("performing non-concurrent index build of index named \"concurrently\"")));
+
 					n->unique = $2;
 					n->concurrent = false;
 					n->idxname = $4;
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Andrew Dunstan

Gregory Stark wrote:

Do we want something like this? I just made this error myself so unless I'm
special (pauses for jokes) I imagine others would be prone to it as well.

I would normally be pretty leery of code like this but it seems unlikely
anyone would actually want an index named concurrently and the consequences
if you get it wrong in a production environment are pretty dire. We might even
consider making it an outright error.


--- gram.y  25 Aug 2006 10:14:17 +0100  2.558
+++ gram.y  25 Aug 2006 14:04:54 +0100
@@ -56,6 +56,7 @@
 #include "commands/defrem.h"
 #include "nodes/makefuncs.h"
 #include "parser/gramparse.h"
+#include "parser/scansup.h"
 #include "storage/lmgr.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
@@ -3653,6 +3654,12 @@
 				opt_definition OptTableSpace where_clause
 				{
 					IndexStmt *n = makeNode(IndexStmt);
+
+					if (!strcmp(downcase_truncate_identifier($4, 20, false), "concurrently"))
+						ereport(WARNING,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("performing non-concurrent index build of index named \"concurrently\"")));
+
 					n->unique = $2;
 					n->concurrent = false;
 					n->idxname = $4;



I see we have:

CREATE index_opt_unique INDEX CONCURRENTLY index_name ...


which explains how this error occurs. But might it not be better to have 
this instead?


 CREATE CONCURRENTLY index_opt_unique INDEX index_name ...

Then ISTM no ambiguity could arise (and it's also closer to grammatical 
English, if that matters).


Just a thought

cheers

andrew



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 I see we have:
  CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
 which explains how this error occurs.

Maybe to you, but I'm still caffeine-deprived and don't exactly see what
it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
to get into a scenario where the proposed warning would fire.

 But might it not be better to have this instead?
   CREATE CONCURRENTLY index_opt_unique INDEX index_name ...

When I was fooling with gram.y I was thinking that actually

CREATE [UNIQUE] INDEX indexname [CONCURRENTLY] ...

would be the most grammatical thing.  But I can live with putting
it right after CREATE, too.  Or there was the proposal to put it
first:

[CONCURRENTLY] CREATE [UNIQUE] INDEX indexname ...

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Andrew Dunstan

Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  I see we have:
   CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
  which explains how this error occurs.

 Maybe to you, but I'm still caffeine-deprived and don't exactly see what
 it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
 to get into a scenario where the proposed warning would fire.

AIUI, he left off the index name, so the first rule was matched rather
than the second, with "concurrently" being the index name.


cheers

andrew



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
 I see we have:
  CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
 which explains how this error occurs.
 
 Maybe to you, but I'm still caffeine-deprived and don't exactly see what
 it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
 to get into a scenario where the proposed warning would fire.

I guess Greg is talking about something like this (ie just forgetting to
name the index):


devel=# create index concurrently on foo ( a);
CREATE INDEX
devel=# \d foo
      Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | text    |
 c      | integer |
 d      | text    |
Indexes:
    "concurrently" btree (a)


Stefan



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Bruce Momjian
Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  I see we have:
   CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
  which explains how this error occurs.
 
 Maybe to you, but I'm still caffeine-deprived and don't exactly see what
 it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
 to get into a scenario where the proposed warning would fire.
 
  But might it not be better to have this instead?
CREATE CONCURRENTLY index_opt_unique INDEX index_name ...
 
 When I was fooling with gram.y I was thinking that actually
 
   CREATE [UNIQUE] INDEX indexname [CONCURRENTLY] ...
 
 would be the most grammatical thing.  But I can live with putting

The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
sounded like a different type of index, not a different way to build the
index.  I don't think CONCURRENTLY has that problem, so CREATE
CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
CREATE CONCURRENTLY, INDEX ii.

 it right after CREATE, too.  Or there was the proposal to put it
 first:
 
   [CONCURRENTLY] CREATE [UNIQUE] INDEX indexname ...

I think this suggested the command was CONCURRENTLY, which isn't good.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
 sounded like a different type of index, not a different way to build the
 index.  I don't think CONCURRENTLY has that problem, so CREATE
 CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
 CREATE CONCURRENTLY, INDEX ii.

OK, we've got two votes for that, so I'll make it so.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
 sounded like a different type of index, not a different way to build the
 index.  I don't think CONCURRENTLY has that problem, so CREATE
 CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
 CREATE CONCURRENTLY, INDEX ii.

That doesn't sound like English at all to me.

Fwiw, I think the best option was what Tom did. The gotcha I tripped on seems
pretty minor to me.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
  sounded like a different type of index, not a different way to build the
  index.  I don't think CONCURRENTLY has that problem, so CREATE
  CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
  CREATE CONCURRENTLY, INDEX ii.
 
 That doesn't sound like English at all to me.
 
 Fwiw, I think the best option was what Tom did. The gotcha I tripped on seems
 pretty minor to me.

What bothers me about what we have now is that we have optional keywords
before and after INDEX, rather than only between CREATE and INDEX.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 What bothers me about what we have now is that we have optional keywords
 before and after INDEX, rather than only between CREATE and INDEX.

Yeah, putting them both into that space seems consistent to me, and
it will fix the problem of making an omitted index name look like
a valid command.

I'm not sure I should be opening this can of worms, but do we want to
use a different keyword than CONCURRENTLY to make it read better there?

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Alvaro Herrera
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  What bothers me about what we have now is that we have optional keywords
  before and after INDEX, rather than only between CREATE and INDEX.
 
 Yeah, putting them both into that space seems consistent to me, and
 it will fix the problem of making an omitted index name look like
 a valid command.
 
 I'm not sure I should be opening this can of worms, but do we want to
 use a different keyword than CONCURRENTLY to make it read better there?

The problem is that what the qualifier is doing is modifying the
operation itself, not the properties of the index to be created, like
UNIQUE, which modifies the index.  So the qualifier should be something
that modifies the CREATE, that is, an adverb (AFAIK).

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 The problem is that what the qualifier is doing is modifying the
 operation itself, not the properties of the index to be created, like
 UNIQUE, which modifies the index.

Right, which was the same point Bruce made earlier.  And really you
can't respect that difference while putting them into the same place in
the word order.  So I'm starting to feel like maybe we should leave
well enough alone.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Andrew Dunstan

Gregory Stark wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

 The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
 sounded like a different type of index, not a different way to build the
 index.  I don't think CONCURRENTLY has that problem, so CREATE
 CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
 CREATE CONCURRENTLY, INDEX ii.

That doesn't sound like English at all to me.

Fwiw, I think the best option was what Tom did. The gotcha I tripped on seems
pretty minor to me.

It's a form of construction my father (who was a noted orator) loved to 
use, maybe a little too much. It is arguably slightly archaic, but 
nevertheless quite grammatical ;-) I agree that these days it is more 
idiomatic to defer the adverb until after the object of the verb in most 
cases.


cheers

andrew



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Zeugswetter Andreas DCP SD

  What bothers me about what we have now is that we have optional
  keywords before and after INDEX, rather than only between
  CREATE and INDEX.
 
 Yeah, putting them both into that space seems consistent to 
 me, and it will fix the problem of making an omitted index 
 name look like a valid command.
 
 I'm not sure I should be opening this can of worms, but do we 
 want to use a different keyword than CONCURRENTLY to make it 
 read better there?

precedent syntax (Oracle, Informix) uses the keyword ONLINE at the end:
 CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;

I'd stick with that.

Andreas



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Alvaro Herrera
Zeugswetter Andreas DCP SD wrote:
 
   What bothers me about what we have now is that we have optional 
   keywords before and after INDEX, rather than only between 
  CREATE and INDEX.
  
  Yeah, putting them both into that space seems consistent to 
  me, and it will fix the problem of making an omitted index 
  name look like a valid command.
  
  I'm not sure I should be opening this can of worms, but do we 
  want to use a different keyword than CONCURRENTLY to make it 
  read better there?
 
 precedent syntax (Oracle, Informix) uses the keyword ONLINE at the end:
  CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;

That was what the patch originally used, but it was changed because it
made it difficult for psql to auto-complete that.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Zeugswetter Andreas DCP SD [EMAIL PROTECTED] writes:
 precedent syntax (Oracle, Informix) uses the keyword ONLINE at the end:
  CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;

We rejected that one already ...

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Zeugswetter Andreas DCP SD

  precedent syntax (Oracle, Informix) uses the keyword ONLINE 
 at the end:
   CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;
 
 That was what the patch originally used, but it was changed 
 because it made difficult for psql to auto-complete that.

That is imho not enough of a reason to diverge.

Andreas



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 That was what the patch originally used, but it was changed because it
 made difficult for psql to auto-complete that.

Psql has to be able to parse it not for auto-completion but because it needs
to know that it's not a transactional command. The regular CREATE INDEX can be
run from within a transaction but online index builds use two transactions on
their own so psql has to know not to insert a BEGIN and savepoint around it.
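
A much-simplified sketch of the kind of test psql needs is below; the
helper name is invented, and psql's real logic would tokenize the query
rather than match prefixes:

/*
 * Simplified sketch: detect a command that must not be wrapped in an
 * implicit BEGIN under autocommit-off, because CREATE INDEX
 * CONCURRENTLY manages its own transactions.  Illustrative only.
 */
static bool
is_concurrent_index_build(const char *query)
{
	return pg_strncasecmp(query, "CREATE INDEX CONCURRENTLY", 25) == 0 ||
		pg_strncasecmp(query, "CREATE UNIQUE INDEX CONCURRENTLY", 32) == 0;
}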

I'll use this opportunity to plug that feature again. I think most people
should use autocommit off with on_error_rollback on for most of their daily
use. Being able to double check the results of my ad-hoc updates before
committing them saved me more headaches than I can count with Oracle.
Autocommit off only became practical for interactive use with postgres when
savepoints showed up.
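
For anyone wanting to try that setup, the relevant psql settings, e.g. in
~/.psqlrc, are:

\set AUTOCOMMIT off
\set ON_ERROR_ROLLBACK on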


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Zeugswetter Andreas DCP SD [EMAIL PROTECTED] writes:
 That was what the patch originally used, but it was changed
 because it made it difficult for psql to auto-complete that.

 That is imho not enough of a reason to divert.

My recollection is that the principal argument against ONLINE was
that it didn't convey the function of the option to anyone who
didn't already know Oracle's usage of the term.

Also, psql's problem is not with auto-completion, it's with
detecting whether the command is allowed inside a transaction
block.  That's not a function we can just blow off.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Jim C. Nasby
On Fri, Aug 25, 2006 at 06:57:58PM +0100, Gregory Stark wrote:
 I'll use this opportunity to plug that feature again. I think most people
 should use autocommit off with on_error_rollback on for most of their daily
 use. Being able to double check the results of my ad-hoc updates before
 committing them saved me more headaches than I can count with Oracle.
 Autocommit off only became practical for interactive use with postgres when
 savepoints showed up.

+1
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Jim C. Nasby
On Fri, Aug 25, 2006 at 11:25:43AM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  The problem is that what the qualifier is doing is modifying the
  operation itself, not the properties of the index to be created, like
  UNIQUE, which modifies the index.
 
 Right, which was the same point Bruce made earlier.  And really you
 can't respect that difference while putting them into the same place in
 the word order.  So I'm starting to feel like maybe we should leave
 well enough alone.

Since we might eventually have other 'concurrent commands', perhaps

CONCURRENT CREATE INDEX ...

would be best.

BTW, if we started to consider lazy vacuum a concurrent command we could
ditch the use of FULL, which is always confusing if you're talking about
database-wide vacuums. I know it'd take many versions to fully make that
change, but it seems worth it to me to reduce confusion. There's also an
issue of newbies thinking they should use vacuum full regularly because
it's somehow better than lazyvac.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-24 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 I wrote:
 The problem case is that we take a tuple and try to insert it into the index.
 Meanwhile someone else updates the tuple, and they're faster than us so
 they get the new version into the index first.  Now our aminsert sees a
 conflicting index entry, and as soon as it commits good, aminsert will
 raise a uniqueness error.  There's no backoff for "oh, the tuple I'm
 inserting stopped being live while I was inserting it".

 It's possible that the problem could be solved by introducing such a
 backoff, ie, make aminsert recheck liveness of the tuple-to-be-inserted
 before declaring error.  Since we're about to fail anyway, performance
 of this code path probably isn't a huge issue.  But I haven't thought
 through whether it can be made to work with that addition.

Yesterday I considered if I could just catch the error in validate_index and
retest HeapTupleSatisfiesVacuum after the insert but found that didn't work
any better. I don't remember the problem though and it's possible it would
work if it were inside aminsert.

 Unless someone's got a brilliant idea, my recommendation at this point
 is that we restrict the patch to building only non-unique indexes.
 Per discussion upthread, that's still a useful feature.  We can revisit
 the problem of doing uniqueness checks correctly in some future release,
 but time to work on it for 8.2 is running out fast.

I agree. There's other functionality in this area that would be nice too such
as REINDEX CONCURRENTLY and deleting the invalid index in case of error. Once
one chunk gets into CVS it makes it easier to extend it without making for a
bigger divergence to merge one day.

I was also considering going ahead and implementing Hannu's ALTER INDEX SET
UNIQUE too. We would have the option of making CREATE UNIQUE INDEX
CONCURRENTLY automatically invoke that code afterwards. It would require a
second waiting phase though and a full index scan so it would be a much slower
option than handling it in the index build. On the plus side it would never
have to lock anything -- locking things inside a command explicitly billed as
concurrent strikes me as undesirable.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-24 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I was also considering going ahead and implementing Hannu's ALTER INDEX SET
 UNIQUE too.

Don't waste your time, when we don't have an algorithm that would make
it work.  It's too late for 8.2 anyway...

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-24 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 I was also considering going ahead and implementing Hannu's ALTER INDEX SET
 UNIQUE too.

 Don't waste your time, when we don't have an algorithm that would make
 it work.  It's too late for 8.2 anyway...

Oh, I think ALTER INDEX SET UNIQUE is easy, at least algorithm-wise. We set
the index to be unique, then wait until everyone can see it. Then we scan to
make sure there's only one live tuple for each key value. As long as it's
unique in our snapshot and we can be sure that any concurrent changes will
preserve that property then we can be sure it's good.
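
The verification scan might look roughly like the sketch below: walk the
index in key order and complain if two entries with equal keys both point
to tuples that are live under our snapshot.  All of the names here are
illustrative stand-ins, not a real API:

/*
 * Sketch of the uniqueness verification pass.  Entries whose heap
 * tuples are dead to our snapshot cannot cause a violation; two
 * consecutive live entries with equal keys do.
 */
bool		have_prev = false;
IndexKey	prev_key;			/* hypothetical key-holder type */

while (index_scan_next_in_key_order(scan, &cur_key, &cur_tid))
{
	if (!heap_tuple_visible(heapRel, &cur_tid, snap))
		continue;				/* dead to us; cannot conflict */

	if (have_prev && index_keys_equal(indexRel, &prev_key, &cur_key))
		ereport(ERROR,
				(errcode(ERRCODE_UNIQUE_VIOLATION),
				 errmsg("cannot mark index unique: duplicate keys exist")));

	prev_key = cur_key;
	have_prev = true;
}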

Hm, I suppose we have to keep the index marked invalid during this operation
so it doesn't appear as if there's a promise that the data is unique. That's
fine for freshly built concurrent indexes but not so nice for an existing
non-unique index. We might need a new column induniqueisvalid that indicates
that a unique constraint can't be trusted yet.

I suppose it's 8.3 material. And maybe not even the most urgent item either.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-24 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Unless someone's got a brilliant idea, my recommendation at this point
 is that we restrict the patch to building only non-unique indexes.

 I assume you'll add the check?

Yeah, I'll take care of it.

At the moment it may be moot, because I've realized that validate_index
doesn't work anyway.  It is scanning the index and then assuming that
any tuple inserted into the index subsequent to that scan will still be
INSERT_IN_PROGRESS when the heapscan reaches it.  That's completely
bogus of course --- the tuple could be committed live or even already
deleted by the time the heapscan reaches it.  This would result in
making duplicate index entries, which is unacceptable even in a
nonunique index (it'd cause indexscans to return the same row twice).

I'm trying to work out whether we can fix this by taking an MVCC
snapshot before we scan the index, and then inserting all tuples we find
in the heap that are live according to the snap but are not present in
the indexscan data.  There are still race conditions in this but I think
sufficient delay would fix them.  Considerations:

* If a tuple is good according to the snap, it must have been inserted
by an xact that committed before the snap, therefore there is no
still-in-progress index insertion happening for it.  So if it's not in
the sorted indexscan data we know we must insert it.  If it is in the
indexscan data, of course we don't have to.

* If a tuple is not good according to the snap, there are three
possibilities:

** It was never committed good at all.  We need not index it.

** It was inserted by a transaction that committed post-snap.  We assume
that that transaction has or will index it.

** It was deleted by a transaction that committed pre-snap.  We assume
we need not index it because no remaining transaction will care about it.

The trick is that we must wait long enough to ensure that those two
assumptions are good.  We already know about waiting long enough to
ensure that all live transactions are aware of the index, which takes
care of the first assumption as long as we take the snap after that
wait.  However, the second assumption means that we must be sure there
are no other backends with serializable snapshots older than the snap
we are using for this.  I think this means that we wait for all xacts
to be aware of our index, take the reference snap, then wait for all
xacts except ours to die --- without exception --- and then we can
get on with the second phase of the work.

It might be OK to do the indexscan and sort before we do the second
wait, though, so we could get at least something done.
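
In outline, the pass-2 merge would then look something like this sketch
(tidsort names the sorted indexscan data; tuple_satisfies_snapshot(),
tid_list_member() and insert_into_index() are stand-ins, not real
functions):

/*
 * Sketch of the second-pass merge: insert every heap tuple that is
 * live under the reference snapshot but missing from the sorted TID
 * data collected by the indexscan.
 */
HeapScanDesc scan;
HeapTuple	tuple;

scan = heap_beginscan(heapRel, SnapshotAny, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
	if (!tuple_satisfies_snapshot(tuple, snap))
		continue;			/* one of the ignorable cases above */

	if (!tid_list_member(tidsort, &tuple->t_self))
		insert_into_index(indexRel, heapRel, tuple);
}
heap_endscan(scan);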

Comments?  Have I missed any case for a tuple to fail the snap?

Does this analysis change our conclusions about whether we can
support unique index builds?

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-24 Thread Tom Lane
I wrote:
 I'm trying to work out whether we can fix this by taking an MVCC
 snapshot before we scan the index, and then inserting all tuples we find
 in the heap that are live according to the snap but are not present in
 the indexscan data.  There are still race conditions in this but I think
 sufficient delay would fix them.  Considerations:

 * If a tuple is good according to the snap, it must have been inserted
 by an xact that committed before the snap, therefore there is no
 still-in-progress index insertion happening for it.  So if it's not in
 the sorted indexscan data we know we must insert it.  If it is in the
 indexscan data, of course we don't have to.

 * If a tuple is not good according to the snap, there are three
 possibilities:

 ** It was never committed good at all.  We need not index it.

 ** It was inserted by a transaction that committed post-snap.  We assume
 that that transaction has or will index it.

 ** It was deleted by a transaction that committed pre-snap.  We assume
 we need not index it because no remaining transaction will care about it.

 The trick is that we must wait long enough to ensure that those two
 assumptions are good.  We already know about waiting long enough to
 ensure that all live transactions are aware of the index, which takes
 care of the first assumption as long as we take the snap after that
 wait.  However, the second assumption means that we must be sure there
 are no other backends with serializable snapshots older than the snap
 we are using for this.  I think this means that we wait for all xacts
 to be aware of our index, take the reference snap, then wait for all
 xacts except ours to die --- without exception --- and then we can
 get on with the second phase of the work.

After a couple hours' thought I have not found any flaws in the above
analysis; furthermore I believe that the second wait need not
happen until just before we are ready to mark the index valid and
commit.  So that means that after the indexscan and second heapscan,
we wait for any xacts alive before that to end.  That's not too bad.

 Does this analysis change our conclusions about whether we can
 support unique index builds?

I also believe that we can support unique index builds after all.
There are a few further changes needed besides the above:

* The first pass has to insert tuples that are good according to a
snapshot, rather than using HeapTupleSatisfiesVacuum.  This prevents
the problem of trying to unique-check multiple versions of the same
row that all chanced to be committed live at the instants we passed
over them.  If ambuild gets a failure it means the dataset was not
unique at the instant of the snap, so the user can hardly complain
about getting the error.  This will somewhat reduce the coverage of
the initial index build, but that seems like a small price.  (Perhaps
the docs should recommend using a smaller-than-normal fillfactor during
concurrent index creation?)

* We have to add the already-discussed logic to bt_check_unique to
recheck liveness of the to-be-inserted tuple immediately before it
raises a unique-index-violation error.  This is a waste of cycles
in the normal case, but since it's in an error path that doesn't
seem like a big problem.  (Is it worth extending the aminsert API
so that aminsert knows whether it's being called by CREATE INDEX
CONCURRENTLY or not?  I'm inclined not to bother, at least not
right now with the bitmap AM patch still unmerged.)

* When about to insert a tuple in pass 2, we can make the code
check liveness of the tuple and suppress the unique check if it's
already committed dead.  This means the bt_check_unique addition
only matters if the tuple goes dead in the interval between starting the
index insert and finishing it.  However that's an important case
to handle the concurrent-UPDATE scenario: if we find a conflicting
entry in the index, we'll wait for its inserter to commit, and that
is the same transaction that will have committed our new entry dead.
So the recheck is necessary and sufficient to cover this case.
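
The bt_check_unique change would be along these lines -- a sketch, with
tuple_still_live() as a stand-in for the actual liveness recheck:

/*
 * Sketch of the error-path recheck: before raising a uniqueness
 * violation, re-verify that the tuple being inserted is still live.
 * If a concurrent UPDATE killed it (committing the new version we just
 * collided with), abandon the insert quietly instead of erroring.
 */
if (conflict_committed_good)
{
	if (!tuple_still_live(heapRel, &own_tid))
		return false;			/* our version died meanwhile; no error */

	ereport(ERROR,
			(errcode(ERRCODE_UNIQUE_VIOLATION),
			 errmsg("duplicate key violates unique constraint \"%s\"",
					RelationGetRelationName(indexRel))));
}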

Barring objections, I'm off to program this.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Hannu Krosing
On Tue, 2006-08-22 at 16:48, Tom Lane wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
  It's fairly clear that we could support concurrent builds of nonunique
  indexes, but is that enough of a use-case to justify it?
 
  I believe there would be. Most PostgreSQL users I run into, develop in 
  production, which means being able to add an index they forgot when 
  doing query analysis.
 
 True, unique constraints are usually something you should get right to
 start with.  But it'll be annoying if we can do everything BUT that :-(

Maybe we could find a way to build a non-unique index first and then
convert it to a unique one later, in yet another pass ?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Hannu Krosing
On Wed, 2006-08-23 at 11:05, Hannu Krosing wrote:
 On Tue, 2006-08-22 at 16:48, Tom Lane wrote:
  Joshua D. Drake [EMAIL PROTECTED] writes:
   It's fairly clear that we could support concurrent builds of nonunique
   indexes, but is that enough of a use-case to justify it?
  
   I believe there would be. Most PostgreSQL users I run into, develop in 
   production, which means being able to add an index they forgot when 
   doing query analysis.
  
  True, unique constraints are usually something you should get right to
  start with.  But it'll be annoying if we can do everything BUT that :-(
 
 Maybe we could find a way to build a non-unique index first and then
 convert it to a unique one later, in yet another pass ?

Or even add an ALTER INDEX myindex ADD/DROP UNIQUE; command?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Zeugswetter Andreas DCP SD

  Is it not possible to brute force this adding an AM method to insert
  without the uniqueness check?

 Hm.  Actually there already is a feature of aminsert to allow
 suppressing the unique check, but I'm not sure whether using
 it for RECENTLY_DEAD tuples helps.  Seems like we have to
 wait to see whether DELETE_IN_PROGRESS deleters commit in any case.

Um, but if we wait for the DELETE_IN_PROGRESS tuple, after the wait we can
add it either with or without the unique check (depending on commit/abort).

Then at least we don't need to wait in a 3rd pass for readers?

Andreas



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 What I think we can do about this is to include DELETE_IN_PROGRESS
 tuples in the set of candidate tuples to insert in the second pass.
 During the merge step that verifies whether the tuple is already
 in the index, if we find that it's not, then we must wait for the
 deleter to commit or roll back.  If the deleter commits then we
 ignore the tuple.  If the deleter rolls back then we have to insert
 the tuple in the index.  (I think we have to actually take a FOR
 UPDATE or possibly FOR SHARE lock on the tuple while we do this,
 else we have race conditions against someone else starting a new
 deletion attempt on the tuple.)  

Hm, my first thought was to just try to get a lock on the record which would
inherently wait until the deleter commits or aborts.

But then wouldn't we have deadlock risks? If we come across these records in a
different order from someone else (possibly even the deleter) who also wants
to lock them? Or would it be safe to lock and release them one by one so we
only ever hold one lock at a time?

I'm also pondering whether it might be worth saving up all the
DELETE_IN_PROGRESS tuples in a second tuplesort and processing them all in a
third phase. That seems like it would reduce the amount of waiting that might
be involved. The fear I have though is that this third phase could become
quite large.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com





Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Greg Stark


Tom Lane [EMAIL PROTECTED] writes:

 I think we can solve this by having IndexBuildHeapScan not index
 DELETE_IN_PROGRESS tuples if it's doing a concurrent build.  The problem
 of old transactions trying to use the index does not exist, because
 we'll wait 'em out before marking the index valid, so we need not
 worry about preserving validity for old snapshots.  And if the deletion
 does in fact roll back, we'll insert the tuple during the second pass,
 and catch any uniqueness violation at that point.

Actually that's a bit of a pain. This function is called from the AM functions
so it doesn't have any access to whether it's doing a concurrent build or not.
I would have to stuff a flag in the indexInfo or change the AM api.

The API is pretty bizarre around this. The core calls the AM to build the
index, the AM calls back this function in core to do the scan, then this
function calls back into the AM to handle the inserts. 

It seems like it would be simpler to leave the core in charge the whole time.
It would call an AM method to initialize state, then call an AM method for
each tuple that should be indexed, and lastly call a finalize method.
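
Such an inverted API might look roughly like this (entirely hypothetical;
none of these functions exist in the pg_am interface):

/*
 * Hypothetical build API with core driving the heap scan: the AM only
 * initializes state, consumes one tuple at a time, and finalizes.
 */
typedef struct AmBuildState AmBuildState;	/* opaque, AM-private */

AmBuildState *ambuild_init(Relation heap, Relation index,
						   IndexInfo *indexInfo);
void ambuild_add_tuple(AmBuildState *state, Datum *values, bool *isnull,
					   ItemPointer heap_tid);
void ambuild_finish(AmBuildState *state);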


Also, I think there may be another problem here with INSERT_IN_PROGRESS. I'm
currently testing unique index builds while pgbench is running and I'm
consistently getting unique index violations from phase 1. I think what's
happening is that inserts that haven't committed yet (and hence ought to be
invisible to us) are hitting unique constraint violations against older
versions that are still alive to us.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Greg Stark
Hannu Krosing [EMAIL PROTECTED] writes:

 On Wed, 2006-08-23 at 11:05, Hannu Krosing wrote:
  
  Maybe we could find a way to build a non-unique index first and then
  convert it to a unique one later, in yet another pass ?
 
 Or even add ALTER INDEX myindex ADD/DROP UNIQUE; command

That would be great. But note that it suffers from precisely the same problem.
If you come across a couple of records with the same key and one of them is
DELETE_IN_PROGRESS then you'll have to wait until you can acquire a sharelock
on it before you can determine if there's a constraint violation.


Hmmm. Or is that true? The problem may be somewhat easier since at least you
can be sure every tuple in the heap is in the index. So if you see a
DELETE_IN_PROGRESS either it *was* a constraint violation prior to the delete
and failing is reasonable or it's an update in which case maybe it's possible
to detect that they're part of the same chain?

(Actually there is another corner case: a transaction that inserts a value,
then deletes it in the same transaction, then inserts that same value again.
Now you have an INSERT_IN_PROGRESS and a DELETE_IN_PROGRESS that conflict but
should be allowed since they come from the same transaction. Hopefully the
ALTER INDEX command would be able to determine they come from the same
transaction.)

In the case of concurrent index builds that's not really safe since you don't
have the other tuples you're conflicting with together at the same time and
even if you did you may or may not have a complete set of them.

Tom's right. This stuff is tricky.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 But then wouldn't we have deadlock risks? If we come across these records in a
 different order from someone else (possibly even the deleter) who also wants
 to lock them? Or would it be safe to lock and release them one by one so we
 only ever hold one lock at a time?

AFAICS we could release the lock as soon as we've inserted the index
entry.  (Whether there is any infrastructure to do that is another
question...)

 I'm also pondering whether it might be worth saving up all the
 DELETE_IN_PROGRESS tuples in a second tuplesort and processing them all in a
 third phase. That seems like it would reduce the amount of waiting that might
 be involved. The fear I have though is that this third phase could become
 quite large.

Actually --- a tuple that is live when we do the second pass scan
could well be DELETE_IN_PROGRESS (or even RECENTLY_DEAD) by the time we
do the merge and discover that it hasn't got an index entry.  So offhand
I'm thinking that we *must* take a tuple lock on *every* tuple we insert
in stage two.  Ugh.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 It seems like it would be simpler to leave the core in charge the whole time.
 It would call an AM method to initialize state, then call an AM method for
 each tuple that should be indexed, and lastly call a finalize method.

[ shrug... ]  I'm uninterested in refactoring the AM API right now.
We've got enough stuff to deal with before beta, not to mention an
uncommitted bitmap AM patch that it would certainly break.

 Also, I think there may be another problem here with INSERT_IN_PROGRESS. I'm
 currently testing unique index builds while pgbench is running and I'm
 consistently getting unique index violations from phase 1. I think what's
 happening is that inserts that haven't committed yet (and hence ought to be
 invisible to us) are hitting unique constraint violations against older
 versions that are still alive to us.

Hmm ... it's certainly highly likely that we could pick up multiple
versions of a row during pass 1, but the uniqueness checker should
notice that some versions are dead?  Oooh, no there's a problem:
the tuple we are inserting could become dead in the interval between
when we pick it up and when we put it into the index.  So we could
try to put multiple versions of the same row into the uniqueness check.

Right at the moment, unique index builds with this mechanism are looking
unfixably broken :-(.  Anyone see any chance at all of making them work?
Maybe we should just cut our losses and go with the nonunique case only.
That one is pretty easy: just stuff everything in the index ...

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Hmmm. Or is that true? The problem may be somewhat easier since at least you
 can be sure every tuple in the heap is in the index. So if you see a
 DELETE_IN_PROGRESS tuple, either it *was* a constraint violation prior to the
 delete, and failing is reasonable, or it's an update, in which case maybe it's
 possible to detect that they're part of the same chain?

Unless we are willing to lock every single tuple while we insert it,
this seems unfixable to me.  Without a lock, the tuple could become
DELETE_IN_PROGRESS immediately after we look at it.

Actually it's worse than that.  We could examine a tuple, see that
it's good, include it in the uniqueness check.  Then someone updates
the tuple and puts the new version near the end of the table.  By
the time we reach that version, it could be committed good.  There
is absolutely no way that we could notice an issue without applying
extremely expensive tests to *every* apparently-good tuple.

[ thinks for a bit... ]  At least, it seems hopeless if we use
SnapshotNow.  Does it help if we use a real snapshot?  I'm thinking
pass 1 inserts exactly those tuples that are good according to a
snap taken at its beginning, and then pass 2 considers only tuples
that are good according to a snap taken at *its* beginning.  But
having consumed no caffeine yet this morning, I'm not sure I can
spot any flaws that might exist in this idea.
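
The invariant being leaned on: a single snapshot sees at most one version of
any row as live, so unique-checking only snapshot-visible tuples can never
pit two versions of one row against each other.  A toy standalone sketch
(it ignores in-progress XIDs and the rest of the real visibility rules):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    /* Toy snapshot: everything committed before 'horizon' is visible. */
    struct snapshot { TransactionId horizon; };

    struct tuple_version { TransactionId xmin; TransactionId xmax; };

    static bool
    visible_in(const struct tuple_version *t, const struct snapshot *s)
    {
        return t->xmin < s->horizon && (t->xmax == 0 || t->xmax >= s->horizon);
    }

    int
    main(void)
    {
        struct snapshot snap = { 100 };
        struct tuple_version v1 = { 10, 50 };   /* old version, deleted by XID 50 */
        struct tuple_version v2 = { 50, 0 };    /* its replacement */

        /* Exactly one of the two versions is live in this snapshot. */
        return (visible_in(&v1, &snap) + visible_in(&v2, &snap)) == 1 ? 0 : 1;
    }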

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Hannu Krosing
On Wednesday, 2006-08-23 at 09:01, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Hmmm. Or is that true? The problem may be somewhat easier since at least you
  can be sure every tuple in the heap is in the index. So if you see a
  DELETE_IN_PROGRESS tuple, either it *was* a constraint violation prior to
  the delete, and failing is reasonable, or it's an update, in which case
  maybe it's possible to detect that they're part of the same chain?
 
 Unless we are willing to lock every single tuple while we insert it,
 this seems unfixable to me.  Without a lock, the tuple could become
 DELETE_IN_PROGRESS immediately after we look at it.
 
 Actually it's worse than that.  We could examine a tuple, see that
 it's good, include it in the uniqueness check.  Then someone updates
 the tuple and puts the new version near the end of the table.  By
 the time we reach that version, it could be committed good. 

Perhaps we should scan the index in index order, and set the unique flag
per index page.

That would
a) help us avoid going to the heap unless there are multiple entries for
some value, and
b) enable us, once all index pages containing pointers to some possibly
duplicate value are checked, to release locks on those tuples.
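
Roughly, in standalone form (toy integer keys, and a hypothetical
heap_entries_conflict() standing in for the actual visit to the heap):

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical heap-side check: do entries i and j point at versions
     * that are both live and not part of the same update chain? */
    extern bool heap_entries_conflict(size_t i, size_t j);

    /*
     * Visiting entries in key order makes possible duplicates adjacent, so
     * the heap is touched only for runs of equal keys, and any locks taken
     * for one key value can be released once its run ends.
     */
    bool
    keys_are_unique(const int *keys, size_t nkeys)
    {
        for (size_t i = 1; i < nkeys; i++)
        {
            if (keys[i] == keys[i - 1] && heap_entries_conflict(i - 1, i))
                return false;           /* a genuine duplicate */
        }
        return true;
    }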

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Tom Lane
stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 [ thinks for a bit... ]  At least, it seems hopeless if we use
 SnapshotNow.  Does it help if we use a real snapshot?  I'm thinking
 pass 1 inserts exactly those tuples that are good according to a
 snap taken at its beginning, and then pass 2 considers only tuples
 that are good according to a snap taken at *its* beginning.  But
 having consumed no caffeine yet this morning, I'm not sure I can
 spot any flaws that might exist in this idea.

 What about tuples that are inserted and committed in the window between the
 two phases? I.e., they're RECENTLY_DEAD but not in phase 2's snapshot.

We'd put them in the index but skip the uniqueness check.

 Or do you mean we use SatisfiesVacuum to determine what to insert but
 SatisfiesSnapshot to determine whether to check uniqueness?

Right.  The problems seem to all stem from the risk of trying to
unique-check more than one version of a tuple, and using a snap would
stop that.  We need to think through all the cases though and be sure
they all work.
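
Schematically, SatisfiesVacuum picks what to insert at all and the snapshot
picks what to unique-check; a toy sketch of that split (it leaves out the
in-progress cases that need waiting, discussed elsewhere in the thread):

    #include <stdbool.h>

    /* Toy copy of the vacuum verdicts that matter here. */
    enum vac_status { LIVE, RECENTLY_DEAD, DEAD,
                      INSERT_IN_PROGRESS, DELETE_IN_PROGRESS };

    struct pass2_action { bool insert; bool check_unique; };

    struct pass2_action
    decide_pass2(enum vac_status vac, bool live_in_pass2_snapshot)
    {
        struct pass2_action act = { false, false };

        if (vac == DEAD)
            return act;                 /* no snapshot can see it: skip */

        act.insert = true;              /* e.g. RECENTLY_DEAD: keep for MVCC */
        act.check_unique = live_in_pass2_snapshot;
        return act;
    }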

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread stark
Tom Lane [EMAIL PROTECTED] writes:

 Or do you mean we use SatisfiesVacuum to determine what to insert but
 SatisfiesSnapshot to determine whether to check uniqueness?

 Right.  The problems seem to all stem from the risk of trying to
 unique-check more than one version of a tuple, and using a snap would
 stop that.  We need to think through all the cases though and be sure
 they all work.

What happens if someone inserts a record that we miss, but it gets deleted by
the time phase 2 starts? So it's not visible to phase 2 but conflicts with
some other record we find. I suppose that's OK since the delete would have to
have committed for that to happen. It just means that having a unique
constraint doesn't guarantee uniqueness if your transaction started before the
index was finished being built.

Or what if there's an insert that occurs before phase 2 starts and hasn't
committed yet? Then there's a conflicting record in the heap that's missing in
the index. I guess the build would have to block when it finds the missing
record until the new insert either commits or aborts, just like inserts do
when a user inserts a potential conflict. Would I have to handle that myself,
or does index_insert handle that automatically?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Tom Lane
stark [EMAIL PROTECTED] writes:
 What happens if someone inserts a record that we miss, but it gets deleted by
 the time phase 2 starts? So it's not visible to phase 2 but conflicts with
 some other record we find. I suppose that's OK since the delete would have to
 have committed for that to happen.

I think that's OK, but the whole idea of using an MVCC snap in phase 2
doesn't work on closer inspection.  The problem is still the same one
that you need to take (at least) share lock on each tuple you insert
into the index.  Telling aminsert to check uniqueness implicitly assumes
the new tuple is live, and without any lock on the tuple you can't
promise that.  So there's a significant risk of falsely declaring a
uniqueness failure due to someone else's perfectly innocent UPDATE.
Using an MVCC snap would actually make this worse, by widening the
window for that UPDATE to happen.

Even though we're hoping not to have to process too many tuples here,
having to row-lock each one sounds pretty awful.  Remember that that
involves writing a WAL record these days.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 I think that's OK, but the whole idea of using an MVCC snap in phase 2
 doesn't work on closer inspection.  The problem is still the same one
 that you need to take (at least) share lock on each tuple you insert
 into the index.  Telling aminsert to check uniqueness implicitly assumes
 the new tuple is live, and without any lock on the tuple you can't
 promise that.  

No wait. It's still live according to my snapshot. How could it be possible
for a single snapshot to see two different versions of the same tuple as live?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I think that's OK, but the whole idea of using an MVCC snap in phase 2
 doesn't work on closer inspection.  The problem is still the same one
 that you need to take (at least) share lock on each tuple you insert
 into the index.  Telling aminsert to check uniqueness implicitly assumes
 the new tuple is live, and without any lock on the tuple you can't
 promise that.  

 No wait. It's still live according to my snapshot. How could it be possible
 for a single snapshot to see two different versions of the same tuple as live?

The problem case is that we take a tuple and try to insert it into the index.
Meanwhile someone else updates the tuple, and they're faster than us so
they get the new version into the index first.  Now our aminsert sees a
conflicting index entry, and as soon as it commits good aminsert will
raise a uniqueness error.  There's no backoff for oh, the tuple I'm
inserting stopped being live while I was inserting it.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-23 Thread Tom Lane
I wrote:
 The problem case is that we take a tuple and try to insert it into the index.
 Meanwhile someone else updates the tuple, and they're faster than us so
 they get the new version into the index first.  Now our aminsert sees a
 conflicting index entry, and as soon as it commits good aminsert will
 raise a uniqueness error.  There's no backoff for oh, the tuple I'm
 inserting stopped being live while I was inserting it.

It's possible that the problem could be solved by introducing such a
backoff, ie, make aminsert recheck liveness of the tuple-to-be-inserted
before declaring error.  Since we're about to fail anyway, performance
of this code path probably isn't a huge issue.  But I haven't thought
through whether it can be made to work with that addition.
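
Schematically the backoff would look like this (hypothetical helpers, not
the actual aminsert interface):

    #include <stdbool.h>

    struct heap_tuple;

    /* Hypothetical helpers, for illustration only. */
    extern bool still_live(const struct heap_tuple *tup);
    extern void raise_unique_violation(void);

    /*
     * On an apparent duplicate, recheck the tuple we were inserting: if a
     * concurrent UPDATE killed it mid-insert, the "conflict" is just its
     * successor version and no error should be raised.  Since this path is
     * about to fail anyway, the recheck costs nothing in the common case.
     */
    void
    on_apparent_unique_conflict(const struct heap_tuple *our_tuple)
    {
        if (still_live(our_tuple))
            raise_unique_violation();   /* real duplicate: fail as usual */
        /* else: our version died while we inserted it; back off silently */
    }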

Unless someone's got a brilliant idea, my recommendation at this point
is that we restrict the patch to building only non-unique indexes.
Per discussion upthread, that's still a useful feature.  We can revisit
the problem of doing uniqueness checks correctly in some future release,
but time to work on it for 8.2 is running out fast.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-22 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 I think we can solve this by having IndexBuildHeapScan not index
 DELETE_IN_PROGRESS tuples if it's doing a concurrent build.  

Sure

 It's entirely possible for a tuple that is RECENTLY_DEAD or
 DELETE_IN_PROGRESS to have no entry in the index, if it was inserted
 during the first pass, and then the deletion occurred after the first
 pass (and either has or hasn't committed yet).  

Egads. That's nasty indeed.

 Furthermore, the patch also tries to insert RECENTLY_DEAD tuples, which is
 good for MVCC coverage, but wrong for uniqueness checking --- keep in mind
 that in the second pass, we are just doing normal index insertions, and so
 anything we insert into the index will be uniqueness-checked as though still
 alive. We could get a uniqueness failure that should not occur, eg from
 trying to insert the old version of a recently-updated row.

Hm, I hadn't absorbed the purpose of isAlive and the distinction between live
for uniqueness checks and live for index build purposes.

Is it not possible to brute-force this by adding an AM method to insert
without the uniqueness check? That would mean the index build would fail even
if the transaction eventually aborts though. (Or even if it has already
aborted?)

[ extended description of complex footwork involving more waiting while
holding locks ]

 Have I missed anything?  This is tricky stuff.

Wow, that seems pretty unsatisfactory; all the waiting and locking sounds
awful. If you have a lot of update transactions starting continuously, you
could keep bumping into this situation and repeatedly have to wait for new
transactions to end.

It also seems like a lot of code :(

 In any case it's clear that the patch still needs major work.
 Greg, do you have cycles to spare now?

I do. But I'll have to spend some time just rereading the code and your
comments to convince myself that all this waiting and locking is the best
solution.

-- 
greg




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-22 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Is it not possible to brute-force this by adding an AM method to insert
 without the uniqueness check?

Hm.  Actually there already is a feature of aminsert to allow
suppressing the unique check, but I'm not sure whether using it for
RECENTLY_DEAD tuples helps.  Seems like we have to wait to see whether
DELETE_IN_PROGRESS deleters commit in any case.

 Have I missed anything?  This is tricky stuff.

 Wow, that seems pretty unsatisfactory; all the waiting and locking sounds
 awful.

Yeah, I'm very unhappy.  The whole idea may be going down in flames :-(
It's fairly clear that we could support concurrent builds of nonunique
indexes, but is that enough of a use-case to justify it?

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-22 Thread Joshua D. Drake



Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Wow, that seems pretty unsatisfactory; all the waiting and locking sounds
  awful.

 Yeah, I'm very unhappy.  The whole idea may be going down in flames :-(
 It's fairly clear that we could support concurrent builds of nonunique
 indexes, but is that enough of a use-case to justify it?

I believe there would be. Most PostgreSQL users I run into develop in
production, which means being able to add an index they forgot when
doing query analysis.

Most of the time (I would say 95%) this is not a unique index.

Sincerely,

Joshua D. Drake








--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-22 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 It's fairly clear that we could support concurrent builds of nonunique
 indexes, but is that enough of a use-case to justify it?

 I believe there would be. Most PostgreSQL users I run into, develop in 
 production, which means being able to add an index they forgot when 
 doing query analysis.

True, unique constraints are usually something you should get right to
start with.  But it'll be annoying if we can do everything BUT that :-(

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-22 Thread Joshua D. Drake

Tom Lane wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
  It's fairly clear that we could support concurrent builds of nonunique
  indexes, but is that enough of a use-case to justify it?

  I believe there would be. Most PostgreSQL users I run into develop in
  production, which means being able to add an index they forgot when
  doing query analysis.

 True, unique constraints are usually something you should get right to
 start with.  But it'll be annoying if we can do everything BUT that :-(

Agreed, but better than nothing :).

Sincerely,

Joshua D. Drake







--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-22 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 Greg Stark [EMAIL PROTECTED] writes:
  Is it not possible to brute-force this by adding an AM method to insert
  without the uniqueness check?
 
 Hm.  Actually there already is a feature of aminsert to allow
 suppressing the unique check, but I'm not sure whether using it for
 RECENTLY_DEAD tuples helps.  Seems like we have to wait to see whether
 DELETE_IN_PROGRESS deleters commit in any case.

Hm, actually don't we need both of these to make it work? We need to see
whether the deleter commits to determine whether to enforce the uniqueness
constraint on the missing tuple. 

. If the deleter aborts, we need to insert the tuple normally, including
  enforcing the constraint.

. If the deleter commits, then we still need to insert the tuple, but without
  enforcing the constraint, in case some old transaction queries the index.
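
A standalone sketch of that two-way choice, with wait_for_xid() standing in
for something like XactLockTableWait plus a commit check, and the two insert
helpers standing in for insertion with and without the uniqueness check (all
hypothetical names):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;
    struct heap_tuple;

    extern bool wait_for_xid(TransactionId xid);    /* true if it committed */
    extern void insert_checked(struct heap_tuple *tup);
    extern void insert_unchecked(struct heap_tuple *tup);

    void
    handle_delete_in_progress(struct heap_tuple *tup, TransactionId deleter)
    {
        if (wait_for_xid(deleter))
            insert_unchecked(tup);  /* delete committed: keep the entry only
                                     * for old snapshots, no constraint */
        else
            insert_checked(tup);    /* delete aborted: tuple is live, enforce */
    }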

What would happen if we just insert DELETE_IN_PROGRESS tuples normally? Would
the only risk be that the index build would fail with a spurious unique
constraint violation? I suppose that would be pretty common, though, given how
updates work.


Incidentally, does this point out a problem with the planner depending on
unique constraints? For old (serializable) transactions the unique index
exists and the constraint is enforced, but they can still find tuples that were
deleted before the index was built and might violate the unique constraint.
Even if we had the plan invalidation mechanism that's frequently mentioned, you
would still have to check constraints against your snapshot and not just
SnapshotNow for planning purposes.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-22 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 What would happen if we just insert DELETE_IN_PROGRESS tuples normally? Would
 the only risk be that the index build would fail with a spurious unique
 constraint violation? I suppose that would be pretty common, though, given how
 updates work.

Yeah, that's the problem: if we can't support UPDATEs that don't change
the to-be-unique column, it ain't much of a feature.

 Incidentally does this point out a problem with the planner depending on
 unique constraints?

Good point.  It doesn't depend on them yet, but we've been hoping to
make it do so once we have plan invalidation capability.  We shall have
to think very carefully about timing semantics of all that.

regards, tom lane
