subject:"\[HACKERS\] CREATE INDEX and HOT \- revised design"

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane

Pavan Deolasee [EMAIL PROTECTED] writes:
 How about storing the snapshot which we used during planning in
 CachedPlanSource, if at least one index was seen unusable because
 its CREATE INDEX transaction was seen as in-progress ?

I'm getting tired of repeating this, but: the planner doesn't use a
snapshot.  System catalogs run on SnapshotNow.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee


Tom Lane wrote:

 I'm getting tired of repeating this, but: the planner doesn't use a
 snapshot.  System catalogs run on SnapshotNow.


I am really sorry if I sound foolish here. I am NOT suggesting
that we use snapshot to read system catalogs. I understand
that system catalogs run on SnapshotNow and all transactions,
irrespective of when they started, would see the changes to
system catalogs as soon as the transaction updating the
system catalog commits.

What I am suggesting is to use ActiveSnapshot (actually
Florian's idea) to decide whether the transaction that created
index was still running when we started. Isn't it the case that
some snapshot will be active when we plan ? The active
snapshot may change later in the same transaction if we
are running in read-committed mode, and we may need to
invalidate the plan.

Here is what I suggest to do in get_relation_info():

+if (index-indcreatexid != InvalidTransactionId)
+{
+Assert(ActiveSnapshot);
+if (XidInMVCCSnapshot(index-indcreatexid, ActiveSnapshot))
+{
+index_close(indexRelation, NoLock);
+continue;
+}
+/*
+ * Otherwise the index is usable
+ */
+}


Is there a problem with this ?

I really appreciate all the help I am receiving on this. But
there is hardly anything else that I can do than post my
thoughts and get feedback, until we find a clean solution :-(

Thanks,
Pavan


--


EnterpriseDBhttp://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug


Tom Lane wrote:

Pavan Deolasee [EMAIL PROTECTED] writes:

How about storing the snapshot which we used during planning in
CachedPlanSource, if at least one index was seen unusable because
its CREATE INDEX transaction was seen as in-progress ?


I'm getting tired of repeating this, but: the planner doesn't use a
snapshot.  System catalogs run on SnapshotNow.


But it would still do that - it would just compare the createxid of
the index against some snapshot, and the query would be replanned
if the cached result of this comparison differs from the one the
current snapshot yields.

It might well be that this won't work, because the planner is invoked
in situations where there is no active snapshot - I'm not sure if your 
comment refers to that case, or not.


greetings, Florian Pflug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane

Pavan Deolasee [EMAIL PROTECTED] writes:
 What I am suggesting is to use ActiveSnapshot (actually
 Florian's idea) to decide whether the transaction that created
 index was still running when we started. Isn't it the case that
 some snapshot will be active when we plan ?

I do not think you can assume that the plan won't be used later with
some older snapshot.  Consider recursive plpgsql functions for a
counterexample: the inner occurrence might be the first to arrive at
a given line of the function, hence the first to plan it, yet when we
return to the outer instance we might revert to an older snapshot.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug


Tom Lane wrote:

Pavan Deolasee [EMAIL PROTECTED] writes:

What I am suggesting is to use ActiveSnapshot (actually
Florian's idea) to decide whether the transaction that created
index was still running when we started. Isn't it the case that
some snapshot will be active when we plan ?


I do not think you can assume that the plan won't be used later with
some older snapshot.  Consider recursive plpgsql functions for a
counterexample: the inner occurrence might be the first to arrive at
a given line of the function, hence the first to plan it, yet when we
return to the outer instance we might revert to an older snapshot.


So maybe we'd need to use the SerializableSnapshot created at the start
of each transaction for this check, and not the ActiveSnapshot? Could
that work?

What about doing
PREPARE myplan select ... ;
outside of a transaction? Will this be execute inside a transaction?
Is is a query always planned upon it's first execution, and not when
PREPARE is issued?

greetings, Florian Pflug



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee


On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote:




What about doing
PREPARE myplan select ... ;
outside of a transaction? Will this be execute inside a transaction?



I checked that. PREPARE runs with ActiveSnapshot set.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane

Florian G. Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I do not think you can assume that the plan won't be used later with
 some older snapshot.

 So maybe we'd need to use the SerializableSnapshot created at the start
 of each transaction for this check, and not the ActiveSnapshot? Could
 that work?

That might work, but it doesn't seem to address the core objection:
there's no mechanism to cause the query to be replanned once the
snapshot is new enough, because no relcache inval will happen.  So
most likely existing backends will keep using old plans that don't
consider the index.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Simon Riggs

On Fri, 2007-03-30 at 11:44 +0530, Pavan Deolasee wrote:
 ISTM that the run-another-transaction-afterwards idea would have same
 problem with plan invalidation. When the second transaction commits,
 the relcache invalidation event is generated. The event may get
 consumed 
 by other backends, but the index may still not be usable to them
 because
 their xid  xcreat. If no more relcache invalidation events are
 generated
 after that, the backends would continue to use the cached plan, even 
 if index becomes usable to them later. 

Sounds like we need to allow create index invalidation events to be
processed at the the end of the current transaction in the *receiving*
backend. That way we don't need to do the run-another-transaction thing
and seems a helluva lot cleaner way of doing this.

Messages of SHAREDINVALRELCACHE_ID, would be augmented by a boolean
deferred flag on the SharedInvalRelcacheMsg struct. Received messages
would be stored in a third kind of InvalidationList, then processed
during AtEOXact_Inval() whether the receiving transaction commits or
not. (see src/backend/utils/cache/inval.c)

Not sure how we'd know to *send* the message marked as deferred, but
seems like we can work that out also.

That seems to allow CCI to not have to wait until the end of time
either.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee


On 3/30/07, Tom Lane [EMAIL PROTECTED] wrote:




That might work, but it doesn't seem to address the core objection:
there's no mechanism to cause the query to be replanned once the
snapshot is new enough, because no relcache inval will happen.  So
most likely existing backends will keep using old plans that don't
consider the index.




Can't we store the snapshot (or may be the transaction id) which was
used to plan the query in CachedPlanSource if and only if at least one
index was seen unusable  ? In RevalidateCachedPlan() we then check if
the snapshot has changed and replan the query in that case.

That would make the index usable in the subsequent transactions
in the same session, though we may not be able to use the index
in the same transaction, even if its running in read-commited mode.
Would that be acceptable ?

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug


Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

Tom Lane wrote:

I do not think you can assume that the plan won't be used later with
some older snapshot.



So maybe we'd need to use the SerializableSnapshot created at the start
of each transaction for this check, and not the ActiveSnapshot? Could
that work?


That might work, but it doesn't seem to address the core objection:
there's no mechanism to cause the query to be replanned once the
snapshot is new enough, because no relcache inval will happen.  So
most likely existing backends will keep using old plans that don't
consider the index.


Pavan suggested storing the IndexSnapshot in the cached plan, and to 
compare it to the IndexSnapshot when the query is executed.

If those two snapshots differ, the query would be replanned.

My idea was to store a list of xid's together with the cached plan that
are assumed to be uncommitted accoring to the IndexSnapshot. The query
is replanned if upon execution the IndexSnapshot assumes that one of
these xid's is committed.

Those two ideas seem to be mostly equivalent, mine seems to be a bit
more fine-grained, but at the cost of more work upon each query execution.

greetings, Florian Pflug

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee


On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote:




My idea was to store a list of xid's together with the cached plan that
are assumed to be uncommitted accoring to the IndexSnapshot. The query
is replanned if upon execution the IndexSnapshot assumes that one of
these xid's is committed.



Actually, if we are using Serializable Snapshot then there is no chance
to replan the query before the transaction completes and the next
transaction to start in the session must see the index and hence
we must replan. So it would be enough just to associate a transaction
id with the cached plan. If this xid is set and our transaction id is
different than that, we replan.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Florian G. Pflug


Pavan Deolasee wrote:

On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote:


My idea was to store a list of xid's together with the cached plan that
are assumed to be uncommitted accoring to the IndexSnapshot. The query
is replanned if upon execution the IndexSnapshot assumes that one of
these xid's is committed.



Actually, if we are using Serializable Snapshot then there is no chance
to replan the query before the transaction completes and the next
transaction to start in the session must see the index and hence
we must replan. So it would be enough just to associate a transaction
id with the cached plan. If this xid is set and our transaction id is
different than that, we replan.


I believe this is true for the CREATE INDEX scenario. However, comparing
either the snapshot or the result of xid checks seems like it might
be useful for other things beside CREATE INDEX. I'm specifically 
thinking about TRUNCATE here - the create index + HOT problems sound

quite similar to the problems a non-exclusive-locking TRUNCATE would face.

greetings, Florian Pflug

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Tom Lane

Pavan Deolasee [EMAIL PROTECTED] writes:
 Actually, if we are using Serializable Snapshot then there is no chance
 to replan the query before the transaction completes and the next
 transaction to start in the session must see the index and hence
 we must replan. So it would be enough just to associate a transaction
 id with the cached plan. If this xid is set and our transaction id is
 different than that, we replan.

Hm.  So anytime we reject a potentially useful index as being not valid
yet, we mark the plan as only good for this top-level transaction?
That seems possibly workable --- in particular it doesn't get more
complicated as soon as you consider multiple such indexes.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Simon Riggs

On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote:
 Pavan Deolasee [EMAIL PROTECTED] writes:
  Actually, if we are using Serializable Snapshot then there is no chance
  to replan the query before the transaction completes and the next
  transaction to start in the session must see the index and hence
  we must replan. So it would be enough just to associate a transaction
  id with the cached plan. If this xid is set and our transaction id is
  different than that, we replan.
 
 Hm.  So anytime we reject a potentially useful index as being not valid
 yet, we mark the plan as only good for this top-level transaction?
 That seems possibly workable --- in particular it doesn't get more
 complicated as soon as you consider multiple such indexes.

I like that because its specific in dealing with the exact issue we have
- it doesn't rely on many other things happening correctly.

...and it also seems to provide a new route to avoiding the CIC wait.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee


On 3/31/07, Simon Riggs [EMAIL PROTECTED] wrote:


On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote:


 Hm.  So anytime we reject a potentially useful index as being not valid
 yet, we mark the plan as only good for this top-level transaction?
 That seems possibly workable --- in particular it doesn't get more
 complicated as soon as you consider multiple such indexes.

I like that because its specific in dealing with the exact issue we have
- it doesn't rely on many other things happening correctly.



Ok. Cool. I would finish this work then.


...and it also seems to provide a new route to avoiding the CIC wait.


Yeah, though I would like to take that up later.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-30 Thread Pavan Deolasee


On 3/30/07, Tom Lane [EMAIL PROTECTED] wrote:



I do not think you can assume that the plan won't be used later with
some older snapshot.  Consider recursive plpgsql functions for a
counterexample: the inner occurrence might be the first to arrive at
a given line of the function, hence the first to plan it, yet when we
return to the outer instance we might revert to an older snapshot.



Thanks for making me aware of such scenarios. What it also means
is that a transaction may use an older snapshot after it created the
index. So to be on safer side, we should not use an index created
in the same transaction if we saw HOT-updated RECENTLY_DEAD
or DELETE_IN_PROGRESS tuples while building the index (with HOT
we don't want to index these tuples). Not such a bad restriction to
live with. May be we can do something with command ids to solve
this once we get the basic stuff ready.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Florian G. Pflug


Pavan Deolasee wrote:

In this specific context, this particular case is easy to handle because
we are only concerned about the serializable transactions started before
CREATE INDEX commits. If PREPARE can see the new index, it
implies that the CI transaction is committed. So the transaction
starting after than can only see the tuple version that we have indexed.


Yes, but the non-index plan PREPARE generated will be used until the end
of the session, nut only until the end of the transaction. Imagine that
it wasn't explicitly PREPARED (where you might say this is acceptable),
but rather just a query inside a plpgsql function, maybe even called
from some app using connection pooling. This means that the non-index
using plan might get used for a quite long time, which contradics the
work Tom did on plan invalidation I think.

Maybe Tom can comment on wheter it's possible to use plan invalidation
to eventually get rid of a stale plan in this context?

greetings, Florian Pflug


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Pavan Deolasee


On 3/29/07, Florian G. Pflug [EMAIL PROTECTED] wrote:



Yes, but the non-index plan PREPARE generated will be used until the end
of the session, nut only until the end of the transaction.



Frankly I don't know this works, but are you sure that the plan will
be used until the end of the session ? Even if thats the case, it can
happen even today if we create a new index, but the existing sessions
will use the stale plan (assuming what you said is true)

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Florian G. Pflug


Pavan Deolasee wrote:

On 3/29/07, Florian G. Pflug [EMAIL PROTECTED] wrote:


Yes, but the non-index plan PREPARE generated will be used until the end
of the session, nut only until the end of the transaction.


Frankly I don't know this works, but are you sure that the plan will
be used until the end of the session ? Even if thats the case, it can
happen even today if we create a new index, but the existing sessions
will use the stale plan (assuming what you said is true)


I've checked that:

test=# prepare myplan as select * from test where id=1 ;
PREPARE

test=# explain execute myplan ;
 QUERY PLAN

 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 1)
(2 rows)

 Now I create an index in another session 

test=# explain select * from test where id=1 ;
  QUERY PLAN
--
 Bitmap Heap Scan on test  (cost=95.11..8248.45 rows=5000 width=36)
   Recheck Cond: (id = 1)
   -  Bitmap Index Scan on idx  (cost=0.00..93.86 rows=5000 width=0)
 Index Cond: (id = 1)
(4 rows)

test=# explain execute myplan ; 
  QUERY PLAN


 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 1)
(2 rows)

!!! Index got used by the select ..  but not by execute myplan ... !!!

test=# prepare myplan2 as select * from test where id=1 ;
PREPARE
test=# explain execute myplan2 ;
   QUERY PLAN
-
 Index Scan using idx on test  (cost=0.00..8.38 rows=1 width=37)
   Index Cond: (id = 1)
(2 rows)

!!! A newly prepared plan of course uses the index !!!



So yes, plans get cached until the end of the session, and
yes, 8.2 won't notice index creation either ;-)

The open question is how CVS HEAD with plan invalidation behaves.
If it replans after the index-creating transaction commits, then
basing index validity on a snapshot will break this, because upon
replay they index might not be useable, but later on it may very
well be (but that plan invalidation machinery won't realize that)

So this might not introduce a regression compared to 8.2, but to
a future 8.3 with plan invalidation...

Sorry for being so unclear in my previous emails - I had confused
myself ;-)

greetings, Florian Pflug


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Tom Lane

Florian G. Pflug [EMAIL PROTECTED] writes:
 Pavan Deolasee wrote:
 Frankly I don't know this works, but are you sure that the plan will
 be used until the end of the session ? Even if thats the case, it can
 happen even today if we create a new index, but the existing sessions
 will use the stale plan (assuming what you said is true)

 I've checked that:

Evidently you weren't testing on HEAD.

 The open question is how CVS HEAD with plan invalidation behaves.
 If it replans after the index-creating transaction commits, then
 basing index validity on a snapshot will break this, because upon
 replay they index might not be useable, but later on it may very
 well be (but that plan invalidation machinery won't realize that)

It will replan at the first use of the plan after seeing the relcache
inval sent by commit of the index-creating transaction.  If you have
two separate transactions to create an index and then mark it valid
later, everything's fine because there are two inval events.
However, if you design something where an index becomes usable due
to the passage of time rather than an explicit mark-valid step,
there's gonna be a problem.  I'd suggest trying to stick to the
way CREATE INDEX CONCURRENTLY does it...

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Pavan Deolasee


On 3/29/07, Tom Lane [EMAIL PROTECTED] wrote:



It will replan at the first use of the plan after seeing the relcache
inval sent by commit of the index-creating transaction.  If you have
two separate transactions to create an index and then mark it valid
later, everything's fine because there are two inval events.
However, if you design something where an index becomes usable due
to the passage of time rather than an explicit mark-valid step,
there's gonna be a problem.  I'd suggest trying to stick to the
way CREATE INDEX CONCURRENTLY does it...



I had earlier proposed to do things CIC way. But there were objections
to the additional wait introduced in CREATE INDEX, and I don't
think they were unreasonable. May be if we can avoid waits if there
are no HOT-chains in the table, but still we need agreement on that.

OTOH ISTM that the pg_index:xcreate solution may work fine if
we can keep index unusable to those transactions which started
before CREATE INDEX could commit. I coded a quick prototype
where I use ActiveSnapshot in get_relation_info() to test if the
CREATE INDEX transaction is seen as in-progress to the
transaction. If seen as in-progress, the index is not used (even
though the CREATE INDEX is committed and hence can be seen
by SnapshotNow).

If an index which was earlier seen unusable is marked as valid as
time progresses, could there be some trouble ? I mean, as long as
we don't make the index usable before all tuples which are not
indexed are DEAD, we should be fine.

Is there something I am missing ? Would it help to explain the idea
if I post the patch ?

CREATE INDEX and CREATE INDEX CONCURRENTLY turned
out to be much more difficult than I imagined earlier. While we are
discussing CREATE INDEX, I would post a design for CIC. I
restrained myself till now to avoid confusion, but with time running
out, it would be helpful to get agreement so that we can finish
the patch on time.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Gregory Stark


Tom Lane [EMAIL PROTECTED] writes:

 However, if you design something where an index becomes usable due
 to the passage of time rather than an explicit mark-valid step,
 there's gonna be a problem.  I'd suggest trying to stick to the
 way CREATE INDEX CONCURRENTLY does it...

I'm a bit skeptical about the idea of CREATE INDEX (ie, non-concurrent)
creating an index that won't be used for a while. We get enough people asking
why Postgres isn't using an index as it is... Besides, it seems if people are
happy to have indexes take a long time to build they could just do a
concurrent build. The reason they do non-concurrent builds is precisely
because they're willing to take an exclusive lock in order to have them
complete as soon as possible.

Earlier we were talking about not inserting any HOT tuples until the index
became valid. The goal of having an xid on the index was so we would know when
we could start doing HOT updates again. That seems like a much lesser cost
than not being able to use the index until all live transactions exit.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Pavan Deolasee


On 3/29/07, Gregory Stark [EMAIL PROTECTED] wrote:


Besides, it seems if people are
happy to have indexes take a long time to build they could just do a
concurrent build.



I think we discussed this earlier. One of the down-side of CIC is that
it needs two complete heap scans. Apart from that CIC itself needs
to wait for all existing transactions to finish and more than one
instance of CIC can not be run on a table.



Earlier we were talking about not inserting any HOT tuples until the index
became valid. The goal of having an xid on the index was so we would know
when
we could start doing HOT updates again. That seems like a much lesser cost
than not being able to use the index until all live transactions exit.



What I am proposing is to keep index unusable for existing transactions.
The index is available for all new transactions even if there are unfinished
existing transactions. Is that a big problem ? Well, I still need buy-in and
review from Tom and others on the design, but it seems workable to me.


Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Bruce Momjian

Pavan Deolasee wrote:
  Earlier we were talking about not inserting any HOT tuples until the index
  became valid. The goal of having an xid on the index was so we would know
  when
  we could start doing HOT updates again. That seems like a much lesser cost
  than not being able to use the index until all live transactions exit.
 
 
 What I am proposing is to keep index unusable for existing transactions.
 The index is available for all new transactions even if there are unfinished
 existing transactions. Is that a big problem ? Well, I still need buy-in and
 review from Tom and others on the design, but it seems workable to me.

Yes, that seems totally acceptable to me.  As I remember, the index is
usable by the transaction that created it, and new transactions.  Hard
to see how someone would have a problem with that.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Gregory Stark

Pavan Deolasee [EMAIL PROTECTED] writes:

 What I am proposing is to keep index unusable for existing transactions.
 The index is available for all new transactions even if there are unfinished
 existing transactions. 

Ah thanks, that makes a lot more sense. Sorry for the false alarm.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Simon Riggs

On Thu, 2007-03-29 at 22:08 +0530, Pavan Deolasee wrote:
 
 
 On 3/29/07, Tom Lane [EMAIL PROTECTED] wrote:
 
 It will replan at the first use of the plan after seeing the
 relcache
 inval sent by commit of the index-creating transaction.  If
 you have
 two separate transactions to create an index and then mark it
 valid
 later, everything's fine because there are two inval events. 
 However, if you design something where an index becomes usable
 due
 to the passage of time rather than an explicit mark-valid
 step,
 there's gonna be a problem.  I'd suggest trying to stick to
 the
 way CREATE INDEX CONCURRENTLY does it... 
 
 
 I had earlier proposed to do things CIC way. But there were objections
 to the additional wait introduced in CREATE INDEX, and I don't
 think they were unreasonable. May be if we can avoid waits if there 
 are no HOT-chains in the table, but still we need agreement on that.
 
 OTOH ISTM that the pg_index:xcreate solution may work fine if
 we can keep index unusable to those transactions which started
 before CREATE INDEX could commit. 

Pavan, ISTM you have misunderstood Tom slightly.

Having the index invisible to all current transactions is acceptable.

However, the other backends will not receive an invalidation event,
which means even when they start new transactions they will still not
see it, which is not acceptable.

ISTM that the run-another-transaction-afterwards idea is the only one
that does everything I think we need. I really do wish we could put in a
wait, like CIC, but I just think it will break existing programs.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Simon Riggs

On Thu, 2007-03-29 at 13:55 -0400, Bruce Momjian wrote:
 Pavan Deolasee wrote:
   Earlier we were talking about not inserting any HOT tuples until the index
   became valid. The goal of having an xid on the index was so we would know
   when
   we could start doing HOT updates again. That seems like a much lesser cost
   than not being able to use the index until all live transactions exit.
  
  
  What I am proposing is to keep index unusable for existing transactions.
  The index is available for all new transactions even if there are unfinished
  existing transactions. Is that a big problem ? Well, I still need buy-in and
  review from Tom and others on the design, but it seems workable to me.
 
 Yes, that seems totally acceptable to me.  As I remember, the index is
 usable by the transaction that created it, and new transactions.  Hard
 to see how someone would have a problem with that.

Agreed.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Tom Lane

Simon Riggs [EMAIL PROTECTED] writes:
 ISTM that the run-another-transaction-afterwards idea is the only one
 that does everything I think we need. I really do wish we could put in a
 wait, like CIC, but I just think it will break existing programs.

Actually, there's a showstopper objection to that: plain CREATE INDEX
has to be able to run within a larger transaction.  (To do otherwise
breaks pg_dump --single-transaction, just for starters.)  This means
it can *not* commit partway through.

Back to the drawing board :-(

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Florian G. Pflug


Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

ISTM that the run-another-transaction-afterwards idea is the only one
that does everything I think we need. I really do wish we could put in a
wait, like CIC, but I just think it will break existing programs.


Actually, there's a showstopper objection to that: plain CREATE INDEX
has to be able to run within a larger transaction.  (To do otherwise
breaks pg_dump --single-transaction, just for starters.)  This means
it can *not* commit partway through.


I believe the original idea was to invent some kind of on commit run
this transaction hook - similar to how files are deleted on commit,
I think. At least I understood the Run another transaction on commit
that way...

greetings, Florian Pflug


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Simon Riggs

On Thu, 2007-03-29 at 17:27 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  ISTM that the run-another-transaction-afterwards idea is the only one
  that does everything I think we need. I really do wish we could put in a
  wait, like CIC, but I just think it will break existing programs.
 
 Actually, there's a showstopper objection to that: plain CREATE INDEX
 has to be able to run within a larger transaction.  (To do otherwise
 breaks pg_dump --single-transaction, just for starters.)  This means
 it can *not* commit partway through.

I agree with most of that, but thats why we-are-where-we-are and I don't
think its a showstopper at all.

The idea is to make note that the transaction has created an index
within a transaction block, so that after the top level transaction
commits we sneak in an extra hidden transaction to update the pg_index
tuple with the xcreate of the first transaction. 

We don't do this after the CREATE INDEX statement ends, only at the end
of the transaction in which it ran. We only do this if we are creating
an index on a table that is not a temporary table and was not created
during the transaction (so --single-transaction isn't broken and doesn't
require this additional action).

i.e. MyTransactionCreatedVisibleIndex, with special processing in
xact.c.

The only other alternative is to forcibly throw a relcache inval event
in the same circumstances without running the additional transaction,
but the solution is mostly the same.

I agree this is weird, but no more weird a solution as CIC was when that
first came out. I don't like it, or think its clever; I just think its
the only thing on the table. 

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Florian G. Pflug


Simon Riggs wrote:

On Thu, 2007-03-29 at 17:27 -0400, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

ISTM that the run-another-transaction-afterwards idea is the only one
that does everything I think we need. I really do wish we could put in a
wait, like CIC, but I just think it will break existing programs.

Actually, there's a showstopper objection to that: plain CREATE INDEX
has to be able to run within a larger transaction.  (To do otherwise
breaks pg_dump --single-transaction, just for starters.)  This means
it can *not* commit partway through.


The idea is to make note that the transaction has created an index
within a transaction block, so that after the top level transaction
commits we sneak in an extra hidden transaction to update the pg_index
tuple with the xcreate of the first transaction. 


The only other alternative is to forcibly throw a relcache inval event
in the same circumstances without running the additional transaction,
but the solution is mostly the same.


I think one alternative might be to store a list of xid's together with
a cached plan, and replan if the commit status (as percieved by the
transaction the plan will be executed in) of one of those xid's changes.

greetings, Florian Pflug


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-29 Thread Pavan Deolasee


On 3/30/07, Simon Riggs [EMAIL PROTECTED] wrote:



Pavan, ISTM you have misunderstood Tom slightly.



Oh, yes. Now that I re-read Tom's comment, his plan invalidation
design and code, I understand things better.



Having the index invisible to all current transactions is acceptable.



Ok.



However, the other backends will not receive an invalidation event,
which means even when they start new transactions they will still not
see it, which is not acceptable.



Agree.


ISTM that the run-another-transaction-afterwards idea is the only one

that does everything I think we need. I really do wish we could put in a
wait, like CIC, but I just think it will break existing programs.



ISTM that the run-another-transaction-afterwards idea would have same
problem with plan invalidation. When the second transaction commits,
the relcache invalidation event is generated. The event may get consumed
by other backends, but the index may still not be usable to them because
their xid  xcreat. If no more relcache invalidation events are generated
after that, the backends would continue to use the cached plan, even
if index becomes usable to them later.

How about storing the snapshot which we used during planning in
CachedPlanSource, if at least one index was seen unusable because
its CREATE INDEX transaction was seen as in-progress ? In
RevalidateCachedPlan(), we check if snapshot is set in
CachedPlanSource and  check if we are now using a different snapshot.
This triggers plan invalidation and re-planning. This would also help us
to use index early in read-committed transactions even if the transaction
was started before CREATE INDEX committed.

Does this sound good ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Pavan Deolasee


On 3/23/07, Pavan Deolasee [EMAIL PROTECTED] wrote:




Its slightly different for the HOT-chains created by this transaction
which
is creating the index. We should index the latest version of the row which
is not yet committed. But thats ok because when CREATE INDEX commits
this latest version would also get committed.


Sounds like you'll need to store the Next TransactionId rather than the
 TransactionId of the CREATE INDEX.



Just when I thought we have nailed down CREATE INDEX, I realized
that there something more to worry. The problem is with the HOT-chains
created by our own transaction which is creating the index. We thought
it will be enough to index the tuple at the head-of-the-chain since that
would be the visible copy once the transaction commits. We thought
of keeping the index unavailable for queries in pre-existing transactions
by setting a new xid attribute in pg_index. The question is what value
to assign to xid. I though we would assign ReadNewTransactionId().

Now, a new transaction can start before we commit and hence have
transaction_id  xid. This transaction can still see the old tuple
(because the transaction creating the index is not yet committed)
which we did not index while creating the index. Once the transaction
creating the index commits, the index is also available to this new
transaction and we are in trouble at that point.

I think Tom had already seen this, but his comment got overlooked
in the flow discussion. If thats the case, I regret that.

Any idea how to handle this case ? One ugly hack I can think of
is to remember all those indexes created in the transaction for which
we had seen DELETE_IN_PROGRESS tuples while building
the index. At the commit time, we (somehow) stop new transactions
to start, note the next transaction id and set it in pg_index and commit
the transaction. New transactions are then enabled again.

I know it looks ugly. Is there something better that we can do ?

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Simon Riggs

On Wed, 2007-03-28 at 22:24 +0530, Pavan Deolasee wrote:

 Just when I thought we have nailed down CREATE INDEX, I realized 
 that there something more to worry. The problem is with the HOT-chains
 created by our own transaction which is creating the index. We thought
 it will be enough to index the tuple at the head-of-the-chain since
 that 
 would be the visible copy once the transaction commits. We thought
 of keeping the index unavailable for queries in pre-existing
 transactions
 by setting a new xid attribute in pg_index. The question is what
 value 
 to assign to xid. I though we would assign ReadNewTransactionId().

 Any idea how to handle this case ?

Set it at the end, not the beginning.

If you are indexing a table that hasn't just been created by you, set
the xcreate field on pg_index at the *end* of the build using
ReadNewTransactionId(). Any xid less than that sees the index as
invalid. If you created the table in this transaction (i.e.
createSubId != 0) then set xcreate to creating xid.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Florian G. Pflug


Simon Riggs wrote:

On Wed, 2007-03-28 at 22:24 +0530, Pavan Deolasee wrote:

Just when I thought we have nailed down CREATE INDEX, I realized 
that there something more to worry. The problem is with the HOT-chains

created by our own transaction which is creating the index. We thought
it will be enough to index the tuple at the head-of-the-chain since
that 
would be the visible copy once the transaction commits. We thought

of keeping the index unavailable for queries in pre-existing
transactions
by setting a new xid attribute in pg_index. The question is what
value 
to assign to xid. I though we would assign ReadNewTransactionId().



If you are indexing a table that hasn't just been created by you, set
the xcreate field on pg_index at the *end* of the build using
ReadNewTransactionId(). Any xid less than that sees the index as
invalid. If you created the table in this transaction (i.e.
createSubId != 0) then set xcreate to creating xid.


Couldn't you store the creating transaction's xid in pg_index, and
let other transaction check that against their snapshot like they
would for any tuple's xmin or xmax? (With one exception - the creating
transaction would consider indices it built itself invalid, which
is not how things usually work for xmin/xmax).

This would mean that any transaction that believes that the creating
transaction has committed also consideres the index to be valid.

greetings, Florian Pflug


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Pavan Deolasee


On 3/28/07, Tom Lane [EMAIL PROTECTED] wrote:


Florian G. Pflug [EMAIL PROTECTED] writes:
 Couldn't you store the creating transaction's xid in pg_index, and
 let other transaction check that against their snapshot like they
 would for any tuple's xmin or xmax?

What snapshot?  I keep having to remind people that system catalog
operations are SnapshotNow.  In the particular context here, the
place where doing something else would break down is that the planner
has no idea when it makes a plan what snapshot(s) the plan might later
be used with.





Tom, please correct me if I am wrong. But ISTM that this idea might
work in this context. In get_relation_info(), we would check if xcreate
xid stored in pg_index for the index under consideration is seen
committed with respect to the snapshot at that point of time.
Even if the snapshot changes later and index becomes valid, we
might not replan and hence not use index. But that doesn't seem
like a big problem to me.

So in get_relation_info():

We get the transaction snapshot. If its a serializable transaction,
it can't change later. If its a read-commited transaction, we anyways
don't care because the transaction can only see the last committed
version of the tuple in the table and we have indexed that. And so
we are safe.

- if xcreate  snapshot-xmin, index is valid
- if xcreate  snapshot-xmax, index is invalid
- if snapshot-xmax  xcreate  snapshot-xmin, we check xids in the
snapshot
 to determine validity of the index.

I might have got something wrong here, but I hope I made the point.

Am I missing something here ?

Thanks,
Pavan



--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Tom Lane

Florian G. Pflug [EMAIL PROTECTED] writes:
 Couldn't you store the creating transaction's xid in pg_index, and
 let other transaction check that against their snapshot like they
 would for any tuple's xmin or xmax?

What snapshot?  I keep having to remind people that system catalog
operations are SnapshotNow.  In the particular context here, the
place where doing something else would break down is that the planner
has no idea when it makes a plan what snapshot(s) the plan might later
be used with.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Pavan Deolasee


On 3/28/07, Simon Riggs [EMAIL PROTECTED] wrote:




Set it at the end, not the beginning.




At the end of what ? It does not help to set it at the end of CREATE
INDEX because the transaction may not commit immediately. In
the meantime, many new transactions may start with
transaction id  xcreate. All these transactions can see the old
tuple (which we did not index) and can also see the index once
CREATE INDEX commits.



If you are indexing a table that hasn't just been created by you, set

the xcreate field on pg_index at the *end* of the build using
ReadNewTransactionId(). Any xid less than that sees the index as
invalid. If you created the table in this transaction (i.e.
createSubId != 0) then set xcreate to creating xid.





Why do we need to handle the case where  table is created in
the same transaction ? Neither the table nor the index is visible
until we commit. So thats a simple case.


Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Florian G. Pflug


Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

Couldn't you store the creating transaction's xid in pg_index, and
let other transaction check that against their snapshot like they
would for any tuple's xmin or xmax?


What snapshot?  I keep having to remind people that system catalog
operations are SnapshotNow.  In the particular context here, the
place where doing something else would break down is that the planner
has no idea when it makes a plan what snapshot(s) the plan might later
be used with.


Sorry - now that you say it, I remember that you've said that already
multiple times...

So the question is, why did this work until now, and CREATE INDEX+HOT
just doesn't seem to fit into this scheme?

I think the answer is that  all other DDL statements manage to assure
that any database objects they create or modify are usable for everybody
else immediatly after they are committed. This usually implies pretty
strong locking requirements - for example, I think that the core reason
why TRUNCATE needs an exclusive lock is precisely that guarantee it has
to make.

Maybe this could somehow be relaxed? Could, for example, the planner
be allowed to base some of it's decisions on the SerializableSnapshot
the every transaction (even read-only ones) posseses? It seems that
this would prevent plans from living longer than a transaction,
but maybe plan invalidation could help here?

greetings, Florian Pflug


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Simon Riggs

On Wed, 2007-03-28 at 23:42 +0530, Pavan Deolasee wrote:
 
 
 On 3/28/07, Simon Riggs [EMAIL PROTECTED] wrote:
 
 
 Set it at the end, not the beginning.
 
 
 At the end of what ? It does not help to set it at the end of CREATE
 INDEX because the transaction may not commit immediately. In
 the meantime, many new transactions may start with 
 transaction id  xcreate. All these transactions can see the old
 tuple (which we did not index) and can also see the index once
 CREATE INDEX commits.

AtEOX_Reincarnate()... :-)

Set xcreate to InvalidTransactionId when we build the index

If we created an index in this transaction, as soon as we commit the top
level transaction, run another top level transaction to set xcreate
using ReadNewTransactionId().

During WAL replay, we remember any index creations and reset xcreate if
we were unlucky enough to crash between the two transactions.

(I'll be offline now for a few hours until the flames subside.)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Florian G. Pflug


Pavan Deolasee wrote:

On 3/28/07, Tom Lane [EMAIL PROTECTED] wrote:


Florian G. Pflug [EMAIL PROTECTED] writes:
 Couldn't you store the creating transaction's xid in pg_index, and
 let other transaction check that against their snapshot like they
 would for any tuple's xmin or xmax?

What snapshot?  I keep having to remind people that system catalog
operations are SnapshotNow.  In the particular context here, the
place where doing something else would break down is that the planner
has no idea when it makes a plan what snapshot(s) the plan might later
be used with.


Tom, please correct me if I am wrong. But ISTM that this idea might
work in this context. In get_relation_info(), we would check if xcreate
xid stored in pg_index for the index under consideration is seen
committed with respect to the snapshot at that point of time.
Even if the snapshot changes later and index becomes valid, we
might not replan and hence not use index. But that doesn't seem
like a big problem to me.


That problem are usecases like
PREPARE my_plan  ;
BEGIN;
EXECUTE my_plan  ;
COMMIT ;

Is that PREPARE even run inside a transaction? Even if it is, it
probably won't have created a snapshot...

I think allowing the use of some sort of snapshot from inside the planner
would allow some locking to be relaxed, but there seems be a lot of
corner cases to consider :-(

OTOH, if you manage to make this work, a TRUNCATE that doesn't block
concurrent selects might become possible to do. This would for example
allow dropping and rebuilding subscriptions on a slony node while it
is in use.

greetings, Florian Pflug

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-28 Thread Pavan Deolasee


On 3/29/07, Florian G. Pflug [EMAIL PROTECTED] wrote:


Pavan Deolasee wrote:

 Tom, please correct me if I am wrong. But ISTM that this idea might
 work in this context. In get_relation_info(), we would check if
xcreate
 xid stored in pg_index for the index under consideration is seen
 committed with respect to the snapshot at that point of time.
 Even if the snapshot changes later and index becomes valid, we
 might not replan and hence not use index. But that doesn't seem
 like a big problem to me.

That problem are usecases like
PREPARE my_plan  ;
BEGIN;
EXECUTE my_plan  ;
COMMIT ;




Oh, I see.


Is that PREPARE even run inside a transaction? Even if it is, it

probably won't have created a snapshot...




In this specific context, this particular case is easy to handle because
we are only concerned about the serializable transactions started before
CREATE INDEX commits. If PREPARE can see the new index, it
implies that the CI transaction is committed. So the transaction
starting after than can only see the tuple version that we have indexed.


But I don't know if this is the only case or there are more cases to
consider :-(


Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Hannu Krosing

Ühel kenal päeval, N, 2007-03-22 kell 23:30, kirjutas Pavan Deolasee:
 
 
 On 3/22/07, Tom Lane [EMAIL PROTECTED] wrote:
 Pavan Deolasee [EMAIL PROTECTED] writes:
  When CREATE INDEX starts, it acquires ShareLock on the
 table.
  At this point we may have one or more HOT-update chains in
 the 
  table. Tuples in this chain may be visible to one or more
 running
  transactions. The fact that we have ShareLock on the table
 means
  that all tuples in the chain except the one at the head
 either
  RECENTLY_DEAD or were UPDATEd by the same transaction
  that is now running CREATE INDEX.
 
 This logic seems pretty questionable to me in view of the
 recent VACUUM
 FULL bug fixes.  We now know that an update chain can contain
 tuples that 
 appear DEAD but are later than ones that are
 RECENTLY_DEAD.  How are
 you defining a HOT chain exactly --- will it be just a part of
 an
 update chain that consists only of contiguous tuples that
 appear live or
 RECENTLY_DEAD to the CREATE INDEX transaction?
 
 
 HOT-chain is something which contains contiguous tuples that share
 the same index keys for all the existing indexes and share a single
 index entry. It does not really matter if there are RECENTLY_DEAD 
 tuples that appear before a DEAD tuple because from VACUUM FULL
 bug fix we know that they are DEAD too.
 
 My argument is that its enough to index only the  LIVE tuple which
 is at the end of the chain if we don't use the new index for queries 
 in transactions which were started before CREATE INDEX.

You mean, which were started before CREATE INDEX completes ?

Just wait for all concurrent transactions to complete before marking the
index as usable in plans.

  I am
 proposing to do that by storing an xid in the pg_index row. 

I don't think it is a good idea to store xid's anywhere but in xmin/xmax
columns, as doing so would cause nasty xid wraparound problems.

Instead you should wait, after completeing the index , for all
concurrent transactions to end before you mark the index as usable for
queries, similar to the way CREATE INDEX CONCURRENTLY does.

 A
 special case is where a tuple is UPDATEd multiple times by
 the same transaction which is also creating the index, in which case 
 there are more than one LIVE versions of the tuple. But again
 we are safe by indexing only the latest version because all other
 versions would be invisible (even to us) once CREATE INDEX commits.
 
 
  In fact, the serializable transactions started before CREATE
 INDEX
  can not anyway see the index so all this is done to handle
  read-committed transactions.
 
 You are laboring under an illusion that system catalog
 accesses are MVCC. 
 SnapshotNow does not behave that way: the system can see the
 new index
 as soon as it's committed.  (It had better, since it has to
 start
 updating the index immediately, whether it's safe to scan it
 or not.) 
 I'm not sure whether that's fundamental to your argument or
 not, but
 it's certainly wrong.
 
 
 Oh, thanks for pointing that out. But thats certainly not fundamental
 to the argument as you probably already guessed. The xid still
 controls the usage of index for query planning, somewhat similar to
 isindvalid flag for CREATE INDEX CONCURRENTLY.

Xids are unstable and will come back to bite you after 2G transactions.

Why not just use the isindvalid flag ?


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Hannu Krosing

Ühel kenal päeval, N, 2007-03-22 kell 07:09, kirjutas Andrew Dunstan:
 Pavan Deolasee wrote:
  What I am hearing from many users is that its probably not such
  a nice thing to put such restriction. Thats fair. It really helps to think
  about a solution once you know what is acceptable and what is not.
 
 
 That's likely to be the reaction for almost any restriction you can 
 imagine. Performance improvements are great, but you can't ask people 
 for whom current performance is adequate to pay a price in functionality 
 for them.

An easy solution would be to not enable HOT by default, so people who
dont want to pay the price of no in-transaction CREATE INDEX for HOT
improvements don't have to.

 cheers
 
 andrew
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Hannu Krosing

Ühel kenal päeval, K, 2007-03-21 kell 14:06, kirjutas Merlin Moncure:
 On 3/21/07, Florian G. Pflug [EMAIL PROTECTED] wrote:
  Pavan Deolasee wrote:
   On 3/21/07, Merlin Moncure [EMAIL PROTECTED] wrote:
  
   On 3/21/07, Pavan Deolasee [EMAIL PROTECTED] wrote:
It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?
  
   yikes -- this is huge, huge price to pay, IMHO.  Think about DDL that
   implies index creation such as adding unique constraint to a table,
   many postgresql users (including me) take advantage of that in update
   systems to production databases.
  
   I didn't understand that quite well. How does it help to run CREATE
   INDEX inside a transaction ?
  The problem is not so much CREATE INDEX per se, but other DDL commands
  that implicitly create an index, for example adding a PRIMARY KEY to a
  table.

What I would really like is an official way to promote an UNIQUE index
to Primary Key, so that PK could also be added without locking the table
for long periods.

  Some applications that I have written would fail badly if CREATE INDEX
  was disallowed inside a transaction - mostly, because they use plpgsql
  functions to manipulate database objects, and disallowing CREATE INDEX
  inside a transaction presumably means disallowing it from inside
  stored procedures.

we don't have stored procedures :( only functions.

Maybe your problem will be fixed once we get proper stored procedures,
which can begin and commit/rollback transactions inside their code.

 speaking with pavan off list he seems to think that only 'create
 index' is outside transaction, not the other ddl flavors of it because
 they are generally acquiring a excl lock.  so, in that sense it is
 possibly acceptable to me although still a pretty tough pill to
 swallow (thinking, guc time).  It would also preclude ever integrating
 vanilla 'create index' to create table command, fwiw.

I don't think that any of the HOT restrictions apply when creating an
index on an empty table.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Pavan Deolasee


On 3/23/07, Hannu Krosing [EMAIL PROTECTED] wrote:




 My argument is that its enough to index only the  LIVE tuple which
 is at the end of the chain if we don't use the new index for queries
 in transactions which were started before CREATE INDEX.

You mean, which were started before CREATE INDEX completes ?

Just wait for all concurrent transactions to complete before marking the
index as usable in plans.




I thought there will be objections to that approach since it changes
the CREATE INDEX behavior and may also lead to new deadlock
conditions.


 I am
 proposing to do that by storing an xid in the pg_index row.

I don't think it is a good idea to store xid's anywhere but in xmin/xmax
columns, as doing so would cause nasty xid wraparound problems.




May be we can use VACUUM to cleanup the xid. If xid is less than
OldtestXmin, the index is visible to all the transactions and can be
marked as InvalidTransactionId.






 Oh, thanks for pointing that out. But thats certainly not fundamental
 to the argument as you probably already guessed. The xid still
 controls the usage of index for query planning, somewhat similar to
 isindvalid flag for CREATE INDEX CONCURRENTLY.

Xids are unstable and will come back to bite you after 2G transactions.

Why not just use the isindvalid flag ?



Who would set the flag to true ? Unless of course we are waiting in
CREATE INDEX. But that seems to be less acceptable to me.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Florian G. Pflug


Pavan Deolasee wrote:

There is a slight hole in that SERIALIZABLE transactions won't be able

to use any indexes they build during their transaction, since they may
need to be able to see prior data, but I don't think anybody is going to
complain about that restriction. Anyone?


Oh, I did not see that. If there are no HOT-chains in the table, we can
set the xid to InvalidTransactionId so that the index is usable immediately
after it is created in the current transaction, as well as those
transactions
which were started before CREATE INDEX. We can possibly further
improve it by checking if there are no HOT-chains except those created
by this transaction and set xid to InvalidTransactionId. IMO with that we
shall address most of the use cases. There are few which might
still get impacted, but even for them there won't be any correctness
problem.


Why exactly can't a SERIALIZABLE transaction use the index it created
itself? If you add a pointer to the root of all HOT update chains where
either the HEAD is alive, or some tuple is visible to the transaction
creating the index, shouldn't this be sufficient for using the index
in the creating transaction?

greetings, Florian Pflug


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Pavan Deolasee


On 3/23/07, Florian G. Pflug [EMAIL PROTECTED] wrote:



Why exactly can't a SERIALIZABLE transaction use the index it created
itself? If you add a pointer to the root of all HOT update chains where
either the HEAD is alive, or some tuple is visible to the transaction
creating the index, shouldn't this be sufficient for using the index
in the creating transaction?




Tuples in the HOT-chain may not share the same index keys with
respect to the new index being built (they share the same keys for
all existing indexes though). So we don't know which index key
to use while building the index.


Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Florian G. Pflug


Pavan Deolasee wrote:

On 3/23/07, Florian G. Pflug [EMAIL PROTECTED] wrote:



Why exactly can't a SERIALIZABLE transaction use the index it created
itself? If you add a pointer to the root of all HOT update chains where
either the HEAD is alive, or some tuple is visible to the transaction
creating the index, shouldn't this be sufficient for using the index
in the creating transaction?


Tuples in the HOT-chain may not share the same index keys with
respect to the new index being built (they share the same keys for
all existing indexes though). So we don't know which index key
to use while building the index.


Ah, of course - thanks for pointing that out.

greetings, Florian Pflug


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Bruce Momjian

Hannu Krosing wrote:
 I don't think it is a good idea to store xid's anywhere but in xmin/xmax
 columns, as doing so would cause nasty xid wraparound problems.
 
 Instead you should wait, after completeing the index , for all
 concurrent transactions to end before you mark the index as usable for
 queries, similar to the way CREATE INDEX CONCURRENTLY does.

We already discussed having VACUUM FREEZE deal with the pg_index xid
column.  I don't see how having CREATE INDEX wait for all completed
transactions helps us from a usability perspective.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-23 Thread Bruce Momjian

Pavan Deolasee wrote:
  Xids are unstable and will come back to bite you after 2G transactions.
 
  Why not just use the isindvalid flag ?
 
 
 Who would set the flag to true ? Unless of course we are waiting in
 CREATE INDEX. But that seems to be less acceptable to me.

Agreed, and we have the problem of crash recovery if we do that.  The
xid seems to handle that automatically.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Csaba Nagy

 speaking with pavan off list he seems to think that only 'create
 index' is outside transaction, not the other ddl flavors of it because
 they are generally acquiring a excl lock.  so, in that sense it is
 possibly acceptable to me although still a pretty tough pill to
 swallow (thinking, guc time).  It would also preclude ever integrating
 vanilla 'create index' to create table command, fwiw.

Just to signal that it is in use: we did use create index in
transactions occasionally when we had to do DB schema upgrade on
production systems for application upgrades which span multiple versions
of our application (normally we upgrade versions one by one, but we have
some systems which are upgraded rarely). In these occasions it was
riskier than usually to run the cumulated upgrade scripts outside a
transaction block.

But that was mostly a convenience feature, we could always rearrange our
upgrade scripts to do all the rest first and then all the index creation
at the end if all the rest succeeded... but if implicit index creation
fails (e.g. when adding a new field to a table which happens also to be
a primary key) inside the transaction, that would hurt... mostly in more
work/more risks of extended downtime, but it will have a factor of
inconvenience.

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Pavan Deolasee


On 3/22/07, Csaba Nagy [EMAIL PROTECTED] wrote:


 speaking with pavan off list he seems to think that only 'create
 index' is outside transaction, not the other ddl flavors of it because
 they are generally acquiring a excl lock.  so, in that sense it is
 possibly acceptable to me although still a pretty tough pill to
 swallow (thinking, guc time).  It would also preclude ever integrating
 vanilla 'create index' to create table command, fwiw.

Just to signal that it is in use: we did use create index in
transactions occasionally when we had to do DB schema upgrade on
production systems for application upgrades which span multiple versions
of our application (normally we upgrade versions one by one, but we have
some systems which are upgraded rarely). In these occasions it was
riskier than usually to run the cumulated upgrade scripts outside a
transaction block.

But that was mostly a convenience feature, we could always rearrange our
upgrade scripts to do all the rest first and then all the index creation
at the end if all the rest succeeded... but if implicit index creation
fails (e.g. when adding a new field to a table which happens also to be
a primary key) inside the transaction, that would hurt... mostly in more
work/more risks of extended downtime, but it will have a factor of
inconvenience.



What I am hearing from many users is that its probably not such
a nice thing to put such restriction. Thats fair. It really helps to think
about a solution once you know what is acceptable and what is not.

I am back to the drawing board.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Andrew Dunstan


Pavan Deolasee wrote:

What I am hearing from many users is that its probably not such
a nice thing to put such restriction. Thats fair. It really helps to think
about a solution once you know what is acceptable and what is not.



That's likely to be the reaction for almost any restriction you can 
imagine. Performance improvements are great, but you can't ask people 
for whom current performance is adequate to pay a price in functionality 
for them.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Merlin Moncure


On 3/22/07, Pavan Deolasee [EMAIL PROTECTED] wrote:

On 3/22/07, Csaba Nagy [EMAIL PROTECTED] wrote:
  speaking with pavan off list he seems to think that only 'create
  index' is outside transaction, not the other ddl flavors of it because
  they are generally acquiring a excl lock.  so, in that sense it is
  possibly acceptable to me although still a pretty tough pill to
  swallow (thinking, guc time).  It would also preclude ever integrating
  vanilla 'create index' to create table command, fwiw.

 Just to signal that it is in use: we did use create index in
 transactions occasionally when we had to do DB schema upgrade on
 production systems for application upgrades which span multiple versions
 of our application (normally we upgrade versions one by one, but we have
 some systems which are upgraded rarely). In these occasions it was
 riskier than usually to run the cumulated upgrade scripts outside a
 transaction block.

 But that was mostly a convenience feature, we could always rearrange our
 upgrade scripts to do all the rest first and then all the index creation
 at the end if all the rest succeeded... but if implicit index creation
 fails (e.g. when adding a new field to a table which happens also to be
 a primary key) inside the transaction, that would hurt... mostly in more
 work/more risks of extended downtime, but it will have a factor of
 inconvenience.



What I am hearing from many users is that its probably not such
a nice thing to put such restriction. Thats fair. It really helps to think
about a solution once you know what is acceptable and what is not.

I am back to the drawing board.


The objections to 'create index' forced to being non-transactional
come from what I can see two general cases:
* update scripts
* functions that create tables, etc

ISTM that in both cases nobody would complain too much if the lock was
escalated to exclusive lock in those cases.  So, maybe an alternative
solution is this:

* Make create index/create index concurrently extra-transactional as
Pavan suggested.
* Introduce index creation ability to alter/create table.  If as Pavan
suggests this integrates well with HOT due to excl lock, alter table
can be left transactional.  Now, there is at least alternative path to
take in use cases previously covered by create index + transaction.
now, there is a clean break between classic DDL (alter table, etc) and
non-transactional maintenance commands, like vacuum which create index
becomes part of.

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Pavan Deolasee


On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:




A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index.  The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains.  Then doing lookups using that index, the new index
does not follow HOT chains.  We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created.  Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.

A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it.  The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created.  Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index.  Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.




Bruce, thanks for bringing up this idea.

As I think more about this idea, I think I am able to extend this further
to solve the problems we discussed around it. One of my conerns
were that the change the basic structure of heap
with HOT-chains so that it should be possible to just look at the
heap tuple and say whether it has any index pointer or not.

The way I propose to extend/modify the idea is to use pg_index xid
as suggested by Bruce to mark the index. This xid would guide the
visibility of the index. As we all know, CREATE INDEX locks out
UPDATEs on the table and further UPDATEs are possible only after
the transaction creating the new index commits.

When CREATE INDEX starts, it acquires ShareLock on the table.
At this point we may have one or more HOT-update chains in the
table. Tuples in this chain may be visible to one or more running
transactions. The fact that we have ShareLock on the table means
that all tuples in the chain except the one at the head either
RECENTLY_DEAD or were UPDATEd by the same transaction
that is now running CREATE INDEX.

With this background, I propose to index ONLY the head of the
HOT-chain. The TID of the root tuple is used instead of the actual
TID of the tuple being indexed. This index will not be available to
the transactions which are started before the CREATE INDEX
transaction. Just like we use indisvalid flag to avoid including
an invalid index in the plan, we use the pg_index xid to decide
whether to use the index in the plan or not. Only transactions with
txid  pg_index:xid can see the index and use it.

In fact, the serializable transactions started before CREATE INDEX
can not anyway see the index so all this is done to handle
read-committed transactions.

In this proposal we indexed only the latest version. But none of the
transactions started after CREATE INDEX can anyway see the
older tuples and hence we should be fine even if we don't index
them in the new index. And none of the older transaction can see
the index, so again we are safe. The design also helps us to
preserve the heap HOT semantics and chain pruning and does not
need VACUUM or any special handling.

Can anyone spot a hole in this logic ? Comments ?

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Tom Lane

Pavan Deolasee [EMAIL PROTECTED] writes:
 When CREATE INDEX starts, it acquires ShareLock on the table.
 At this point we may have one or more HOT-update chains in the
 table. Tuples in this chain may be visible to one or more running
 transactions. The fact that we have ShareLock on the table means
 that all tuples in the chain except the one at the head either
 RECENTLY_DEAD or were UPDATEd by the same transaction
 that is now running CREATE INDEX.

This logic seems pretty questionable to me in view of the recent VACUUM
FULL bug fixes.  We now know that an update chain can contain tuples that
appear DEAD but are later than ones that are RECENTLY_DEAD.  How are
you defining a HOT chain exactly --- will it be just a part of an
update chain that consists only of contiguous tuples that appear live or
RECENTLY_DEAD to the CREATE INDEX transaction?

 In fact, the serializable transactions started before CREATE INDEX
 can not anyway see the index so all this is done to handle
 read-committed transactions.

You are laboring under an illusion that system catalog accesses are MVCC.
SnapshotNow does not behave that way: the system can see the new index
as soon as it's committed.  (It had better, since it has to start
updating the index immediately, whether it's safe to scan it or not.)
I'm not sure whether that's fundamental to your argument or not, but
it's certainly wrong.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Pavan Deolasee


On 3/22/07, Tom Lane [EMAIL PROTECTED] wrote:


Pavan Deolasee [EMAIL PROTECTED] writes:
 When CREATE INDEX starts, it acquires ShareLock on the table.
 At this point we may have one or more HOT-update chains in the
 table. Tuples in this chain may be visible to one or more running
 transactions. The fact that we have ShareLock on the table means
 that all tuples in the chain except the one at the head either
 RECENTLY_DEAD or were UPDATEd by the same transaction
 that is now running CREATE INDEX.

This logic seems pretty questionable to me in view of the recent VACUUM
FULL bug fixes.  We now know that an update chain can contain tuples that
appear DEAD but are later than ones that are RECENTLY_DEAD.  How are
you defining a HOT chain exactly --- will it be just a part of an
update chain that consists only of contiguous tuples that appear live or
RECENTLY_DEAD to the CREATE INDEX transaction?




HOT-chain is something which contains contiguous tuples that share
the same index keys for all the existing indexes and share a single
index entry. It does not really matter if there are RECENTLY_DEAD
tuples that appear before a DEAD tuple because from VACUUM FULL
bug fix we know that they are DEAD too.

My argument is that its enough to index only the  LIVE tuple which
is at the end of the chain if we don't use the new index for queries
in transactions which were started before CREATE INDEX. I am
proposing to do that by storing an xid in the pg_index row. A
special case is where a tuple is UPDATEd multiple times by
the same transaction which is also creating the index, in which case
there are more than one LIVE versions of the tuple. But again
we are safe by indexing only the latest version because all other
versions would be invisible (even to us) once CREATE INDEX commits.


In fact, the serializable transactions started before CREATE INDEX
 can not anyway see the index so all this is done to handle
 read-committed transactions.

You are laboring under an illusion that system catalog accesses are MVCC.
SnapshotNow does not behave that way: the system can see the new index
as soon as it's committed.  (It had better, since it has to start
updating the index immediately, whether it's safe to scan it or not.)
I'm not sure whether that's fundamental to your argument or not, but
it's certainly wrong.




Oh, thanks for pointing that out. But thats certainly not fundamental
to the argument as you probably already guessed. The xid still controls
the usage of index for query planning, somewhat similar to isindvalid
flag for CREATE INDEX CONCURRENTLY.


Thanks,
Pavan



--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Bruce Momjian

Pavan Deolasee wrote:
 My argument is that its enough to index only the  LIVE tuple which
 is at the end of the chain if we don't use the new index for queries
 in transactions which were started before CREATE INDEX. I am
 proposing to do that by storing an xid in the pg_index row. A
 special case is where a tuple is UPDATEd multiple times by
 the same transaction which is also creating the index, in which case
 there are more than one LIVE versions of the tuple. But again
 we are safe by indexing only the latest version because all other
 versions would be invisible (even to us) once CREATE INDEX commits.

What if CREATE INDEX is run in a SERIALIZABLE transaction?

  In fact, the serializable transactions started before CREATE INDEX
   can not anyway see the index so all this is done to handle
   read-committed transactions.
 
  You are laboring under an illusion that system catalog accesses are MVCC.
  SnapshotNow does not behave that way: the system can see the new index
  as soon as it's committed.  (It had better, since it has to start
  updating the index immediately, whether it's safe to scan it or not.)
  I'm not sure whether that's fundamental to your argument or not, but
  it's certainly wrong.
 
 
 
 Oh, thanks for pointing that out. But thats certainly not fundamental
 to the argument as you probably already guessed. The xid still controls
 the usage of index for query planning, somewhat similar to isindvalid
 flag for CREATE INDEX CONCURRENTLY.

I am glad you found the pg_index xid actually helps in other ways.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Simon Riggs

On Thu, 2007-03-22 at 22:11 +0530, Pavan Deolasee wrote:

 With this background, I propose to index ONLY the head of the
 HOT-chain. The TID of the root tuple is used instead of the actual
 TID of the tuple being indexed. This index will not be available to
 the transactions which are started before the CREATE INDEX 
 transaction. Just like we use indisvalid flag to avoid including
 an invalid index in the plan, we use the pg_index xid to decide
 whether to use the index in the plan or not. Only transactions with 
 txid  pg_index:xid can see the index and use it.
 
 In fact, the serializable transactions started before CREATE INDEX
 can not anyway see the index so all this is done to handle
 read-committed transactions. 
 
 In this proposal we indexed only the latest version. But none of the
 transactions started after CREATE INDEX can anyway see the
 older tuples and hence we should be fine even if we don't index
 them in the new index. And none of the older transaction can see 
 the index, so again we are safe. The design also helps us to
 preserve the heap HOT semantics and chain pruning and does not
 need VACUUM or any special handling.

Well, ISTM you've nailed it.

CREATE INDEX returns as soon as possible, but people will have to wait
for their next transaction before they can see it and use it too. Nice
role reversal to avoid having CREATE INDEX wait. No restrictions on the
number of indexes, no restrictions on multiple concurrent index builders
and we can do this in just one pass.

The ShareLock taken by CREATE INDEX guarantees all transactions that
wrote data to the table have completed and that no new data can be added
until after the index build commits. So the end of the chain is visible
to CREATE INDEX and won't change. As long as you index the latest
committed version on each HOT chain, then I think it works. Clearly want
to ignore aborted versions. Sounds like you'll need to read the HOT
chains in sequence to ensure we don't repeat the VACUUM FULL error. If
there are no HOT chains then it will be just a normal seq scan of each
block, so there's no real speed loss for situations where no HOT updates
have taken place, such as reload from pg_dump.

Sounds like you'll need to store the Next TransactionId rather than the
TransactionId of the CREATE INDEX. We don't need to store the ComboId as
well, since all commands are planned in ComboId sequence, assuming plan
invalidation blows away any earlier plans held by our own backend.

There is a slight hole in that SERIALIZABLE transactions won't be able
to use any indexes they build during their transaction, since they may
need to be able to see prior data, but I don't think anybody is going to
complain about that restriction. Anyone?

Maybe we can use this technique for CREATE INDEX CONCURRENTLY as well,
so that it doesn't have to wait either. That needs some careful
thinking... it may not work the same because of the locking differences.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Tom Lane

Simon Riggs [EMAIL PROTECTED] writes:
 There is a slight hole in that SERIALIZABLE transactions won't be able
 to use any indexes they build during their transaction, since they may
 need to be able to see prior data, but I don't think anybody is going to
 complain about that restriction. Anyone?

Practically every statement I've seen in this thread that used the
phrase SERIALIZABLE transaction was wrong to some extent, and this
one is no different.

The issue is not whether the whole transaction is serializable or not,
it's how old is the oldest still-live snapshot, a thing that CREATE
INDEX can't tell with any certainty in READ COMMITTED mode.  So if your
solution involves any explicit dependence on the transaction
serializability mode, it's probably wrong.  I'm not totally sure if you
are expecting to be able to tell that, but I do know that the planner
has no idea what snapshots a plan it makes will be used with.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Simon Riggs

On Thu, 2007-03-22 at 16:16 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  There is a slight hole in that SERIALIZABLE transactions won't be able
  to use any indexes they build during their transaction, since they may
  need to be able to see prior data, but I don't think anybody is going to
  complain about that restriction. Anyone?
 
 Practically every statement I've seen in this thread that used the
 phrase SERIALIZABLE transaction was wrong to some extent, and this
 one is no different.
 
 The issue is not whether the whole transaction is serializable or not,
 it's how old is the oldest still-live snapshot, a thing that CREATE
 INDEX can't tell with any certainty in READ COMMITTED mode.  So if your
 solution involves any explicit dependence on the transaction
 serializability mode, it's probably wrong.  I'm not totally sure if you
 are expecting to be able to tell that, but I do know that the planner
 has no idea what snapshots a plan it makes will be used with.

Thanks for correcting me.

Reworded: There is a slight hole in that snapshots older than the CREATE
INDEX must never be allowed to use the index. That means that
SERIALIZABLE transactions and some other situations will need to be
restricted. Personally, I would argue that such a restriction was an
acceptable loss of functionality, since I can't think of a situation
where such a thing would need to occur, though one may turn up.

Currently, I don't know how to prevent this from happening. We'll need
to examine this in more detail to see if there is a way.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Russell Smith


Simon Riggs wrote:

On Thu, 2007-03-22 at 16:16 -0400, Tom Lane wrote:
  

Simon Riggs [EMAIL PROTECTED] writes:


There is a slight hole in that SERIALIZABLE transactions won't be able
to use any indexes they build during their transaction, since they may
need to be able to see prior data, but I don't think anybody is going to
complain about that restriction. Anyone?
  

Practically every statement I've seen in this thread that used the
phrase SERIALIZABLE transaction was wrong to some extent, and this
one is no different.

The issue is not whether the whole transaction is serializable or not,
it's how old is the oldest still-live snapshot, a thing that CREATE
INDEX can't tell with any certainty in READ COMMITTED mode.  So if your
solution involves any explicit dependence on the transaction
serializability mode, it's probably wrong.  I'm not totally sure if you
are expecting to be able to tell that, but I do know that the planner
has no idea what snapshots a plan it makes will be used with.



Thanks for correcting me.

Reworded: There is a slight hole in that snapshots older than the CREATE
INDEX must never be allowed to use the index. That means that
SERIALIZABLE transactions and some other situations will need to be
restricted. Personally, I would argue that such a restriction was an
acceptable loss of functionality, since I can't think of a situation
where such a thing would need to occur, though one may turn up.

Currently, I don't know how to prevent this from happening. We'll need
to examine this in more detail to see if there is a way.
  
I have seen and used transactions that create indexes inside a 
transaction, use them for the life of the transaction, and then drop 
them at the end.
I think this is an acceptable solution to not be able to use the index 
in the current transaction if the table you are building the index on is 
HOT enabled.  That way it is not really a functionality loss, it's just 
a restriction put in place if you are using a certain feature.  We do 
not want to go breaking existing code.


However HOT is enabled by default on tables, then we have a different 
situation.  And if the expectation is that HOT will be enabled by 
default in future releases, then this needs to be considered now.


Regards

Russell Smith


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Pavan Deolasee


On 3/23/07, Simon Riggs [EMAIL PROTECTED] wrote:




The ShareLock taken by CREATE INDEX guarantees all transactions that
wrote data to the table have completed and that no new data can be added
until after the index build commits. So the end of the chain is visible
to CREATE INDEX and won't change. As long as you index the latest
committed version on each HOT chain, then I think it works.




Its slightly different for the HOT-chains created by this transaction which
is creating the index. We should index the latest version of the row which
is not yet committed. But thats ok because when CREATE INDEX commits
this latest version would also get committed.


Sounds like you'll need to store the Next TransactionId rather than the

TransactionId of the CREATE INDEX.




Yes, I agree.


There is a slight hole in that SERIALIZABLE transactions won't be able

to use any indexes they build during their transaction, since they may
need to be able to see prior data, but I don't think anybody is going to
complain about that restriction. Anyone?




Oh, I did not see that. If there are no HOT-chains in the table, we can
set the xid to InvalidTransactionId so that the index is usable immediately
after it is created in the current transaction, as well as those
transactions
which were started before CREATE INDEX. We can possibly further
improve it by checking if there are no HOT-chains except those created
by this transaction and set xid to InvalidTransactionId. IMO with that we
shall address most of the use cases. There are few which might
still get impacted, but even for them there won't be any correctness
problem.


Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Bruce Momjian wrote:
 
 I have read the HOT discussion and wanted to give my input.  The major
 issue is that CREATE INDEX might require a HOT chain to be split apart
 if one of the new indexed columns changed in the HOT chain.

To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.

I have a new idea.  There has been a lot of focus on trying to tag each
tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain, and
some will not.

A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index.  The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains.  Then doing lookups using that index, the new index
does not follow HOT chains.  We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created.  Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.

A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it.  The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created.  Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index.  Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.  

I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.

OK, so there is my first idea to do this cleanly.  Comments?

---




 As for the outline below, there is no way we are going to add new ALTER
 TABLE and CHILL commands to make this work.  Can you imagine what kind
 of warts we would have in the system if we added such things every time
 we got stuck?  And once the warts are in, it is hard to educate people
 once they are removed.  We need to keep going until we have a solution
 that is as user-invisible as possible.  While I understand the
 frustration that we have not discussed this enough, I don't want us
 rushing to a solution either until it has been totally thought through.
 
 HOT is not a feature only a few people are going to want to use ---
 everyone will want it, and if the user process is cumbersome, we will get
 never-ending questions about how to make it work.
 
 Let's all think about this for the next few days.
 
 ---
 
 Pavan Deolasee wrote:
  
  
  There are  few things I realized over the weekend while going
  through the code:
  
  1. It looks like a bad idea to use ALTER TABLE ..  to chill a table
  becuase ALTER TABLE takes AccessExclusive lock on the table.
  But it would still be a good idea to have ALTER TABLE .. to turn
  HOT-updates ON/OFF.
  
  2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
  its lock anyways and is prone to deadlock. So as long as we don't
  create new deadlock scenarios, we should be fine.
  
  3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
  So its should be acceptable if we run CHILL as a seperate transaction.
  
  4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
  thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
  VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
  race conditions between all of these.
  
  
  So here is my plan. Let me know your suggestions/comments/objections.
  
  
  Changes to pg_class and new DDLs:
  -
  
  We add two boolean attributes to pg_class: hot_update and hot_fetch.
  
  We introduce two DDLs to turn HOT on/off. Note that the DDL
  itself does not CHILL the table, but only affects the
  subsequent UPDATEs.
  
  postgres=# ALTER TABLE tblname ENABLE HOT;
  postgres=# ALTER TABLE tblname DISABLE HOT;
  
  These DDLs would acquire AccessExclusive lock on the table and
  set hot_update to true/false using simple_heap_update()
  
  
  CREATE INDEX [CONCURRENTLY]:
  
  
  If a HEAP_ONLY tuple is found, error out with a HINT to run
  CHILL on the table and then retry.
  
  If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
  the table.
  
  
  CHILL utility:
  --
  
  We introduce a new command to chill a table. The syntax for the
  same could be:
  
  postgres=# CHILL

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Heikki Linnakangas


Bruce Momjian wrote:

A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index.  The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains.  Then doing lookups using that index, the new index
does not follow HOT chains.  We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created.  Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.


When exactly would all HOT chains be dead? AFAICS, that would be after 
the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run 
to prune and pointer-swing all HOT chains.


Would we have to wait after setting the new forbid_hot_updates-flag in 
pg_class, to make sure everyone sees the change? What if CREATE INDEX 
crashes, would we need a vacuum to reset the flag?



A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it.  The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created.  Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index.  Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.  


What xid would you place in pg_index? Xid of the transaction running 
CREATE INDEX, ReadNewTransactionId() or what?


How does that work if you have a transaction that begins before CREATE 
INDEX, and updates something after CREATE INDEX?



I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.


I don't think you can freeze the xid-column, we went through a similar 
discussion on pg_class.relfrozenxid. But you can move it forward to 
oldest xmin.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Simon Riggs

On Wed, 2007-03-21 at 10:47 -0400, Bruce Momjian wrote:

 A different idea is to flag the _index_ as using HOT for the table or
 not, using a boolean in pg_index.  The idea is that when a new index is
 created, it has its HOT boolean set to false and indexes all tuples and
 ignores HOT chains.  Then doing lookups using that index, the new index
 does not follow HOT chains.  We also add a boolean to pg_class to
 indicate no new HOT chains should be created and set that to false once
 the new index is created.  Then, at some later time when all HOT chains
 are dead, we can enable HOT chain following for the new index and allow
 new HOT chains to be created.

enable HOT chain following would require us to hold an
AccessExclusiveLock on the index.

We know that solution exists, the question is: at what point would we
ever request that lock? Or would we just wait until that lock is next
taken before enabling it, giving the user no control over when its
taken? A separate DDL command would be effectively the same as what
Pavan has recently suggested.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Heikki Linnakangas

Sorry, I was a bit too quick to respond. I didn't understand at first 
how this differs from Pavan's/Simon's proposals.


Let me answer my own questions.

Heikki Linnakangas wrote:

Bruce Momjian wrote:

A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index.  The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains.  Then doing lookups using that index, the new index
does not follow HOT chains.  We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created.  Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.


When exactly would all HOT chains be dead? AFAICS, that would be after 
the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run 
to prune and pointer-swing all HOT chains.


I still think that's true.

Would we have to wait after setting the new forbid_hot_updates-flag in 
pg_class, to make sure everyone sees the change? What if CREATE INDEX 
crashes, would we need a vacuum to reset the flag?


You wouldn't need to do any extra waits to set the forbid_hot_updates 
flag, CREATE INDEX locks the table and already sends a relcache 
invalidations to make the new index visible. CREATE INDEX CONCURRENTLY 
waits already.



A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it.  The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created.  Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index.  Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.  


What xid would you place in pg_index? Xid of the transaction running 
CREATE INDEX, ReadNewTransactionId() or what?


Apparently ReadNewTransactionId to make sure there's no existing tuples 
with an xmax smaller than that.


How does that work if you have a transaction that begins before CREATE 
INDEX, and updates something after CREATE INDEX?


You actually explained that above...

The HOT_UPDATED flag on a tuple would basically mean that all indexes 
with xid  xmax doesn't contain an index pointer for the tuple, and all 
others do. When inserting new updated tuples, we'd also need to maintain 
that invariant.



I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.


I don't think you can freeze the xid-column, we went through a similar 
discussion on pg_class.relfrozenxid. But you can move it forward to 
oldest xmin.


You could actually freeze the column, because unlike relfrozenid we 
never need to unfreeze it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Gregory Stark


Bruce Momjian [EMAIL PROTECTED] writes:

 We also add a boolean to pg_class to indicate no new HOT chains should be
 created and set that to false once the new index is created.

Since we have all the index info in the relcache we could just skim through
all the indexes when we build the relcache and decide then whether we're
allowed to do HOT updates. That avoids problems if we crash while HOT updates
are disabled.

I think we need to think harder about exactly what test we would perform
against this xid to determine the two relevant tests, 

a) whether to prohibit HOT updates (because the index is too new)

b) whether to ignore HOT update chains when we use the index (because it's
   too new and any HOT update chains predate it).

I fear it may imply that we have to keep performing cold updates until the
first vacuum after the xid expires.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Heikki Linnakangas wrote:
 Bruce Momjian wrote:
  A different idea is to flag the _index_ as using HOT for the table or
  not, using a boolean in pg_index.  The idea is that when a new index is
  created, it has its HOT boolean set to false and indexes all tuples and
  ignores HOT chains.  Then doing lookups using that index, the new index
  does not follow HOT chains.  We also add a boolean to pg_class to
  indicate no new HOT chains should be created and set that to false once
  the new index is created.  Then, at some later time when all HOT chains
  are dead, we can enable HOT chain following for the new index and allow
  new HOT chains to be created.
 
 When exactly would all HOT chains be dead? AFAICS, that would be after 
 the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run 
 to prune and pointer-swing all HOT chains.

Yes, I think so.  We might need to have pg_autovacuum involved in
checking pg_index and doing a vacuum when needed, but that is only for
the boolean idea;  the xid idea doesn't need it, I think.

 Would we have to wait after setting the new forbid_hot_updates-flag in 
 pg_class, to make sure everyone sees the change? What if CREATE INDEX 
 crashes, would we need a vacuum to reset the flag?

Uh, I am thinking we would just set it at the start of CREATE INDEX, but
actually if we do the pg_index xid idea, we don't need to tell them
anything --- once they see the new index in pg_index, they will
automatically know whether to include the new index in the HOT tests
based on its own xid.  Right now, if a new index is created, existing
backends know they need to insert into it, so it would be the same
mechanism, except they have to check the pg_index xid field.

If we use the xid idea, and CREATE INDEX crashes, there is no problem
because all the backend just keep going and never see the new pg_index
xid entry.

  A more sophisticated idea would be to place an xid, rather than a
  boolean, in pg_index to indicate which chains were created after the
  index was created to control whether the index should follow that HOT
  chain, or ignore it.  The xmax of the head of the HOT chain can be used
  as an indicator of when the chain was created.  Transactions started
  before the pg_index xid could continue following the old rules and
  insert into the _new_ index for HOT chain additions, and new
  transactions would create HOT chains that could skip adding to the new
  index.  Cleanup of the hybrid HOT chains (some indexes take part, some
  do not) would be more complex.  
 
 What xid would you place in pg_index? Xid of the transaction running 
 CREATE INDEX, ReadNewTransactionId() or what?

I would say the current xid counter value at time of index creation
completion, so currently running transactions are unaffected, but newly
started transactions create HOT chains that include the new index, and
we just cleanup the old hot chains as the currently-running transactions
complete.

 How does that work if you have a transaction that begins before CREATE 
 INDEX, and updates something after CREATE INDEX?

OK, uses the old rules of always inserting into the new index, even if
it is creating a HOT chain.  The bottom line is that we would now have
chains with different indexes involved, and the HOT head xmax compared
to pg_index tells us which indexes are involved in the chain.

  I know we have xid wrap-around, but I think the VACUUM FREEZE could
  handle it by freezing the pg_index xid column value when it does the
  table.
 
 I don't think you can freeze the xid-column, we went through a similar 
 discussion on pg_class.relfrozenxid. But you can move it forward to 
 oldest xmin.

Sounds good.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Pavan Deolasee


On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:


Bruce Momjian wrote:

 I have read the HOT discussion and wanted to give my input.  The major
 issue is that CREATE INDEX might require a HOT chain to be split apart
 if one of the new indexed columns changed in the HOT chain.

To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.



You mean index scans, right ? Sequential scans don't change with HOT.


I have a new idea.  There has been a lot of focus on trying to tag each

tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain, and
some will not.




In general I would like to preserve the HOT properties at the end of
creation of new index. All index should point to the same root tuple
if pointing to a HOT-chain. The things might become otherwise messy
with the line pointer redirection, reuse of LP_DELETEd tuples and
other things that we have put in.

Disabling HOT-updates using pg_class attribute has same issue
with my earlier proposal. What do we do if the backend crashes before
it can enable it again ? Who would reset the flag ? We could have lived
without DDL and CHILL command if we would have answers for
these questions.

A different idea here:

Would it be acceptable to force CREATE INDEX to be run outside
a transaction block just like CREATE INDEX CONCURRENTLY ? If thats
acceptable, we can do the following:

CREATE INDEX:

Since CREATE INDEX locks out table for UPDATEs, we just need
to wait enough before we start the heap scan so that when we do
heap scan, all HOT-chains can be pruned to a single tuple (with or
without the redirected line pointer). So when the new index is built,
we first prune the entire page of HOT-chains and insert the TID of
the root tuple into the new index. IOW we need to wait for all
transactions in the snapshot after acquiring ShareLock but before
we start heap scan.

CREATE INDEX CONCURRENTLY:

Simon has already posted a design that would work with CREATE
INDEX CONCURRENTLY. I think we need to tweak it a bit so
that we can handle the HOT-updated tuples after then first heap
scan, but before the index is visible to all transactions. Once the
new index is visible, the heap_update() code path would take care
of avoiding HOT-updates if the column on which new index is being
built is updated.

It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?

Comments ?

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Simon Riggs wrote:
 On Wed, 2007-03-21 at 10:47 -0400, Bruce Momjian wrote:
 
  A different idea is to flag the _index_ as using HOT for the table or
  not, using a boolean in pg_index.  The idea is that when a new index is
  created, it has its HOT boolean set to false and indexes all tuples and
  ignores HOT chains.  Then doing lookups using that index, the new index
  does not follow HOT chains.  We also add a boolean to pg_class to
  indicate no new HOT chains should be created and set that to false once
  the new index is created.  Then, at some later time when all HOT chains
  are dead, we can enable HOT chain following for the new index and allow
  new HOT chains to be created.
 
 enable HOT chain following would require us to hold an
 AccessExclusiveLock on the index.

OK.

 We know that solution exists, the question is: at what point would we
 ever request that lock? Or would we just wait until that lock is next
 taken before enabling it, giving the user no control over when its
 taken? A separate DDL command would be effectively the same as what
 Pavan has recently suggested.

We could just request a lock and if it fails, try again later;  it just
delays HOT updates, which is not a big problem.  Allowing the user to do
it via a separate command seems to add nothing.  The only thing I could
see would be adding an option to CREATE INDEX which waits and then does
it.  

However, given recent comments, I think the xid idea, while more
complicated, is better because it has fewer restrictions on when things
happen.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian


Ah, sounds like you have the idea clearly now.  Great.

---

Heikki Linnakangas wrote:
 Sorry, I was a bit too quick to respond. I didn't understand at first 
 how this differs from Pavan's/Simon's proposals.
 
 Let me answer my own questions.
 
 Heikki Linnakangas wrote:
  Bruce Momjian wrote:
  A different idea is to flag the _index_ as using HOT for the table or
  not, using a boolean in pg_index.  The idea is that when a new index is
  created, it has its HOT boolean set to false and indexes all tuples and
  ignores HOT chains.  Then doing lookups using that index, the new index
  does not follow HOT chains.  We also add a boolean to pg_class to
  indicate no new HOT chains should be created and set that to false once
  the new index is created.  Then, at some later time when all HOT chains
  are dead, we can enable HOT chain following for the new index and allow
  new HOT chains to be created.
  
  When exactly would all HOT chains be dead? AFAICS, that would be after 
  the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run 
  to prune and pointer-swing all HOT chains.
 
 I still think that's true.
 
  Would we have to wait after setting the new forbid_hot_updates-flag in 
  pg_class, to make sure everyone sees the change? What if CREATE INDEX 
  crashes, would we need a vacuum to reset the flag?
 
 You wouldn't need to do any extra waits to set the forbid_hot_updates 
 flag, CREATE INDEX locks the table and already sends a relcache 
 invalidations to make the new index visible. CREATE INDEX CONCURRENTLY 
 waits already.
 
  A more sophisticated idea would be to place an xid, rather than a
  boolean, in pg_index to indicate which chains were created after the
  index was created to control whether the index should follow that HOT
  chain, or ignore it.  The xmax of the head of the HOT chain can be used
  as an indicator of when the chain was created.  Transactions started
  before the pg_index xid could continue following the old rules and
  insert into the _new_ index for HOT chain additions, and new
  transactions would create HOT chains that could skip adding to the new
  index.  Cleanup of the hybrid HOT chains (some indexes take part, some
  do not) would be more complex.  
  
  What xid would you place in pg_index? Xid of the transaction running 
  CREATE INDEX, ReadNewTransactionId() or what?
 
 Apparently ReadNewTransactionId to make sure there's no existing tuples 
 with an xmax smaller than that.
 
  How does that work if you have a transaction that begins before CREATE 
  INDEX, and updates something after CREATE INDEX?
 
 You actually explained that above...
 
 The HOT_UPDATED flag on a tuple would basically mean that all indexes 
 with xid  xmax doesn't contain an index pointer for the tuple, and all 
 others do. When inserting new updated tuples, we'd also need to maintain 
 that invariant.
 
  I know we have xid wrap-around, but I think the VACUUM FREEZE could
  handle it by freezing the pg_index xid column value when it does the
  table.
  
  I don't think you can freeze the xid-column, we went through a similar 
  discussion on pg_class.relfrozenxid. But you can move it forward to 
  oldest xmin.
 
 You could actually freeze the column, because unlike relfrozenid we 
 never need to unfreeze it.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian


As stated in later email, I think we should focus on the xid idea
because it is more flexible.

---

Gregory Stark wrote:
 
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  We also add a boolean to pg_class to indicate no new HOT chains should be
  created and set that to false once the new index is created.
 
 Since we have all the index info in the relcache we could just skim through
 all the indexes when we build the relcache and decide then whether we're
 allowed to do HOT updates. That avoids problems if we crash while HOT updates
 are disabled.
 
 I think we need to think harder about exactly what test we would perform
 against this xid to determine the two relevant tests, 
 
 a) whether to prohibit HOT updates (because the index is too new)
 
 b) whether to ignore HOT update chains when we use the index (because it's
too new and any HOT update chains predate it).
 
 I fear it may imply that we have to keep performing cold updates until the
 first vacuum after the xid expires.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Pavan Deolasee wrote:
 On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 
  Bruce Momjian wrote:
  
   I have read the HOT discussion and wanted to give my input.  The major
   issue is that CREATE INDEX might require a HOT chain to be split apart
   if one of the new indexed columns changed in the HOT chain.
 
  To expand a little more, the problem is that when you split those HOT
  chains, you have to insert new entries into the _existing_ indexes,
  causing problems for concurrent sequential scans.
 
 
 You mean index scans, right ? Sequential scans don't change with HOT.

Sorry, yes, index scans.

 I have a new idea.  There has been a lot of focus on trying to tag each
  tuple to indicate whether it is part of a HOT chain for individual
  indexes, the idea being that some indexes will follow the HOT chain, and
  some will not.
 
 
 
 In general I would like to preserve the HOT properties at the end of
 creation of new index. All index should point to the same root tuple
 if pointing to a HOT-chain. The things might become otherwise messy
 with the line pointer redirection, reuse of LP_DELETEd tuples and
 other things that we have put in.
 
 Disabling HOT-updates using pg_class attribute has same issue
 with my earlier proposal. What do we do if the backend crashes before
 it can enable it again ? Who would reset the flag ? We could have lived
 without DDL and CHILL command if we would have answers for
 these questions.

Let's focus on the xid idea, as stated in earlier emails.  It has fewer
restrictions.

 A different idea here:
 
 Would it be acceptable to force CREATE INDEX to be run outside
 a transaction block just like CREATE INDEX CONCURRENTLY ? If thats
 acceptable, we can do the following:

Yea, that is possible, but not ideal.

 CREATE INDEX:
 
 Since CREATE INDEX locks out table for UPDATEs, we just need
 to wait enough before we start the heap scan so that when we do
 heap scan, all HOT-chains can be pruned to a single tuple (with or
 without the redirected line pointer). So when the new index is built,
 we first prune the entire page of HOT-chains and insert the TID of
 the root tuple into the new index. IOW we need to wait for all
 transactions in the snapshot after acquiring ShareLock but before
 we start heap scan.

I am worried that will require CREATE INDEX to wait for a long time.

 CREATE INDEX CONCURRENTLY:
 
 Simon has already posted a design that would work with CREATE
 INDEX CONCURRENTLY. I think we need to tweak it a bit so
 that we can handle the HOT-updated tuples after then first heap
 scan, but before the index is visible to all transactions. Once the
 new index is visible, the heap_update() code path would take care
 of avoiding HOT-updates if the column on which new index is being
 built is updated.
 
 It seems much simpler to me do something like this. But important
 question is whether the restriction that CREATE INDEX can not
 be run in a transaction block is acceptable ?

Is the pg_index xid idea too complex?  It seems to give you the
per-tuple index bit, without the restrictions.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Pavan Deolasee


On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:




I am worried that will require CREATE INDEX to wait for a long time.




Not unless there are long running transactions. We are not waiting
for the lock, but only for the current transactions to finish.




Is the pg_index xid idea too complex?  It seems to give you the
per-tuple index bit, without the restrictions.




How do we handle HEAP_ONLY tuple cleanup ? If I understood
the proposal correctly, at the end of the create index, a HEAP_ONLY
tuple may have pointer from the new index, isn't it ?


Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 As stated in later email, I think we should focus on the xid idea
 because it is more flexible.

Sorry if I was unclear. I agree, my comment and questions are all predicated
on the assumption that we would go with xids. 


 ---

 Gregory Stark wrote:
 
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  We also add a boolean to pg_class to indicate no new HOT chains should be
  created and set that to false once the new index is created.
 
 Since we have all the index info in the relcache we could just skim through
 all the indexes when we build the relcache and decide then whether we're
 allowed to do HOT updates. That avoids problems if we crash while HOT updates
 are disabled.
 
 I think we need to think harder about exactly what test we would perform
 against this xid to determine the two relevant tests, 
 
 a) whether to prohibit HOT updates (because the index is too new)
 
 b) whether to ignore HOT update chains when we use the index (because it's
too new and any HOT update chains predate it).
 
 I fear it may imply that we have to keep performing cold updates until the
 first vacuum after the xid expires.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

 -- 
   Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
   EnterpriseDB   http://www.enterprisedb.com

   + If your life is a hard drive, Christ can be your backup. +

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  As stated in later email, I think we should focus on the xid idea
  because it is more flexible.
 
 Sorry if I was unclear. I agree, my comment and questions are all predicated
 on the assumption that we would go with xids. 

OK, replies below.

  Gregory Stark wrote:
  
  Bruce Momjian [EMAIL PROTECTED] writes:
  
   We also add a boolean to pg_class to indicate no new HOT chains should be
   created and set that to false once the new index is created.
  
  Since we have all the index info in the relcache we could just skim through
  all the indexes when we build the relcache and decide then whether we're
  allowed to do HOT updates. That avoids problems if we crash while HOT 
  updates
  are disabled.

Uh, with xid, we don't disable HOT updates at all --- we just control,
via xids, which indexes are part of which HOT update chain.

  I think we need to think harder about exactly what test we would perform
  against this xid to determine the two relevant tests, 
  
  a) whether to prohibit HOT updates (because the index is too new)

Never.

  b) whether to ignore HOT update chains when we use the index (because it's
 too new and any HOT update chains predate it).

We have to look at the xmax of the HOT chain head and compare that to
the index xid.

  I fear it may imply that we have to keep performing cold updates until the
  first vacuum after the xid expires.

Yes, effectively, that is true, because once we have the new index in
place, we can't reuse the old HOT chains --- we need VACUUM at some
point, but new HOT chains can be created.

Actually, since we have a _write_ lock on the table, there are no table
changes during index creation, so we only need to worry about changes
after the index is created.  

After the index is created, old HOT chains have to be cleaned up via
VACUUM because we have the new index pointing into the HOT chain. 
Existing transactions can't create new HOT chains because they would
have to insert into the new index (because of their xid), so they might
as well not use HOT.  New transactions can create new HOT chains because
their xid marks them as including the new index in the index group.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Pavan Deolasee wrote:
 On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 
 
 
  I am worried that will require CREATE INDEX to wait for a long time.
 
 
 
 Not unless there are long running transactions. We are not waiting
 for the lock, but only for the current transactions to finish.

Waiting for all transactions might take a while, no?

  Is the pg_index xid idea too complex?  It seems to give you the
  per-tuple index bit, without the restrictions.
 
 
 
 How do we handle HEAP_ONLY tuple cleanup ? If I understood
 the proposal correctly, at the end of the create index, a HEAP_ONLY
 tuple may have pointer from the new index, isn't it ?

Right.  You would need vacuum to clean up the HEAP_ONLY tuples.  I just
sent an email about those deails.  We might have autovacuum check
pg_index and do it automatically.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian


Effectively, my idea is not to chill/break the HOT chains during index
creation, but rather to abandon them and wait for VACUUM to clean them
up.

My idea is much closer to the idea of a bit per index on every tuple,
except the tuple xmax and pg_index xid replace them.

---

Bruce Momjian wrote:
 Pavan Deolasee wrote:
  On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:
  
  
  
   I am worried that will require CREATE INDEX to wait for a long time.
  
  
  
  Not unless there are long running transactions. We are not waiting
  for the lock, but only for the current transactions to finish.
 
 Waiting for all transactions might take a while, no?
 
   Is the pg_index xid idea too complex?  It seems to give you the
   per-tuple index bit, without the restrictions.
  
  
  
  How do we handle HEAP_ONLY tuple cleanup ? If I understood
  the proposal correctly, at the end of the create index, a HEAP_ONLY
  tuple may have pointer from the new index, isn't it ?
 
 Right.  You would need vacuum to clean up the HEAP_ONLY tuples.  I just
 sent an email about those deails.  We might have autovacuum check
 pg_index and do it automatically.
 
 -- 
   Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
   EnterpriseDB   http://www.enterprisedb.com
 
   + If your life is a hard drive, Christ can be your backup. +
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Merlin Moncure


On 3/21/07, Pavan Deolasee [EMAIL PROTECTED] wrote:

It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?


yikes -- this is huge, huge price to pay, IMHO.  Think about DDL that
implies index creation such as adding unique constraint to a table,
many postgresql users (including me) take advantage of that in update
systems to production databases.

merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Pavan Deolasee


On 3/21/07, Merlin Moncure [EMAIL PROTECTED] wrote:


On 3/21/07, Pavan Deolasee [EMAIL PROTECTED] wrote:
 It seems much simpler to me do something like this. But important
 question is whether the restriction that CREATE INDEX can not
 be run in a transaction block is acceptable ?

yikes -- this is huge, huge price to pay, IMHO.  Think about DDL that
implies index creation such as adding unique constraint to a table,
many postgresql users (including me) take advantage of that in update
systems to production databases.




I didn't understand that quite well. How does it help to run CREATE
INDEX inside a transaction ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Pavan Deolasee


On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:



Effectively, my idea is not to chill/break the HOT chains during index
creation, but rather to abandon them and wait for VACUUM to clean them
up.

My idea is much closer to the idea of a bit per index on every tuple,
except the tuple xmax and pg_index xid replace them.




Regarding waiting inside CREATE INDEX for transactions started
before us doesn't seem like a big problem to me. But may be I am
mistaken.

Given a choice I won't complicate the fetch/update logic more. I would
rather fix the problem at hand, if I can. But I would go by the
community agreement on this issue since I don't have a clear
answer.

Also, I am wondering whether the information that which index is used to
fetch a tuple is always available. I haven't checked, but do we have that
information in lossy bitmap heapscan ?


Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Florian G. Pflug


Pavan Deolasee wrote:

On 3/21/07, Merlin Moncure [EMAIL PROTECTED] wrote:


On 3/21/07, Pavan Deolasee [EMAIL PROTECTED] wrote:
 It seems much simpler to me do something like this. But important
 question is whether the restriction that CREATE INDEX can not
 be run in a transaction block is acceptable ?

yikes -- this is huge, huge price to pay, IMHO.  Think about DDL that
implies index creation such as adding unique constraint to a table,
many postgresql users (including me) take advantage of that in update
systems to production databases.


I didn't understand that quite well. How does it help to run CREATE
INDEX inside a transaction ?

The problem is not so much CREATE INDEX per se, but other DDL commands
that implicitly create an index, for example adding a PRIMARY KEY to a
table.

Some applications that I have written would fail badly if CREATE INDEX
was disallowed inside a transaction - mostly, because they use plpgsql
functions to manipulate database objects, and disallowing CREATE INDEX
inside a transaction presumably means disallowing it from inside
stored procedures.

greetings, Florian Pflug

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Merlin Moncure


On 3/21/07, Florian G. Pflug [EMAIL PROTECTED] wrote:

Pavan Deolasee wrote:
 On 3/21/07, Merlin Moncure [EMAIL PROTECTED] wrote:

 On 3/21/07, Pavan Deolasee [EMAIL PROTECTED] wrote:
  It seems much simpler to me do something like this. But important
  question is whether the restriction that CREATE INDEX can not
  be run in a transaction block is acceptable ?

 yikes -- this is huge, huge price to pay, IMHO.  Think about DDL that
 implies index creation such as adding unique constraint to a table,
 many postgresql users (including me) take advantage of that in update
 systems to production databases.

 I didn't understand that quite well. How does it help to run CREATE
 INDEX inside a transaction ?
The problem is not so much CREATE INDEX per se, but other DDL commands
that implicitly create an index, for example adding a PRIMARY KEY to a
table.

Some applications that I have written would fail badly if CREATE INDEX
was disallowed inside a transaction - mostly, because they use plpgsql
functions to manipulate database objects, and disallowing CREATE INDEX
inside a transaction presumably means disallowing it from inside
stored procedures.


speaking with pavan off list he seems to think that only 'create
index' is outside transaction, not the other ddl flavors of it because
they are generally acquiring a excl lock.  so, in that sense it is
possibly acceptable to me although still a pretty tough pill to
swallow (thinking, guc time).  It would also preclude ever integrating
vanilla 'create index' to create table command, fwiw.

merlin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Pavan Deolasee wrote:
 On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:
 
 
  Effectively, my idea is not to chill/break the HOT chains during index
  creation, but rather to abandon them and wait for VACUUM to clean them
  up.
 
  My idea is much closer to the idea of a bit per index on every tuple,
  except the tuple xmax and pg_index xid replace them.
 
 
 
 Regarding waiting inside CREATE INDEX for transactions started
 before us doesn't seem like a big problem to me. But may be I am
 mistaken.
 
 Given a choice I won't complicate the fetch/update logic more. I would
 rather fix the problem at hand, if I can. But I would go by the
 community agreement on this issue since I don't have a clear
 answer.

Yep, we all want is simple.  If we can do the xid idea, we hopefully fix
the concurrent index scan and crash recovery issues.

 Also, I am wondering whether the information that which index is used to
 fetch a tuple is always available. I haven't checked, but do we have that
 information in lossy bitmap heapscan ?

Oh, that is an interesting problem because an index might have one index
entry representing an entire HOT chain, while another index might
represent each chain member by individual index entries.  When we do the
bitmaps, don't we access them by heap tid, meaning we would find all
entries anyway?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Bruce Momjian wrote:
  Also, I am wondering whether the information that which index is used to
  fetch a tuple is always available. I haven't checked, but do we have that
  information in lossy bitmap heapscan ?
 
 Oh, that is an interesting problem because an index might have one index
 entry representing an entire HOT chain, while another index might
 represent each chain member by individual index entries.  When we do the
 bitmaps, don't we access them by heap tid, meaning we would find all
 entries anyway?

I thinking some more, it would be a problem because while we are merging
the tids, we are using index entries and haven't looked at the heap yet.
I am guessing we would have to exclude the new index from bitmap joins
with other indexes until the VACUUM happens.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Bruce Momjian wrote:
 Bruce Momjian wrote:
   Also, I am wondering whether the information that which index is used to
   fetch a tuple is always available. I haven't checked, but do we have that
   information in lossy bitmap heapscan ?
  
  Oh, that is an interesting problem because an index might have one index
  entry representing an entire HOT chain, while another index might
  represent each chain member by individual index entries.  When we do the
  bitmaps, don't we access them by heap tid, meaning we would find all
  entries anyway?
 
 I thinking some more, it would be a problem because while we are merging
 the tids, we are using index entries and haven't looked at the heap yet.
 I am guessing we would have to exclude the new index from bitmap joins
 with other indexes until the VACUUM happens.

Thinking some more, bitmap scans have a mode that tracks just the page
numbers, rather than the tids --- if the index visibilities do not
match, we would need to fall back to that mode.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Simon Riggs

On Wed, 2007-03-21 at 13:29 -0400, Bruce Momjian wrote:
 Pavan Deolasee wrote:
  On 3/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:
  
  
  
   I am worried that will require CREATE INDEX to wait for a long time.
  
  
  
  Not unless there are long running transactions. We are not waiting
  for the lock, but only for the current transactions to finish.
 
 Waiting for all transactions might take a while, no?

It would be nice if we could force CREATE INDEX to only run outside a
transaction block, but that just isn't possible if you care about
existing applications.

Any PL function running a CREATE INDEX would fail and I'm pretty sure
there's lots of them. Many of them use dynamic SQL, so you wouldn't even
be able to scan them prior to run-time to know that they would fail.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Heikki Linnakangas


Bruce Momjian wrote:

Bruce Momjian wrote:

Bruce Momjian wrote:

Also, I am wondering whether the information that which index is used to
fetch a tuple is always available. I haven't checked, but do we have that
information in lossy bitmap heapscan ?

Oh, that is an interesting problem because an index might have one index
entry representing an entire HOT chain, while another index might
represent each chain member by individual index entries.  When we do the
bitmaps, don't we access them by heap tid, meaning we would find all
entries anyway?

I thinking some more, it would be a problem because while we are merging
the tids, we are using index entries and haven't looked at the heap yet.
I am guessing we would have to exclude the new index from bitmap joins
with other indexes until the VACUUM happens.


Thinking some more, bitmap scans have a mode that tracks just the page
numbers, rather than the tids --- if the index visibilities do not
match, we would need to fall back to that mode.


You don't need to scan the whole page like in the lossy bitmap mode, 
just all the tuples in the HOT-chain.


You need to somehow pass the information that multiple indexes have been 
used in the bitmap scan to the bitmap heapscan node, so that it knows 
when the extra checking is required.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Heikki Linnakangas wrote:
 Bruce Momjian wrote:
  Bruce Momjian wrote:
  Bruce Momjian wrote:
  Also, I am wondering whether the information that which index is used to
  fetch a tuple is always available. I haven't checked, but do we have that
  information in lossy bitmap heapscan ?
  Oh, that is an interesting problem because an index might have one index
  entry representing an entire HOT chain, while another index might
  represent each chain member by individual index entries.  When we do the
  bitmaps, don't we access them by heap tid, meaning we would find all
  entries anyway?
  I thinking some more, it would be a problem because while we are merging
  the tids, we are using index entries and haven't looked at the heap yet.
  I am guessing we would have to exclude the new index from bitmap joins
  with other indexes until the VACUUM happens.
  
  Thinking some more, bitmap scans have a mode that tracks just the page
  numbers, rather than the tids --- if the index visibilities do not
  match, we would need to fall back to that mode.
 
 You don't need to scan the whole page like in the lossy bitmap mode, 
 just all the tuples in the HOT-chain.
 
 You need to somehow pass the information that multiple indexes have been 
 used in the bitmap scan to the bitmap heapscan node, so that it knows 
 when the extra checking is required.

That might be confusing because you are going to have some tids that are
chains, and some that aren't.  The hard part is making sure you don't
include the same tid twice.

Another idea is to set pg_index xid to FrozenTransactionId once the
VACUUM happens, and if it not frozen, do something special for bitmap
scans.


-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Bruce Momjian

Bruce Momjian wrote:
  You don't need to scan the whole page like in the lossy bitmap mode, 
  just all the tuples in the HOT-chain.
  
  You need to somehow pass the information that multiple indexes have been 
  used in the bitmap scan to the bitmap heapscan node, so that it knows 
  when the extra checking is required.
 
 That might be confusing because you are going to have some tids that are
 chains, and some that aren't.  The hard part is making sure you don't
 include the same tid twice.
 
 Another idea is to set pg_index xid to FrozenTransactionId once the
 VACUUM happens, and if it not frozen, do something special for bitmap
 scans.

Anyway, perhaps we can leave the bitmap scan part to someone more
familiar with that part of the code, like Tom.

HOT is a huge feature, so let's keep working on it.  The patch doesn't
have to be 100% on April 1, but we certainly want to try to get as far
as we can by then.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-21 Thread Andrew Dunstan


Bruce Momjian wrote:


Anyway, perhaps we can leave the bitmap scan part to someone more
familiar with that part of the code, like Tom.


  


Yeah. We all know he doesn't have enough to do ...


cheers

andrew


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-20 Thread Bruce Momjian


I have read the HOT discussion and wanted to give my input.  The major
issue is that CREATE INDEX might require a HOT chain to be split apart
if one of the new indexed columns changed in the HOT chain.

As for the outline below, there is no way we are going to add new ALTER
TABLE and CHILL commands to make this work.  Can you imagine what kind
of warts we would have in the system if we added such things every time
we got stuck?  And once the warts are in, it is hard to educate people
once they are removed.  We need to keep going until we have a solution
that is as user-invisible as possible.  While I understand the
frustration that we have not discussed this enough, I don't want us
rushing to a solution either until it has been totally thought through.

HOT is not a feature only a few people are going to want to use ---
everyone will want it, and if the user process is cumbersome, we will get
never-ending questions about how to make it work.

Let's all think about this for the next few days.

---

Pavan Deolasee wrote:
 
 
 There are  few things I realized over the weekend while going
 through the code:
 
 1. It looks like a bad idea to use ALTER TABLE ..  to chill a table
 becuase ALTER TABLE takes AccessExclusive lock on the table.
 But it would still be a good idea to have ALTER TABLE .. to turn
 HOT-updates ON/OFF.
 
 2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
 its lock anyways and is prone to deadlock. So as long as we don't
 create new deadlock scenarios, we should be fine.
 
 3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
 So its should be acceptable if we run CHILL as a seperate transaction.
 
 4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
 thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
 VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
 race conditions between all of these.
 
 
 So here is my plan. Let me know your suggestions/comments/objections.
 
 
 Changes to pg_class and new DDLs:
 -
 
 We add two boolean attributes to pg_class: hot_update and hot_fetch.
 
 We introduce two DDLs to turn HOT on/off. Note that the DDL
 itself does not CHILL the table, but only affects the
 subsequent UPDATEs.
 
 postgres=# ALTER TABLE tblname ENABLE HOT;
 postgres=# ALTER TABLE tblname DISABLE HOT;
 
 These DDLs would acquire AccessExclusive lock on the table and
 set hot_update to true/false using simple_heap_update()
 
 
 CREATE INDEX [CONCURRENTLY]:
 
 
 If a HEAP_ONLY tuple is found, error out with a HINT to run
 CHILL on the table and then retry.
 
 If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
 the table.
 
 
 CHILL utility:
 --
 
 We introduce a new command to chill a table. The syntax for the
 same could be:
 
 postgres=# CHILL [VERBOSE] tblname;
 
 UPDATE/INSERT/SELECT would work while the table is being chilled.
 But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL
 would be locked out. As a side-effect, HOT-updates are turned off on the
 table and explicit ALTER TABLE ENABLE HOT is required to turn
 HOT-updates on again.
 
 Here is the algoirthm to CHILL table.
 
 1. Check if CHILL is running inside a transaction block, error
 out if so.
 
 2. Start a new transaction
 
 3. Acquire ShareUpdateExclusiveLock on the relation. This would
 allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent
 CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY]
 
 4. Set hot_update to false and update pg_class using
 simple_heap_update()
 
 5. Acquire ShareUpdateExclusiveLock for the entire session.
 
 6. Commit the transaction
 
 7. Start a new transaction
 
 8. Wait for all transactions in the current snapshot to finish.
 This would ensure that there are no HOT-updates possible further
 
 9. Start the first heap scan, CHILL HEAP_ONLY tuples by inserting
 appropriate index entries and setting CHILL_IN_PROGRESS flag.
 WAL log the operation
 
 10. If a CHILL_IN_PROGRESS is already set on the tuple, we can't
 be sure whether the corresponding index entry already exists
 or not. One option is to error out and force VACUUM on the table.
 Alternatively, the index_insert can be enhanced to check if a
 the same entry already exists.
 
 11. When the entire heap is chilled, set hot_fetch to false
 and update pg_class using simple_heap_update()
 
 12. Commit the transaction
 
 13. Start a new transaction
 
 14. Wait for all transactions in the current snapshot to finish.
 This would ensure that all the subsequent index scans would
 only use direct path from the index.
 
 15. Start second heap scan. Reset CHILL_IN_PROGRESS, HEAP_ONLY
 and HOT_UPDATED flags. WAL log the operations. We may not need
 this, but we can revisit this later to optimize WAL logging.
 
 16. When the second scan is complete, set hot_fetch to
 true and update pg_class using

[HACKERS] CREATE INDEX and HOT - revised design

2007-03-19 Thread Pavan Deolasee




There are  few things I realized over the weekend while going
through the code:

1. It looks like a bad idea to use ALTER TABLE ..  to chill a table
becuase ALTER TABLE takes AccessExclusive lock on the table.
But it would still be a good idea to have ALTER TABLE .. to turn
HOT-updates ON/OFF.

2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
its lock anyways and is prone to deadlock. So as long as we don't
create new deadlock scenarios, we should be fine.

3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
So its should be acceptable if we run CHILL as a seperate transaction.

4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
race conditions between all of these.


So here is my plan. Let me know your suggestions/comments/objections.


Changes to pg_class and new DDLs:
-

We add two boolean attributes to pg_class: hot_update and hot_fetch.

We introduce two DDLs to turn HOT on/off. Note that the DDL
itself does not CHILL the table, but only affects the
subsequent UPDATEs.

postgres=# ALTER TABLE tblname ENABLE HOT;
postgres=# ALTER TABLE tblname DISABLE HOT;

These DDLs would acquire AccessExclusive lock on the table and
set hot_update to true/false using simple_heap_update()


CREATE INDEX [CONCURRENTLY]:


If a HEAP_ONLY tuple is found, error out with a HINT to run
CHILL on the table and then retry.

If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
the table.


CHILL utility:
--

We introduce a new command to chill a table. The syntax for the
same could be:

postgres=# CHILL [VERBOSE] tblname;

UPDATE/INSERT/SELECT would work while the table is being chilled.
But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL
would be locked out. As a side-effect, HOT-updates are turned off on the
table and explicit ALTER TABLE ENABLE HOT is required to turn
HOT-updates on again.

Here is the algoirthm to CHILL table.

1. Check if CHILL is running inside a transaction block, error
out if so.

2. Start a new transaction

3. Acquire ShareUpdateExclusiveLock on the relation. This would
allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent
CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY]

4. Set hot_update to false and update pg_class using
simple_heap_update()

5. Acquire ShareUpdateExclusiveLock for the entire session.

6. Commit the transaction

7. Start a new transaction

8. Wait for all transactions in the current snapshot to finish.
This would ensure that there are no HOT-updates possible further

9. Start the first heap scan, CHILL HEAP_ONLY tuples by inserting
appropriate index entries and setting CHILL_IN_PROGRESS flag.
WAL log the operation

10. If a CHILL_IN_PROGRESS is already set on the tuple, we can't
be sure whether the corresponding index entry already exists
or not. One option is to error out and force VACUUM on the table.
Alternatively, the index_insert can be enhanced to check if a
the same entry already exists.

11. When the entire heap is chilled, set hot_fetch to false
and update pg_class using simple_heap_update()

12. Commit the transaction

13. Start a new transaction

14. Wait for all transactions in the current snapshot to finish.
This would ensure that all the subsequent index scans would
only use direct path from the index.

15. Start second heap scan. Reset CHILL_IN_PROGRESS, HEAP_ONLY
and HOT_UPDATED flags. WAL log the operations. We may not need
this, but we can revisit this later to optimize WAL logging.

16. When the second scan is complete, set hot_fetch to
true and update pg_class using simple_heap_update(). There are
no HOT_UPDATED tuples in the heap at this moment, but we should
reset the state neverthless.

17. Commit the transaction.

If the CHILL command crashes before completing the operation,
we might be left with hot_update/hot_fetch turned OFF. Administrative
command is needed to turn them ON again. But there won't be any
correctness problems in the meantime.

The uncleaned tuples left with CHILL_IN_PROGRESS flags would
require VACUUM for cleanup.

Index Fetch:


If hot_fetch is true, we ignore direct paths from the
index to HEAP_ONLY tuples

If hot_fetch is false, we ignore HOT_UPDATED flags
and only use direct paths from the index.

VACUUM [FULL]:
--

If a CHILL_IN_PROGRESS flag found, collect that tuple for
index removal irrespective of whether the tuple is DEAD
or not. If the tuple is not DEAD, only the CHILL_IN_PROGRESS
flag is reset in the second pass.


Is the plan acceptable ? If there are no objections to the
algorithms or the behavior in general, I would start working
on this with a target of feature freeze.


Thanks,
Pavan


--


EnterpriseDBhttp://www.enterprisedb.com


---(end of broadcast)---

95 matches

Mail list logo