Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-10-10 Thread Andres Freund
On 2013-10-02 13:16:06 +0900, Michael Paquier wrote:
 Each patch applied with its parents compiles, has no warnings AFAIK
 and passes regression/isolation tests. Working on 0004 by the end of
 the CF seems out of reach IMO, so I'd suggest focusing on 0002 and
 0003 now, and I can put in some time to finalize them for this CF. I
 think that we should perhaps split 0003 into 2 pieces, with one patch
 for the introduction of index_concurrent_build, and another for
 index_concurrent_set_dead. Comments are welcome about that though, and
 if people agree on that I'll do it once 0002 is finalized.

FWIW I don't think splitting off index_concurrent_build is worthwhile...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-10-09 Thread Michael Paquier
Marking this patch as returned with feedback; I will not be able to
work on it by the 15th of October. It would have been great to get
the infrastructure patches 0002 and 0003 committed to minimize the
work on the core patch, but, well, that is not the case.

I am also attaching a patch fixing some comments in index_drop, as
mentioned by Andres in another thread, so that it doesn't get lost in
the flow.

Thanks to all for the involvement.

Regards,
--
Michael


20131002_index_drop_comments.patch
Description: Binary data



Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-10-01 Thread Alvaro Herrera
Michael Paquier wrote:

 Btw, taking the problem from another viewpoint... This feature has now
 3 patches, the 2 first patches doing only code refactoring. Could it
 be possible to have a look at those ones first? Straight-forward
 things should go first, simplifying the core feature evaluation.

I have pushed the first half of the first patch for now, revising it
somewhat: I renamed the functions and put them in lmgr.c instead of
procarray.c.

I think the second half of that first patch (WaitForOldSnapshots) should
be in index.c, not procarray.c either.  I didn't look at the actual code
in there.

I already shipped Michael fixed versions of the remaining patches
adjusting them to the changed API.  I expect him to post them here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-26 Thread Andres Freund
On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
  2) I don't think the drop algorithm used now is correct. Your
  index_concurrent_set_dead() sets both indisvalid = false and indislive =
  false at the same time. It does so after doing a WaitForVirtualLocks() -
  but that's not sufficient. Between waiting and setting indisvalid =
  false another transaction could start which then would start using that
  index. Which will not get updated anymore by other concurrent backends
  because of indislive = false.
  You really need to follow index_drop's lead here and first unset
  indisvalid then wait till nobody can use the index for querying anymore
  and only then unset indislive.

 Sorry, I do not follow you here. index_concurrent_set_dead calls
 index_set_state_flags that sets indislive and *indisready* to false,
 not indisvalid. The concurrent index never uses indisvalid = true so
 it can never be called by another backend for a read query. The drop
 algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.

That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
2) now wait for all transactions that potentially still can use the index for
   *querying* to finish. During that indisready *must* be true,
   otherwise the index will have outdated contents.
3) Mark the index as indislive = false, indisready = false. Anything
   using a newer relcache entry will now not update the index.
4) Wait till all potential updaters of the index have finished.
5) Drop the index.

With the patch's current scheme concurrent queries that use plans using
the old index will get wrong results (at least in read committed)
because concurrent writers will not update it anymore since it's marked
indisready = false.
This isn't a problem of the *new* index, it's a problem of the *old*
one.
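The wrong-results scenario can be sketched with a tiny Python model (purely illustrative; the catalog dict and Reader class are invented for this sketch and are not PostgreSQL code). Flags live in a shared catalog, but a reader keeps the indisvalid value it saw when it planned its query, while writers consult the current indisready:

```python
# Toy model of the ordering argument: flags live in a shared "catalog",
# but each backend caches them when it builds its relcache entry.
catalog = {"indisvalid": True, "indisready": True, "indislive": True}
index_contents = set()
heap = set()

def writer_insert(row):
    """Writers consult the *current* catalog: they only maintain the
    index while indisready is true."""
    heap.add(row)
    if catalog["indisready"]:
        index_contents.add(row)

class Reader:
    """A reader that cached indisvalid when it planned its query."""
    def __init__(self):
        self.cached_valid = catalog["indisvalid"]

    def scan(self):
        # Uses the index if its *cached* flag says it may.
        return index_contents if self.cached_valid else heap

# The patch's ordering: straight to indisready = false.
writer_insert("a")
reader = Reader()               # plans while the index is still valid
catalog["indisready"] = False   # maintenance stopped too early
writer_insert("b")              # index no longer maintained
missed = reader.scan() != heap  # reader still trusts the index
assert missed                   # wrong results: "b" is missing
```

Under index_drop's ordering, indisready would stay true until every reader that could have cached indisvalid = true is waited out, so the index would still contain "b" when the stale plan scans it.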

Am I missing something?

  3) I am not sure if the swap algorithm used now actually is correct
  either. We have mvcc snapshots now, right, but we're still potentially
  taking separate snapshots for individual relcache lookups. What's
  stopping us from temporarily ending up with two relcache entries with
  the same relfilenode?
  Previously you swapped names - I think that might end up being easier,
  because having names temporarily confused isn't as bad as two indexes
  manipulating the same file.

 Actually, performing the swap operation with names proves to be more
 difficult than it looks, as it requires a moment where both the
 old and new indexes are marked as valid for all the backends.

But that doesn't make the current method correct, does it?

 The only
 reason for that is that index_set_state_flags assumes that a given xact
 has not yet done any transactional update when it is called, limiting
 to one the number of state flags that can be changed inside a
 transaction. This is a safe method IMO, and we shouldn't break it.

Part of that reasoning comes from the non-mvcc snapshot days, so it's
not really up to date anymore.
Even if you don't want to go through all that logic - which I'd
understand quite well - you can just do it like:
1) start with: old index: valid, ready, live; new index: invalid, ready, live
2) one transaction: switch names from real_name = tmp_name, new_name =
  real_name
3) one transaction: mark real_name (which is the rebuilt index) as valid,
and new_name as invalid

Now, if we fail in the midst of 3, we'd have two indexes marked as
valid. But that's unavoidable as long as you don't want to use
transactions.
Alternatively you could pass in a flag to use transactional updates,
that should now be safe.

At least, unless the old index still has indexcheckxmin = true with an
xmin that's not old enough. But in that case we cannot do the concurrent
reindex at all, I think, since we rely on the old index being fully valid.
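The three-step alternative above can be sketched like this (illustrative Python; the dict-based catalog and the names real_name/tmp_name follow the steps above, but the code itself is invented, not PostgreSQL's):

```python
# Sketch of the three-step name-based swap.
indexes = {
    "real_name": {"relfilenode": 1, "valid": True},   # old index
    "tmp_name":  {"relfilenode": 2, "valid": False},  # rebuilt index
}

def swap_names():
    """Transaction 2: exchange the names; each name still maps to
    exactly one relfilenode at every point."""
    indexes["real_name"], indexes["tmp_name"] = (
        indexes["tmp_name"], indexes["real_name"])

def fix_flags():
    """Transaction 3: the rebuilt index (now under real_name) becomes
    valid, the old one invalid."""
    indexes["real_name"]["valid"] = True
    indexes["tmp_name"]["valid"] = False

swap_names()
# A crash here leaves the names confused (and, after redoing flags,
# possibly two valid indexes) but never two entries pointing at the
# same file:
assert (indexes["real_name"]["relfilenode"]
        != indexes["tmp_name"]["relfilenode"])
fix_flags()
```

The invariant is that every name maps to exactly one relfilenode at all times; a failure between the last two transactions leaves at most a flag inconsistency, never two relcache entries sharing a file.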

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-26 Thread Michael Paquier
On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
  2) I don't think the drop algorithm used now is correct. Your
  index_concurrent_set_dead() sets both indisvalid = false and indislive =
  false at the same time. It does so after doing a WaitForVirtualLocks() -
  but that's not sufficient. Between waiting and setting indisvalid =
  false another transaction could start which then would start using that
  index. Which will not get updated anymore by other concurrent backends
  because of indislive = false.
  You really need to follow index_drop's lead here and first unset
  indisvalid then wait till nobody can use the index for querying anymore
  and only then unset indislive.

 Sorry, I do not follow you here. index_concurrent_set_dead calls
 index_set_state_flags that sets indislive and *indisready* to false,
 not indisvalid. The concurrent index never uses indisvalid = true so
 it can never be called by another backend for a read query. The drop
 algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.

 That makes it even worse... You can do the concurrent drop only in the
 following steps:
 1) set indisvalid = false, no future relcache lookups will have it as valid
indisvalid is never set to true for the concurrent index. Swap is done
with the concurrent index having indisvalid = false and the former index
with indisvalid = true. The concurrent index is validated with
index_validate in a transaction before the swap transaction.
-- 
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-26 Thread Andres Freund
On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
 On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
   2) I don't think the drop algorithm used now is correct. Your
   index_concurrent_set_dead() sets both indisvalid = false and indislive =
   false at the same time. It does so after doing a WaitForVirtualLocks() -
   but that's not sufficient. Between waiting and setting indisvalid =
   false another transaction could start which then would start using that
   index. Which will not get updated anymore by other concurrent backends
   because of indislive = false.
   You really need to follow index_drop's lead here and first unset
   indisvalid then wait till nobody can use the index for querying anymore
   and only then unset indislive.
 
  Sorry, I do not follow you here. index_concurrent_set_dead calls
  index_set_state_flags that sets indislive and *indisready* to false,
  not indisvalid. The concurrent index never uses indisvalid = true so
  it can never be called by another backend for a read query. The drop
  algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
 
  That makes it even worse... You can do the concurrent drop only in the
  following steps:
  1) set indisvalid = false, no future relcache lookups will have it as valid

 indisvalid is never set to true for the concurrent index. Swap is done
 with concurrent index having indisvalid = false and former index with
 indisvalid = true. The concurrent index is validated with
 index_validate in a transaction before swap transaction.

Yes. I've described how it *has* to be done, not how it's done.

The current method of going straight to indisready = false for the original
index will result in wrong results because it's not updated anymore
while it's still being used.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-26 Thread Michael Paquier
On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
 On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
   2) I don't think the drop algorithm used now is correct. Your
   index_concurrent_set_dead() sets both indisvalid = false and indislive =
   false at the same time. It does so after doing a WaitForVirtualLocks() -
   but that's not sufficient. Between waiting and setting indisvalid =
   false another transaction could start which then would start using that
   index. Which will not get updated anymore by other concurrent backends
   because of indislive = false.
   You really need to follow index_drop's lead here and first unset
   indisvalid then wait till nobody can use the index for querying anymore
   and only then unset indislive.
 
  Sorry, I do not follow you here. index_concurrent_set_dead calls
  index_set_state_flags that sets indislive and *indisready* to false,
  not indisvalid. The concurrent index never uses indisvalid = true so
  it can never be called by another backend for a read query. The drop
  algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
 
  That makes it even worse... You can do the concurrent drop only in the
  following steps:
  1) set indisvalid = false, no future relcache lookups will have it as valid

 indisvalid is never set to true for the concurrent index. Swap is done
 with concurrent index having indisvalid = false and former index with
 indisvalid = true. The concurrent index is validated with
 index_validate in a transaction before swap transaction.

 Yes. I've described how it *has* to be done, not how it's done.

 The current method of going straight to isready = false for the original
 index will result in wrong results because it's not updated anymore
 while it's still being used.
The index being dropped at the end of the process is not the former index,
but the concurrent index. The index used after REINDEX CONCURRENTLY is
the old index but with the new relfilenode.

Am I lacking caffeine? It looks so...
-- 
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-26 Thread Andres Freund
On 2013-09-26 20:47:33 +0900, Michael Paquier wrote:
 On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
  On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund and...@2ndquadrant.com wrote:
   On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
  
   Sorry, I do not follow you here. index_concurrent_set_dead calls
   index_set_state_flags that sets indislive and *indisready* to false,
   not indisvalid. The concurrent index never uses indisvalid = true so
   it can never be called by another backend for a read query. The drop
   algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
  
   That makes it even worse... You can do the concurrent drop only in the
   following steps:
   1) set indisvalid = false, no future relcache lookups will have it as valid
 
  indisvalid is never set to true for the concurrent index. Swap is done
  with concurrent index having indisvalid = false and former index with
  indisvalid = true. The concurrent index is validated with
  index_validate in a transaction before swap transaction.
 
  Yes. I've described how it *has* to be done, not how it's done.
 
  The current method of going straight to isready = false for the original
  index will result in wrong results because it's not updated anymore
  while it's still being used.

 The index being dropped at the end of process is not the former index,
 but the concurrent index. The index used after REINDEX CONCURRENTLY is
 the old index but with the new relfilenode.

That's not relevant unless I miss something.

After phase 4 both indexes are valid (although only the old one is
flagged as such), but due to the switching of the relfilenodes backends
could have either of the two open, depending on the time they built the
relcache entry. Right?
Then you go ahead and mark the old index - which still might be used! -
as dead in phase 5. Which means other backends might (again, depending
on the time they have built the relcache entry) not update it
anymore. In read committed we very well might go ahead and use the index
with the same plan as before, but with a new snapshot. Which now will
miss entries.

Am I misunderstanding the algorithm you're using?
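A minimal sketch of that race (invented names, not PostgreSQL code): whether a backend sees the old or the new relfilenode depends only on when it built its relcache entry relative to the swap, so both files can be in active use at once.

```python
# Toy timeline of the relcache race after the relfilenode swap.
relfilenode_of = {"the_index": "old_file"}

class RelcacheEntry:
    """A backend's relcache entry pins the relfilenode it saw when
    the entry was built."""
    def __init__(self):
        self.relfilenode = relfilenode_of["the_index"]

before = RelcacheEntry()                  # built before the swap
relfilenode_of["the_index"] = "new_file"  # phase 4: swap relfilenodes
after = RelcacheEntry()                   # built after the swap

# Both files are in active use at once, so neither side may be marked
# dead until every pre-swap entry is gone.
assert {before.relfilenode, after.relfilenode} == {"old_file", "new_file"}
```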

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-26 Thread Michael Paquier
On Thu, Sep 26, 2013 at 8:56 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-26 20:47:33 +0900, Michael Paquier wrote:
 On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
  On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund and...@2ndquadrant.com wrote:
   On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
  
   Sorry, I do not follow you here. index_concurrent_set_dead calls
   index_set_state_flags that sets indislive and *indisready* to false,
   not indisvalid. The concurrent index never uses indisvalid = true so
   it can never be called by another backend for a read query. The drop
   algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
  
   That makes it even worse... You can do the concurrent drop only in the
   following steps:
   1) set indisvalid = false, no future relcache lookups will have it as valid
 
  indisvalid is never set to true for the concurrent index. Swap is done
  with concurrent index having indisvalid = false and former index with
  indisvalid = true. The concurrent index is validated with
  index_validate in a transaction before swap transaction.
 
  Yes. I've described how it *has* to be done, not how it's done.
 
  The current method of going straight to isready = false for the original
  index will result in wrong results because it's not updated anymore
  while it's still being used.

 The index being dropped at the end of process is not the former index,
 but the concurrent index. The index used after REINDEX CONCURRENTLY is
 the old index but with the new relfilenode.

 That's not relevant unless I miss something.

 After phase 4 both indexes are valid (although only the old one is
 flagged as such), but due to the switching of the relfilenodes backends
 could have either of both open, depending on the time they built the
 relcache entry. Right?
 Then you go ahead and mark the old index - which still might be used! -
 as dead in phase 5. Which means other backends might (again, depending
 on the time they have built the relcache entry) not update it
 anymore. In read committed we very well might go ahead and use the index
 with the same plan as before, but with a new snapshot. Which now will
 miss entries.
In this case, doing a call to WaitForOldSnapshots after the swap phase
is enough. It was included in past versions of the patch but removed
in the last 2 versions.

Btw, taking the problem from another viewpoint... This feature has now
3 patches, the first 2 doing only code refactoring. Could it
be possible to have a look at those ones first? Straightforward
things should go first, simplifying the core feature evaluation.

Regards,
-- 
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-26 Thread Andres Freund
On 2013-09-27 05:41:26 +0900, Michael Paquier wrote:
 In this case, doing a call to WaitForOldSnapshots after the swap phase
 is enough. It was included in past versions of the patch but removed
 in the last 2 versions.

I don't think it is. I really, really suggest following the protocol
used by index_drop to a T and documenting every *slight* deviation
carefully.
We've had more than one bug in index_drop's concurrent feature.

 Btw, taking the problem from another viewpoint... This feature has now
 3 patches, the 2 first patches doing only code refactoring. Could it
 be possible to have a look at those ones first? Straight-forward
 things should go first, simplifying the core feature evaluation.

I haven't looked at them in detail, but they looked good on a quick
pass. I'll make another pass, but that won't be before, say, Tuesday.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 7:04 PM, Andres Freund and...@2ndquadrant.com wrote:
 1. We're not in a huge hurry to ensure that sinval notifications are
 delivered in a timely fashion.  We know that sinval resets are bad, so
 if a backend is getting close to needing a sinval reset, we kick it in
 an attempt to get it to AcceptInvalidationMessages().  But if the
 sinval queue isn't filling up, there's no upper bound on the amount of
 time that can pass before a particular sinval is read.  Therefore, the
 amount of time that passes before an idle backend is forced to drain
 the sinval queue can vary widely, from a fraction of a second to
 minutes, hours, or days.   So it's kind of unappealing to think about
 making user-visible behavior dependent on how long it ends up taking.

 Well, when we're signalling it's certainly faster than waiting for the
 other's snapshot to vanish which can take ages for normal backends. And
 we can signal when we wait for consumption without too many
 problems.
 Also, I think in most of the use cases we can simply not wait for any of
 the idle backends, those don't use the old definition anyway.

Possibly.  It would need some thought, though.

 I am pretty sure there's quite a bit to improve around sinvals but I
 think any replacement would look surprisingly similar to what we
 have. So I think doing it incrementally is more realistic.
 And I am certainly scared by the thought of having to replace it without
 breaking corner cases all over.

I guess I was more thinking that we might want some parallel mechanism
with somewhat different semantics.  But that might be a bad idea
anyway.  On the flip side, if I had any clear idea how to adapt the
current mechanism to suck less, I would have done it already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-17 Thread Robert Haas
On Mon, Sep 16, 2013 at 10:38 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-08-29 10:39:09 -0400, Robert Haas wrote:
 I have been of the opinion for some time now that the
 shared-invalidation code is not a particularly good design for much of
 what we need.  Waiting for an old snapshot is often a proxy for
 waiting long enough that we can be sure every other backend will
 process the shared-invalidation message before it next uses any of the
 cached data that will be invalidated by that message.  However, it
 would be better to be able to send invalidation messages in some way
 that causes them to be processed more eagerly by other backends, and that
 provides some more specific feedback on whether or not they have
 actually been processed.  Then we could send the invalidation
 messages, wait just until everyone confirms that they have been seen,
 which should hopefully happen quickly, and then proceed.

 Actually, the shared inval code already has that knowledge, doesn't it?
 ISTM all we'd need is have a sequence number of SI entries which has
 to be queryable. Then one can simply wait till all backends have
 consumed up to that id which we keep track of the furthest back backend
 in shmem.

In theory, yes, but in practice, there are a few difficulties.

1. We're not in a huge hurry to ensure that sinval notifications are
delivered in a timely fashion.  We know that sinval resets are bad, so
if a backend is getting close to needing a sinval reset, we kick it in
an attempt to get it to AcceptInvalidationMessages().  But if the
sinval queue isn't filling up, there's no upper bound on the amount of
time that can pass before a particular sinval is read.  Therefore, the
amount of time that passes before an idle backend is forced to drain
the sinval queue can vary widely, from a fraction of a second to
minutes, hours, or days.   So it's kind of unappealing to think about
making user-visible behavior dependent on how long it ends up taking.

2. Every time we add a new kind of sinval message, we increase the
frequency of sinval resets, and those are bad.  So any notifications
that we choose to send this way had better be pretty low-volume.

Considering the foregoing points, it's unclear to me whether we should
try to improve sinval incrementally or replace it with something
completely new.  I'm sure that the above-mentioned problems are
solvable, but I'm not sure how hairy it will be.  On the other hand,
designing something new could be pretty hairy, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-17 Thread Andres Freund
On 2013-09-17 16:34:37 -0400, Robert Haas wrote:
On Mon, Sep 16, 2013 at 10:38 AM, Andres Freund and...@2ndquadrant.com wrote:
  Actually, the shared inval code already has that knowledge, doesn't it?
  ISTM all we'd need is have a sequence number of SI entries which has
  to be queryable. Then one can simply wait till all backends have
  consumed up to that id which we keep track of the furthest back backend
  in shmem.
 
 In theory, yes, but in practice, there are a few difficulties.

Agreed ;)

 1. We're not in a huge hurry to ensure that sinval notifications are
 delivered in a timely fashion.  We know that sinval resets are bad, so
 if a backend is getting close to needing a sinval reset, we kick it in
 an attempt to get it to AcceptInvalidationMessages().  But if the
 sinval queue isn't filling up, there's no upper bound on the amount of
 time that can pass before a particular sinval is read.  Therefore, the
 amount of time that passes before an idle backend is forced to drain
 the sinval queue can vary widely, from a fraction of a second to
 minutes, hours, or days.   So it's kind of unappealing to think about
 making user-visible behavior dependent on how long it ends up taking.

Well, when we're signalling it's certainly faster than waiting for the
other's snapshot to vanish which can take ages for normal backends. And
we can signal when we wait for consumption without too many
problems.
Also, I think in most of the use cases we can simply not wait for any of
the idle backends, those don't use the old definition anyway.

 2. Every time we add a new kind of sinval message, we increase the
 frequency of sinval resets, and those are bad.  So any notifications
 that we choose to send this way had better be pretty low-volume.

In pretty much all the cases where I can see the need for something like
that, we already send sinval messages, so we should be able to
piggyback on those.

 Considering the foregoing points, it's unclear to me whether we should
 try to improve sinval incrementally or replace it with something
 completely new.  I'm sure that the above-mentioned problems are
 solvable, but I'm not sure how hairy it will be.  On the other hand,
 designing something new could be pretty hairy, too.

I am pretty sure there's quite a bit to improve around sinvals but I
think any replacement would look surprisingly similar to what we
have. So I think doing it incrementally is more realistic.
And I am certainly scared by the thought of having to replace it without
breaking corner cases all over.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-16 Thread Andres Freund
On 2013-08-29 10:39:09 -0400, Robert Haas wrote:
 I have been of the opinion for some time now that the
 shared-invalidation code is not a particularly good design for much of
 what we need.  Waiting for an old snapshot is often a proxy for
 waiting long enough that we can be sure every other backend will
 process the shared-invalidation message before it next uses any of the
 cached data that will be invalidated by that message.  However, it
 would be better to be able to send invalidation messages in some way
 that causes them to be processed more eagerly by other backends, and that
 provides some more specific feedback on whether or not they have
 actually been processed.  Then we could send the invalidation
 messages, wait just until everyone confirms that they have been seen,
 which should hopefully happen quickly, and then proceed.

Actually, the shared inval code already has that knowledge, doesn't it?
ISTM all we'd need is to have a sequence number of SI entries which is
queryable. Then one can simply wait till all backends have
consumed up to that id, which we can track via the furthest-back backend
in shmem.
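A sketch of that idea under stated assumptions (the SinvalQueue class and its methods are invented for illustration; PostgreSQL's actual sinval code differs): every message gets a sequence number, each backend records the furthest id it has consumed, and a waiter blocks until the minimum across backends reaches its target.

```python
import threading

class SinvalQueue:
    """Illustrative model: number every invalidation message and let a
    sender wait until every backend has consumed past a given id."""
    def __init__(self, n_backends):
        self.lock = threading.Condition()
        self.next_id = 0
        # Furthest-consumed message id per backend (the shmem counters).
        self.consumed = [0] * n_backends

    def send(self):
        with self.lock:
            self.next_id += 1
            return self.next_id          # id the sender can wait on

    def consume_up_to(self, backend, msg_id):
        with self.lock:
            self.consumed[backend] = max(self.consumed[backend], msg_id)
            self.lock.notify_all()

    def wait_until_consumed(self, msg_id, timeout=5.0):
        # Block until the furthest-back backend has caught up.
        with self.lock:
            return self.lock.wait_for(
                lambda: min(self.consumed) >= msg_id, timeout)

q = SinvalQueue(n_backends=2)
target = q.send()                        # e.g. an index-drop message
q.consume_up_to(0, target)               # backend 0 catches up
q.consume_up_to(1, target)               # backend 1 catches up
assert q.wait_until_consumed(target)     # sender may now proceed
```

The signalling discussed later in the thread would map onto kicking backends whose consumed counter lags, instead of waiting passively.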

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-09-16 Thread Andres Freund
Hi,

Looking at this version of the patch now:
1) The comment for Phase 4 of REINDEX CONCURRENTLY ends with an incomplete
sentence.

2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.

3) I am not sure if the swap algorithm used now actually is correct
either. We have MVCC snapshots now, right, but we're still potentially
taking separate snapshots for individual relcache lookups. What's
stopping us from temporarily ending up with two relcache entries with
the same relfilenode?
Previously you swapped names - I think that might end up being easier,
because having names temporarily confused isn't as bad as two indexes
manipulating the same file.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-08-29 Thread Robert Haas
On Wed, Aug 28, 2013 at 9:02 AM, Andres Freund and...@2ndquadrant.com wrote:
 During swap phase, process was waiting for transactions with older
 snapshots than the one taken by transaction doing the swap as they
 might hold the old index information. I think that we can get rid of
 it thanks to the MVCC snapshots as other backends are now able to see
 what is the correct index information to fetch.

 I don't see MVCC snapshots guaranteeing that. The only thing changed due
 to them is that other backends see a self consistent picture of the
 catalog (i.e. not either, neither or both versions of a tuple as
 earlier). It still can be out of date. And we rely on those not being
 out of date.

 I need to look into the patch for more details.

I agree with Andres.  The only way in which the MVCC catalog snapshot
patch helps is that you can now do a transactional update on a system
catalog table without fearing that other backends will see the row as
nonexistent or duplicated.  They will see exactly one version of the
row, just as you would naturally expect.  However, a backend's
syscaches can still contain old versions of rows, and they can still
cache older versions of some tuples and newer versions of other
tuples.  Those caches only get reloaded when shared-invalidation
messages are processed, and that only happens when the backend
acquires a lock on a new relation.

I have been of the opinion for some time now that the
shared-invalidation code is not a particularly good design for much of
what we need.  Waiting for an old snapshot is often a proxy for
waiting long enough that we can be sure every other backend will
process the shared-invalidation message before it next uses any of the
cached data that will be invalidated by that message.  However, it
would be better to be able to send invalidation messages in some way
that causes them to be processed more eagerly by other backends, and that
provides some more specific feedback on whether or not they have
actually been processed.  Then we could send the invalidation
messages, wait just until everyone confirms that they have been seen,
which should hopefully happen quickly, and then proceed.  This would
probably lead to much shorter waits.  Or maybe we should have
individual backends process invalidations more frequently, and try to
set things up so that once an invalidation is sent, the sending
backend is immediately guaranteed that it will be processed soon
enough, and thus it doesn't need to wait at all.  This is all pie in
the sky, though.  I don't have a clear idea how to design something
that's an improvement over the (rather intricate) system we have
today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-08-28 Thread Andres Freund
On 2013-08-28 13:58:08 +0900, Michael Paquier wrote:
 On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
  I have been working a little bit more on this patch for the next
  commit fest. Compared to the previous version, I have removed the part
  of the code where process running REINDEX CONCURRENTLY was waiting for
  transactions holding a snapshot older than the snapshot xmin of
  process running REINDEX CONCURRENTLY at the validation and swap phase.
  At the validation phase, there was a risk that the newly-validated
  index might not contain deleted tuples before the snapshot used for
  validation was taken. I tried to break the code in this area by
  playing with multiple sessions but couldn't. Feel free to try the code
  and break it if you can!
 
 Hm. Do you have any justifications for removing those waits besides "I
 couldn't break it"? The logic for the concurrent indexing is pretty
  intricate and we've got it wrong a couple of times without noticing bugs
  for a long while, so I am really uncomfortable with statements like this.
 Note that the waits on relation locks are not removed, only the wait
 phases involving old snapshots.
 
 During swap phase, process was waiting for transactions with older
 snapshots than the one taken by transaction doing the swap as they
 might hold the old index information. I think that we can get rid of
 it thanks to the MVCC snapshots as other backends are now able to see
 what is the correct index information to fetch.

I don't see MVCC snapshots guaranteeing that. The only thing changed due
to them is that other backends see a self consistent picture of the
catalog (i.e. not either, neither or both versions of a tuple as
earlier). It still can be out of date. And we rely on those not being
out of date.

I need to look into the patch for more details.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-08-27 Thread Andres Freund
On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
 I have been working a little bit more on this patch for the next
 commit fest. Compared to the previous version, I have removed the part
 of the code where process running REINDEX CONCURRENTLY was waiting for
 transactions holding a snapshot older than the snapshot xmin of
 process running REINDEX CONCURRENTLY at the validation and swap phase.
 At the validation phase, there was a risk that the newly-validated
 index might not contain deleted tuples before the snapshot used for
 validation was taken. I tried to break the code in this area by
 playing with multiple sessions but couldn't. Feel free to try the code
 and break it if you can!

Hm. Do you have any justifications for removing those waits besides "I
couldn't break it"? The logic for the concurrent indexing is pretty
intricate and we've got it wrong a couple of times without noticing bugs
for a long while, so I am really uncomfortable with statements like this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-08-27 Thread Michael Paquier
On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
 I have been working a little bit more on this patch for the next
 commit fest. Compared to the previous version, I have removed the part
 of the code where process running REINDEX CONCURRENTLY was waiting for
 transactions holding a snapshot older than the snapshot xmin of
 process running REINDEX CONCURRENTLY at the validation and swap phase.
 At the validation phase, there was a risk that the newly-validated
 index might not contain deleted tuples before the snapshot used for
 validation was taken. I tried to break the code in this area by
 playing with multiple sessions but couldn't. Feel free to try the code
 and break it if you can!

 Hm. Do you have any justifications for removing those waits besides "I
 couldn't break it"? The logic for the concurrent indexing is pretty
 intricate and we've got it wrong a couple of times without noticing bugs
 for a long while, so I am really uncomfortable with statements like this.
Note that the waits on relation locks are not removed, only the wait
phases involving old snapshots.

During swap phase, process was waiting for transactions with older
snapshots than the one taken by transaction doing the swap as they
might hold the old index information. I think that we can get rid of
it thanks to the MVCC snapshots as other backends are now able to see
what is the correct index information to fetch.
After the new index validation, the index has all the necessary
tuples; however, it might not have taken into account tuples deleted
before the reference snapshot was taken. But in the case of REINDEX
CONCURRENTLY, unlike CREATE INDEX CONCURRENTLY, the validated index is
not marked as valid at that point; the transaction doing the
validation is committed directly. The index is considered valid only
after the swap phase, when relfilenodes are changed.

I am sure you will find some flaws in this reasoning though :). Of
course, not having been able to break this code with my picky tests
using targeted breakpoints does not mean that it will not fail in a
given scenario, just that I could not break it yet.

Note also that removing those wait phases has the advantage of
removing the risk of deadlocks when an ANALYZE is run in parallel to
REINDEX CONCURRENTLY, as was the case in previous versions of the
patch (reproducible, when waiting for the old snapshots, if a session
takes ShareUpdateExclusiveLock on the same relation in parallel).
-- 
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-11 Thread Michael Paquier
On Thu, Jul 11, 2013 at 5:11 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 I am resending the patches after Fujii-san noticed a bug allowing to
 even drop valid toast indexes with the latest code... While looking at
 that, I found a couple of other bugs:
 - two bugs, now fixed, with the code path added in tablecmds.c to
 allow the manual drop of invalid toast indexes:
 -- Even a user having no permission on the parent toast table could
 drop an invalid toast index
 -- A lock on the parent toast relation was not taken as it is the case
 for all the indexes dropped with DROP INDEX
 - Trying to reindex concurrently a mapped catalog leads to an error.
 As they have no relfilenode, I think it makes sense to block reindex
 concurrently in this case, so I modified the core patch in this sense.
This patch status has been changed to returned with feedback.
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-04 Thread Michael Paquier
Hi,

I noticed some errors in the comments of the patch committed. Please
find attached a patch to correct that.
Regards,
--
Michael


20130704_reltoastidxid_comments.patch
Description: Binary data



Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-04 Thread Fujii Masao
On Thu, Jul 4, 2013 at 3:38 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi,

 I noticed some errors in the comments of the patch committed. Please
 find attached a patch to correct that.

Committed. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-03 Thread Andres Freund
On 2013-07-03 10:03:26 +0900, Michael Paquier wrote:
 +static int
 +toast_open_indexes(Relation toastrel,
 +LOCKMODE lock,
 +Relation **toastidxs,
 +int *num_indexes)
 + /*
 +  * Free index list, not necessary as relations are opened and a valid 
 index
 +  * has been found.
 +  */
 + list_free(indexlist);

Missing "anymore" or such.

 index 9ee9ea2..23e0373 100644
 --- a/src/bin/pg_dump/pg_dump.c
 +++ b/src/bin/pg_dump/pg_dump.c
 @@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
   PQExpBuffer upgrade_query = createPQExpBuffer();
   PGresult   *upgrade_res;
   Oid pg_class_reltoastrelid;
 - Oid pg_class_reltoastidxid;
  
   appendPQExpBuffer(upgrade_query,
 -   SELECT c.reltoastrelid, 
 t.reltoastidxid 
 +   SELECT c.reltoastrelid 
 FROM pg_catalog.pg_class c LEFT JOIN 
 
 pg_catalog.pg_class t ON 
 (c.reltoastrelid = t.oid) 
 WHERE c.oid = '%u'::pg_catalog.oid;,
 @@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
   upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query-data);
  
   pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, 
 PQfnumber(upgrade_res, reltoastrelid)));
 - pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, 
 PQfnumber(upgrade_res, reltoastidxid)));
  
   appendPQExpBuffer(upgrade_buffer,
  \n-- For binary upgrade, must preserve 
 pg_class oids\n);
 @@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
   /* only tables have toast tables, not indexes */
   if (OidIsValid(pg_class_reltoastrelid))
   {
 + PQExpBuffer index_query = createPQExpBuffer();
 + PGresult   *index_res;
 + Oid indexrelid;
 +
   /*
* One complexity is that the table definition might 
 not require
* the creation of a TOAST table, and the TOAST table 
 might have
 @@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 SELECT 
 binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n,
 
 pg_class_reltoastrelid);
  
 - /* every toast table has an index */
 + /* Every toast table has one valid index, so fetch it 
 first... */
 + appendPQExpBuffer(index_query,
 +   SELECT c.indexrelid 
 +   FROM 
 pg_catalog.pg_index c 
 +   WHERE c.indrelid = 
 %u AND c.indisvalid;,
 +   
 pg_class_reltoastrelid);
 + index_res = ExecuteSqlQueryForSingleRow(fout, 
 index_query-data);
 + indexrelid = atooid(PQgetvalue(index_res, 0,
 +
 PQfnumber(index_res, indexrelid)));
 +
 + /* Then set it */
   appendPQExpBuffer(upgrade_buffer,
 SELECT 
 binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n,
 -   
 pg_class_reltoastidxid);
 +   indexrelid);
 +
 + PQclear(index_res);
 + destroyPQExpBuffer(index_query);

Wouldn't it make more sense to fetch the toast index oid in the query
ontop instead of making a query for every relation?

Looking good!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-03 Thread Michael Paquier
On Wed, Jul 3, 2013 at 11:16 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-07-03 10:03:26 +0900, Michael Paquier wrote:
 index 9ee9ea2..23e0373 100644
 --- a/src/bin/pg_dump/pg_dump.c
 +++ b/src/bin/pg_dump/pg_dump.c
 @@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
   PQExpBuffer upgrade_query = createPQExpBuffer();
   PGresult   *upgrade_res;
   Oid pg_class_reltoastrelid;
 - Oid pg_class_reltoastidxid;

   appendPQExpBuffer(upgrade_query,
 -   SELECT c.reltoastrelid, 
 t.reltoastidxid 
 +   SELECT c.reltoastrelid 
 FROM pg_catalog.pg_class c LEFT 
 JOIN 
 pg_catalog.pg_class t ON 
 (c.reltoastrelid = t.oid) 
 WHERE c.oid = 
 '%u'::pg_catalog.oid;,
 @@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
   upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query-data);

   pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, 
 PQfnumber(upgrade_res, reltoastrelid)));
 - pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, 
 PQfnumber(upgrade_res, reltoastidxid)));

   appendPQExpBuffer(upgrade_buffer,
  \n-- For binary upgrade, must preserve 
 pg_class oids\n);
 @@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
   /* only tables have toast tables, not indexes */
   if (OidIsValid(pg_class_reltoastrelid))
   {
 + PQExpBuffer index_query = createPQExpBuffer();
 + PGresult   *index_res;
 + Oid indexrelid;
 +
   /*
* One complexity is that the table definition might 
 not require
* the creation of a TOAST table, and the TOAST table 
 might have
 @@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 SELECT 
 binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n,
 
 pg_class_reltoastrelid);

 - /* every toast table has an index */
 + /* Every toast table has one valid index, so fetch it 
 first... */
 + appendPQExpBuffer(index_query,
 +   SELECT c.indexrelid 
 
 +   FROM 
 pg_catalog.pg_index c 
 +   WHERE c.indrelid = 
 %u AND c.indisvalid;,
 +   
 pg_class_reltoastrelid);
 + index_res = ExecuteSqlQueryForSingleRow(fout, 
 index_query-data);
 + indexrelid = atooid(PQgetvalue(index_res, 0,
 +
 PQfnumber(index_res, indexrelid)));
 +
 + /* Then set it */
   appendPQExpBuffer(upgrade_buffer,
 SELECT 
 binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n,
 -   
 pg_class_reltoastidxid);
 +   indexrelid);
 +
 + PQclear(index_res);
 + destroyPQExpBuffer(index_query);

 Wouldn't it make more sense to fetch the toast index oid in the query
 ontop instead of making a query for every relation?
With something like a CASE condition in the upper query for
reltoastrelid? This code path is not only taken by indexes but also by
tables. So I thought that it was cleaner and more readable to fetch
the index OID only if necessary as a separate query.

Regards,
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-03 Thread Andres Freund
On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
  Wouldn't it make more sense to fetch the toast index oid in the query
  ontop instead of making a query for every relation?
 With something like a CASE condition in the upper query for
 reltoastrelid? This code path is not only taken by indexes but also by
 tables. So I thought that it was cleaner and more readable to fetch
 the index OID only if necessary as a separate query.

A left join should do the trick?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-03 Thread Fujii Masao
On Thu, Jul 4, 2013 at 2:41 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
  Wouldn't it make more sense to fetch the toast index oid in the query
  ontop instead of making a query for every relation?

 +1
 I changed the query that way. Updated version of the patch attached.

 Also I updated the rules.out because Michael changed the system_views.sql.
 Otherwise, the regression test would fail.

 Will commit this patch.

Committed. So, let's get to REINDEX CONCURRENTLY patch!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-03 Thread Michael Paquier
On Thu, Jul 4, 2013 at 3:26 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jul 4, 2013 at 2:41 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
  Wouldn't it make more sense to fetch the toast index oid in the query
  ontop instead of making a query for every relation?

 +1
 I changed the query that way. Updated version of the patch attached.

 Also I updated the rules.out because Michael changed the system_views.sql.
 Otherwise, the regression test would fail.

 Will commit this patch.

 Committed. So, let's get to REINDEX CONCURRENTLY patch!
Thanks for the hard work! I'll work on something based on MVCC
catalogs, so at least the lock will be lowered at the swap phase and
isolation tests will be added.
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-02 Thread Fujii Masao
On Fri, Jun 28, 2013 at 4:30 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jun 26, 2013 at 1:06 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for updating the patch!
 And thanks for taking time to look at that. I updated the patch
 according to your comments, except for the VACUUM FULL problem. Please
 see patch attached and below for more details.

 When I ran VACUUM FULL, I got the following error.

 ERROR:  attempt to apply a mapping to unmapped relation 16404
 STATEMENT:  vacuum full;
 This can be reproduced when doing a vacuum full on pg_proc,
 pg_shdescription or pg_db_role_setting for example, or relations that
 have no relfilenode (mapped catalogs), and a toast relation. I still
 have no idea what is happening here but I am looking at it. As this
 patch removes reltoastidxid, could that removal have effect on the
 relation mapping of mapped catalogs? Does someone have an idea?

 Could you clarify why toast_save_datum needs to update even an invalid
 toast index? It's required only for REINDEX CONCURRENTLY?
 Because an invalid index might be marked as indisready, so ready to
 receive inserts. Yes this is a requirement for REINDEX CONCURRENTLY,
 and in a more general way a requirement for a relation that includes
 in rd_indexlist indexes that are live, ready but not valid. Just based
 on this remark I spotted a bug in my patch for tuptoaster.c where we
 could insert a new index tuple entry in toast_save_datum for an index
 live but not ready. Fixed that by adding an additional check to the
 flag indisready before calling index_insert.

 @@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)

 toastrel = heap_open(toastrelid, AccessShareLock);

 -   result = toastrel_valueid_exists(toastrel, valueid);
 +   result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);

 toastid_valueid_exists() is used only in toast_save_datum(). So we should use
 RowExclusiveLock here rather than AccessShareLock?
 Makes sense.

 + * toast_open_indexes
 + *
 + * Get an array of index relations associated to the given toast 
 relation
 + * and return as well the position of the valid index used by the toast
 + * relation in this array. It is the responsability of the caller of 
 this

 Typo: responsibility
 Done.

 toast_open_indexes(Relation toastrel,
 +  LOCKMODE lock,
 +  Relation **toastidxs,
 +  int *num_indexes)
 +{
 +   int i = 0;
 +   int res = 0;
 +   boolfound = false;
 +   List   *indexlist;
 +   ListCell   *lc;
 +
 +   /* Get index list of relation */
 +   indexlist = RelationGetIndexList(toastrel);

 What about adding the assertion which checks that the return value
 of RelationGetIndexList() is not NIL?
 Done.

 When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
 I got the following error. Without the patch, that succeeded.

 command: /dav/reindex/bin/pg_dump --host /dav/reindex --port 50432
 --username postgres --schema-only --quote-all-identifiers
 --binary-upgrade --format=custom
 --file=pg_upgrade_dump_12270.custom postgres 
 pg_upgrade_dump_12270.log 21
 pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
 t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
 t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
 '16390'::pg_catalog.oid AND t.indisvalid;
 This issue is easily reproducible by having more than one table using
 toast indexes in the cluster to be upgraded. The error was on the
 pg_dump side when trying to do a binary upgrade. In order to fix that,
 I changed binary_upgrade_set_pg_class_oids() in pg_dump.c to fetch
 the index associated with a toast relation only if there is a toast
 relation. This adds one extra step for each relation having a toast
 table, but makes the code clearer. Note that I checked
 pg_upgrade down to 8.4...

Why did you remove the check of indisvalid from the --binary-upgrade SQL?
Without this check, if there is an invalid toast index, more than one row is
returned and ExecuteSqlQueryForSingleRow() would cause an error.

+   foreach(lc, indexlist)
+   *toastidxs[i++] = index_open(lfirst_oid(lc), lock);

*toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, segmentation fault can
happen.

For now I've not found any other big problem except the above.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-02 Thread Fujii Masao
On Wed, Jul 3, 2013 at 5:43 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jul 3, 2013 at 5:22 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Why did you remove the check of indisvalid from the --binary-upgrade SQL?
 Without this check, if there is the invalid toast index, more than one rows 
 are
 returned and ExecuteSqlQueryForSingleRow() would cause the error.

 +   foreach(lc, indexlist)
 +   *toastidxs[i++] = index_open(lfirst_oid(lc), lock);

 *toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, segmentation fault 
 can
 happen.

 For now I've not found any other big problem except the above.

system_views.sql
-GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;

I found another problem. X.indexrelid should be X.indrelid. Otherwise, when
there is an invalid toast index, more than one row is returned for the same
relation.

 OK cool, updated version attached. If you guys think that the attached
 version is fine (only the reltoastidxid removal part), perhaps it
 would be worth committing it as Robert also committed the MVCC catalog
 patch today. So we would be able to focus on the core feature asap
 with the 2nd patch, and the removal of AccessExclusiveLock at swap
 step.

Yep, will do. Maybe today.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-01 Thread Fujii Masao
On Mon, Jul 1, 2013 at 9:31 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi all,

 Please find attached an updated version of the patch removing
 reltoastidxid (with and w/o context diffs), patch fixing the vacuum
 full issue. With this fix, all the comments are addressed.

Thanks for updating the patch!

I have one question related to VACUUM FULL problem. What happens
if we run VACUUM FULL when there is an invalid toast index? The invalid
toast index is rebuilt and marked as valid, i.e., there can be multiple valid
toast indexes?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-07-01 Thread Michael Paquier
On Tue, Jul 2, 2013 at 7:36 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jul 1, 2013 at 9:31 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Hi all,

 Please find attached an updated version of the patch removing
 reltoastidxid (with and w/o context diffs), patch fixing the vacuum
 full issue. With this fix, all the comments are addressed.

 Thanks for updating the patch!

 I have one question related to VACUUM FULL problem. What happens
 if we run VACUUM FULL when there is an invalid toast index? The invalid
 toast index is rebuilt and marked as valid, i.e., there can be multiple valid
 toast indexes?
The invalid toast indexes are not rebuilt. With the design of this
patch, toast relations can only have one valid index at the same time,
and this is also the path taken by REINDEX CONCURRENTLY for toast
relations. This process is managed by this code in cluster.c, only the
valid index of toast relation is taken into account when rebuilding
relations:
***
*** 1393,1410  swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,

/*
 * If we're swapping two toast tables by content, do the same for their
!* indexes.
 */
if (swap_toast_by_content 
!   relform1-reltoastidxid  relform2-reltoastidxid)
!   swap_relation_files(relform1-reltoastidxid,
!   relform2-reltoastidxid,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);

/* Clean up. */
heap_freetuple(reltup1);
--- 1392,1421 

/*
 * If we're swapping two toast tables by content, do the same for their
!* valid index. The swap can actually be safely done only if
the relations
!* have indexes.
 */
if (swap_toast_by_content 
!   relform1-relkind == RELKIND_TOASTVALUE 
!   relform2-relkind == RELKIND_TOASTVALUE)
!   {
!   Oid toastIndex1, toastIndex2;
!
!   /* Get valid index for each relation */
!   toastIndex1 = toast_get_valid_index(r1,
!
 AccessExclusiveLock);
!   toastIndex2 = toast_get_valid_index(r2,
!
 AccessExclusiveLock);
!
!   swap_relation_files(toastIndex1,
!   toastIndex2,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
+   }

Regards,
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-28 Thread Andres Freund
On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
  When I ran VACUUM FULL, I got the following error.
 
  ERROR:  attempt to apply a mapping to unmapped relation 16404
  STATEMENT:  vacuum full;
 This can be reproduced when doing a vacuum full on pg_proc,
 pg_shdescription or pg_db_role_setting for example, or relations that
 have no relfilenode (mapped catalogs), and a toast relation. I still
 have no idea what is happening here but I am looking at it. As this
 patch removes reltoastidxid, could that removal have effect on the
 relation mapping of mapped catalogs? Does someone have an idea?

I'd guess you broke the swap_toast_by_content case in cluster.c? We cannot
change the oid of a mapped relation (including indexes) since pg_class
in other databases wouldn't get the news.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-28 Thread Michael Paquier
On Fri, Jun 28, 2013 at 4:52 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
  When I ran VACUUM FULL, I got the following error.
 
  ERROR:  attempt to apply a mapping to unmapped relation 16404
  STATEMENT:  vacuum full;
 This can be reproduced when doing a vacuum full on pg_proc,
 pg_shdescription or pg_db_role_setting for example, or relations that
 have no relfilenode (mapped catalogs), and a toast relation. I still
 have no idea what is happening here but I am looking at it. As this
 patch removes reltoastidxid, could that removal have effect on the
 relation mapping of mapped catalogs? Does someone have an idea?

 I'd guess you broke swap_toast_by_content case in cluster.c? We cannot
 change the oid of a mapped relation (including indexes) since pg_class
 in other databases wouldn't get the news.
Yeah, I thought that something was broken in swap_relation_files, but
after comparing the code path taken by my code and master, and the
different function calls, I can't find any difference. I'm assuming
that something is wrong in tuptoaster.c with the way toast index
relations are opened to get the OIDs to be swapped... But so far I
have found nothing, so I am just not sure...
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-25 Thread Fujii Masao
On Tue, Jun 25, 2013 at 8:15 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Patch updated according to comments.

Thanks for updating the patch!

When I ran VACUUM FULL, I got the following error.

ERROR:  attempt to apply a mapping to unmapped relation 16404
STATEMENT:  vacuum full;

Could you clarify why toast_save_datum needs to update even an invalid toast
index? Is it required only for REINDEX CONCURRENTLY?

@@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)

toastrel = heap_open(toastrelid, AccessShareLock);

-   result = toastrel_valueid_exists(toastrel, valueid);
+   result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);

toastid_valueid_exists() is used only in toast_save_datum(). So we should use
RowExclusiveLock here rather than AccessShareLock?

+ * toast_open_indexes
+ *
+ * Get an array of index relations associated to the given toast relation
+ * and return as well the position of the valid index used by the toast
+ * relation in this array. It is the responsability of the caller of this

Typo: responsibility

toast_open_indexes(Relation toastrel,
+  LOCKMODE lock,
+  Relation **toastidxs,
+  int *num_indexes)
+{
+   int i = 0;
+   int res = 0;
+   boolfound = false;
+   List   *indexlist;
+   ListCell   *lc;
+
+   /* Get index list of relation */
+   indexlist = RelationGetIndexList(toastrel);

What about adding the assertion which checks that the return value
of RelationGetIndexList() is not NIL?

When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
I got the following error. Without the patch, that succeeded.

command: /dav/reindex/bin/pg_dump --host /dav/reindex --port 50432
--username postgres --schema-only --quote-all-identifiers
--binary-upgrade --format=custom
--file=pg_upgrade_dump_12270.custom postgres >> pg_upgrade_dump_12270.log 2>&1
pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
'16390'::pg_catalog.oid AND t.indisvalid;

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-24 Thread Andres Freund
On 2013-06-24 07:46:34 +0900, Michael Paquier wrote:
 On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote:
  Compile error ;)
 It looks like filterdiff did not work correctly when generating the
 latest patch with context diffs, I cannot apply it cleanly either.
 This is perhaps due to a wrong manipulation from me. Please try the
 attached that has been generated as a raw git output. It works
 correctly with a git apply. I just checked.

Did you check whether that introduces a performance regression?


  /* --
 + * toast_get_valid_index
 + *
 + *   Get the valid index of given toast relation. A toast relation can only
 + *   have one valid index at the same time. The lock taken on the index
 + *   relations is released at the end of this function call.
 + */
 +Oid
 +toast_get_valid_index(Oid toastoid, LOCKMODE lock)
 +{
 + ListCell   *lc;
 + List   *indexlist;
 + int num_indexes, i = 0;
 + Oid validIndexOid;
 + Relation validIndexRel;
 + Relation   *toastidxs;
 + Relation toastrel;
 +
 + /* Get the index list of relation */
 + toastrel = heap_open(toastoid, lock);
 + indexlist = RelationGetIndexList(toastrel);
 + num_indexes = list_length(indexlist);
 +
 + /* Open all the index relations */
 + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
 + foreach(lc, indexlist)
 + toastidxs[i++] = index_open(lfirst_oid(lc), lock);
 +
 + /* Fetch valid toast index */
 + validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
 + validIndexOid = RelationGetRelid(validIndexRel);
 +
 + /* Close all the index relations */
 + for (i = 0; i < num_indexes; i++)
 + index_close(toastidxs[i], lock);
 + pfree(toastidxs);
 + list_free(indexlist);
 +
 + heap_close(toastrel, lock);
 + return validIndexOid;
 +}

Just to make sure, could you check we've found a valid index?

  static bool
 -toastrel_valueid_exists(Relation toastrel, Oid valueid)
 +toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
  {
  bool    result = false;
   ScanKeyData toastkey;
   SysScanDesc toastscan;
 + int i = 0;
 + int num_indexes;
 + Relation   *toastidxs;
 + Relation validtoastidx;
 + ListCell   *lc;
 + List   *indexlist;
 +
 + /* Ensure that the list of indexes of toast relation is computed */
 + indexlist = RelationGetIndexList(toastrel);
 + num_indexes = list_length(indexlist);
 +
 + /* Open each index relation necessary */
 + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
 + foreach(lc, indexlist)
 + toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
 +
 + /* Fetch a valid index relation */
 + validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);

Those 10 lines are repeated multiple times, in different
functions. Maybe move them into toast_index_fetch_valid and rename that
to
Relation *
toast_open_indexes(Relation toastrel, LOCKMODE mode, size_t *numindexes, size_t valididx);

That way we also wouldn't fetch/copy the indexlist twice in some
functions.

 + /* Clean up */
 + for (i = 0; i < num_indexes; i++)
 + index_close(toastidxs[i], lockmode);
 + list_free(indexlist);
 + pfree(toastidxs);

The indexlist could already be freed inside the function proposed
above...


 diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
 index 8294b29..2b777da 100644
 --- a/src/backend/commands/tablecmds.c
 +++ b/src/backend/commands/tablecmds.c
 @@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
 
 + foreach(lc, reltoastidxids)
 + {
 + 	Oid toastidxid = lfirst_oid(lc);
 + 	if (OidIsValid(toastidxid))
 + 		ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
 + }

Copy & pasted OidIsValid(), shouldn't be necessary anymore.


Otherwise I think there's not really much left to be done. Fujii?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-24 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Otherwise I think there's not really much left to be done. Fujii?

Well, other than the fact that we've not got MVCC catalog scans yet.

regards, tom lane




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-24 Thread Andres Freund
On 2013-06-24 09:57:24 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Otherwise I think there's not really much left to be done. Fujii?
 
 Well, other than the fact that we've not got MVCC catalog scans yet.

That statement was only about the patch dealing with the removal of
reltoastidxid.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-24 Thread Fujii Masao
On Mon, Jun 24, 2013 at 7:39 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-24 07:46:34 +0900, Michael Paquier wrote:
 On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote:
  Compile error ;)
 It looks like filterdiff did not work correctly when generating the
 latest patch with context diffs, I cannot apply it cleanly either.
 This is perhaps due to a wrong manipulation from me. Please try the
 attached that has been generated as a raw git output. It works
 correctly with a git apply. I just checked.

 Did you check whether that introduces a performance regression?


  /* --
 + * toast_get_valid_index
 + *
 + *   Get the valid index of given toast relation. A toast relation can only
 + *   have one valid index at the same time. The lock taken on the index
 + *   relations is released at the end of this function call.
 + */
 +Oid
 +toast_get_valid_index(Oid toastoid, LOCKMODE lock)
 +{
 + ListCell   *lc;
 + List   *indexlist;
 + int num_indexes, i = 0;
 + Oid validIndexOid;
 + Relation validIndexRel;
 + Relation   *toastidxs;
 + Relation toastrel;
 +
 + /* Get the index list of relation */
 + toastrel = heap_open(toastoid, lock);
 + indexlist = RelationGetIndexList(toastrel);
 + num_indexes = list_length(indexlist);
 +
 + /* Open all the index relations */
 + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
 + foreach(lc, indexlist)
 + toastidxs[i++] = index_open(lfirst_oid(lc), lock);
 +
 + /* Fetch valid toast index */
 + validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
 + validIndexOid = RelationGetRelid(validIndexRel);
 +
 + /* Close all the index relations */
 + for (i = 0; i < num_indexes; i++)
 + index_close(toastidxs[i], lock);
 + pfree(toastidxs);
 + list_free(indexlist);
 +
 + heap_close(toastrel, lock);
 + return validIndexOid;
 +}

 Just to make sure, could you check we've found a valid index?

  static bool
 -toastrel_valueid_exists(Relation toastrel, Oid valueid)
 +toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
  {
  bool    result = false;
   ScanKeyData toastkey;
   SysScanDesc toastscan;
 + int i = 0;
 + int num_indexes;
 + Relation   *toastidxs;
 + Relation validtoastidx;
 + ListCell   *lc;
 + List   *indexlist;
 +
 + /* Ensure that the list of indexes of toast relation is computed */
 + indexlist = RelationGetIndexList(toastrel);
 + num_indexes = list_length(indexlist);
 +
 + /* Open each index relation necessary */
 + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
 + foreach(lc, indexlist)
 + toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
 +
 + /* Fetch a valid index relation */
 + validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);

 Those 10 lines are repeated multiple times, in different
 functions. Maybe move them into toast_index_fetch_valid and rename that
 to
 Relation *
 toast_open_indexes(Relation toastrel, LOCKMODE mode, size_t *numindexes, size_t valididx);

 That way we also wouldn't fetch/copy the indexlist twice in some
 functions.

 + /* Clean up */
 + for (i = 0; i < num_indexes; i++)
 + index_close(toastidxs[i], lockmode);
 + list_free(indexlist);
 + pfree(toastidxs);

 The indexlist could already be freed inside the function proposed
 above...


 diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
 index 8294b29..2b777da 100644
 --- a/src/backend/commands/tablecmds.c
 +++ b/src/backend/commands/tablecmds.c
 @@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
 
 + foreach(lc, reltoastidxids)
 + {
 + 	Oid toastidxid = lfirst_oid(lc);
 + 	if (OidIsValid(toastidxid))
 + 		ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
 + }

 Copy & pasted OidIsValid(), shouldn't be necessary anymore.


 Otherwise I think there's not really much left to be done. Fujii?

Yep, will check.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-24 Thread Michael Paquier
On Mon, Jun 24, 2013 at 11:06 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-24 09:57:24 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Otherwise I think there's not really much left to be done. Fujii?

 Well, other than the fact that we've not got MVCC catalog scans yet.

 That statement was only about the patch dealing with the removal of
 reltoastidxid.
Partially my mistake. It is not that obvious just based on the name of
this thread, so I should have moved the review of this particular
patch to another thread.
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-23 Thread Fujii Masao
On Wed, Jun 19, 2013 at 9:50 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jun 19, 2013 at 12:36 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 An updated patch for the toast part is attached.

 On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Here are the review comments of the removal_of_reltoastidxid patch.
 I've not completed the review yet, but I'd like to post the current 
 comments
 before going to bed ;)

 *** a/src/backend/catalog/system_views.sql
 -pg_stat_get_blocks_fetched(X.oid) -
 -pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
 -pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
 +pg_stat_get_blocks_fetched(X.indrelid) -
 +pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
 +pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

 ISTM that X.indrelid indicates the TOAST table not the TOAST index.
 Shouldn't we use X.indexrelid instead of X.indrelid?
 Indeed good catch! We need in this case the statistics on the index
 and here I used the table OID. Btw, I also noticed that as multiple
 indexes may be involved for a given toast relation, it makes sense to
 actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
 stats of the indexes.

 Yep. You seem to need to change X.indexrelid to X.indrelid in GROUP clause.
 Otherwise, you may get two rows of the same table from pg_statio_all_tables.
 I changed it a little bit in a different way in my latest patch by
 adding a sum on all the indexes when getting tidx_blks stats.

 doc/src/sgml/diskusage.sgml
 There will be one index on the
 <acronym>TOAST</acronym> table, if present.

 +   table (see <xref linkend="storage-toast">).  There will be one valid index
 +   on the <acronym>TOAST</acronym> table, if present. There also might be indexes

 When I used gdb and tracked the code path of concurrent reindex patch,
 I found it's possible that more than one *valid* toast indexes appear. Those
 multiple valid toast indexes are viewable, for example, from pg_indexes.
 I'm not sure whether this is the bug of concurrent reindex patch. But
 if it's not,
 you seem to need to change the above description again.
 Not sure about that. The latest code is made such as only one valid
 index is present on the toast relation at the same time.


 *** a/src/bin/pg_dump/pg_dump.c
 + "SELECT c.reltoastrelid, t.indexrelid "
   "FROM pg_catalog.pg_class c LEFT JOIN "
 - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
 - "WHERE c.oid = '%u'::pg_catalog.oid;",
 + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
 + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
 + "LIMIT 1",

 Is there the case where TOAST table has more than one *valid* indexes?
 I just rechecked the patch and is answer is no. The concurrent index
 is set as valid inside the same transaction as swap. So only the
 backend performing the swap will be able to see two valid toast
 indexes at the same time.

 According to my quick gdb testing, this seems not to be true
 Well, I have to disagree. I am not able to reproduce it. Which version
 did you use? Here is what I get with the latest version of REINDEX
 CONCURRENTLY patch... I checked with the following process:

Sorry. This is my mistake.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-23 Thread Fujii Masao
On Sun, Jun 23, 2013 at 3:34 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 OK. Please find an updated patch for the toast part.

 On Sat, Jun 22, 2013 at 10:48 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
 On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
 On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
  By looking at the comments of RelationGetIndexList:relcache.c,
  actually the method of the patch is correct because in the event of a
  shared cache invalidation, rd_indexvalid is set to 0 when the index
  list is reset, so the index list would get recomputed even in the case
  of shared mem invalidation.
 
  The problem I see is something else. Consider code like the following:
 
  RelationFetchIndexListIfInvalid(toastrel);
  foreach(lc, toastrel->rd_indexlist)
 toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
 
  index_open calls relation_open calls LockRelationOid which does:
  if (res != LOCKACQUIRE_ALREADY_HELD)
 AcceptInvalidationMessages();
 
  So, what might happen is that you open the first index, which accepts an
  invalidation message which in turn might delete the indexlist. Which
  means we would likely read invalid memory if there are two indexes.
 And I imagine that you have the same problem even with
 RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
 this would appear as long as you try to open more than 1 index with an
 index list.

 No. RelationGetIndexList() returns a copy of the list for exactly that
 reason. The danger is not to see an outdated list - we should be
 protected by locks against that - but looking at uninitialized or reused
 memory.
 OK, so I removed RelationGetIndexListIfInvalid (such things could be
 an optimization for another patch) and replaced it by calls to
 RelationGetIndexList to get a copy of rd_indexlist in a local list
 variable, list free'd when it is not necessary anymore.

 It looks that there is nothing left for this patch, no?

Compile error ;)

gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -I../../../src/include   -c -o index.o index.c
index.c: In function 'index_constraint_create':
index.c:1257: error: too many arguments to function 'index_update_stats'
index.c: At top level:
index.c:1785: error: conflicting types for 'index_update_stats'
index.c:106: error: previous declaration of 'index_update_stats' was here
index.c: In function 'index_update_stats':
index.c:1881: error: 'FormData_pg_class' has no member named 'reltoastidxid'
index.c:1883: error: 'FormData_pg_class' has no member named 'reltoastidxid'
make[3]: *** [index.o] Error 1
make[2]: *** [catalog-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-22 Thread Andres Freund
On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
 On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Hm. Looking at how this is currently used - I am afraid it's not
  correct... the reason RelationGetIndexList() returns a copy is that
  cache invalidations will throw away that list. And you do index_open()
  while iterating over it which will accept invalidation messages.
   Maybe it's better to try using RelationGetIndexList directly and measure
   whether that has a measurable impact?
 By looking at the comments of RelationGetIndexList:relcache.c,
 actually the method of the patch is correct because in the event of a
 shared cache invalidation, rd_indexvalid is set to 0 when the index
 list is reset, so the index list would get recomputed even in the case
 of shared mem invalidation.

The problem I see is something else. Consider code like the following:

RelationFetchIndexListIfInvalid(toastrel);
foreach(lc, toastrel->rd_indexlist)
   toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);

index_open calls relation_open calls LockRelationOid which does:
if (res != LOCKACQUIRE_ALREADY_HELD)
   AcceptInvalidationMessages();

So, what might happen is that you open the first index, which accepts an
invalidation message which in turn might delete the indexlist. Which
means we would likely read invalid memory if there are two indexes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-22 Thread Michael Paquier
On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
 On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Hm. Looking at how this is currently used - I am afraid it's not
  correct... the reason RelationGetIndexList() returns a copy is that
  cache invalidations will throw away that list. And you do index_open()
  while iterating over it which will accept invalidation messages.
   Maybe it's better to try using RelationGetIndexList directly and measure
   whether that has a measurable impact?
 By looking at the comments of RelationGetIndexList:relcache.c,
 actually the method of the patch is correct because in the event of a
 shared cache invalidation, rd_indexvalid is set to 0 when the index
 list is reset, so the index list would get recomputed even in the case
 of shared mem invalidation.

 The problem I see is something else. Consider code like the following:

 RelationFetchIndexListIfInvalid(toastrel);
 foreach(lc, toastrel->rd_indexlist)
toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);

 index_open calls relation_open calls LockRelationOid which does:
 if (res != LOCKACQUIRE_ALREADY_HELD)
AcceptInvalidationMessages();

 So, what might happen is that you open the first index, which accepts an
 invalidation message which in turn might delete the indexlist. Which
 means we would likely read invalid memory if there are two indexes.
And I imagine that you have the same problem even with
RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
this would appear as long as you try to open more than 1 index with an
index list.
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-22 Thread Andres Freund
On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
 On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
  By looking at the comments of RelationGetIndexList:relcache.c,
  actually the method of the patch is correct because in the event of a
  shared cache invalidation, rd_indexvalid is set to 0 when the index
  list is reset, so the index list would get recomputed even in the case
  of shared mem invalidation.
 
  The problem I see is something else. Consider code like the following:
 
  RelationFetchIndexListIfInvalid(toastrel);
  foreach(lc, toastrel->rd_indexlist)
 toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
 
  index_open calls relation_open calls LockRelationOid which does:
  if (res != LOCKACQUIRE_ALREADY_HELD)
 AcceptInvalidationMessages();
 
  So, what might happen is that you open the first index, which accepts an
  invalidation message which in turn might delete the indexlist. Which
  means we would likely read invalid memory if there are two indexes.
 And I imagine that you have the same problem even with
 RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
 this would appear as long as you try to open more than 1 index with an
 index list.

No. RelationGetIndexList() returns a copy of the list for exactly that
reason. The danger is not to see an outdated list - we should be
protected by locks against that - but looking at uninitialized or reused
memory.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-22 Thread Alvaro Herrera
Andres Freund escribió:
 On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:

  And I imagine that you have the same problem even with
  RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
  this would appear as long as you try to open more than 1 index with an
  index list.
 
 No. RelationGetIndexList() returns a copy of the list for exactly that
 reason. The danger is not to see an outdated list - we should be
 protected by locks against that - but looking at uninitialized or reused
 memory.

Are we doing this only to save some palloc traffic?  Could we do this
by, say, teaching list_copy() to have a special case for lists of ints
and oids that allocates all the cells in a single palloc chunk?

(This has the obvious problem that list_free no longer works, of
course.  But I think that specific problem can be easily fixed.  Not
sure if it causes more breakage elsewhere.)

Alternatively, I guess we could grab an uncopied list, then copy the
items individually into a locally allocated array, avoiding list_copy.
We'd need to iterate differently than with foreach().

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Andres Freund
On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
 Please find an updated patch. The regression test rules has been
 updated, and all the comments are addressed.
 
 On Tue, Jun 18, 2013 at 6:35 PM, Andres Freund and...@2ndquadrant.com wrote:
  Hi,
 
  On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
  diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
  index c381f11..3a6342c 100644
  --- a/contrib/pg_upgrade/info.c
  +++ b/contrib/pg_upgrade/info.c
  @@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
   "INSERT INTO info_rels "
   "SELECT reltoastrelid "
   "FROM info_rels i JOIN pg_catalog.pg_class c "
  -"     ON i.reloid = c.oid"));
  +"     ON i.reloid = c.oid "
  +"     AND c.reltoastrelid != %u", InvalidOid));
 	PQclear(executeQueryOrDie(conn,
   "INSERT INTO info_rels "
  -"SELECT reltoastidxid "
  -"FROM info_rels i JOIN pg_catalog.pg_class c "
  -"     ON i.reloid = c.oid"));
  +"SELECT indexrelid "
  +"FROM pg_index "
  +"WHERE indrelid IN (SELECT reltoastrelid "
  +"     FROM pg_class "
  +"     WHERE oid >= %u "
  +"     AND reltoastrelid != %u)",
  +FirstNormalObjectId, InvalidOid));
 
  What's the idea behind the >= here?
 It is here to avoid fetching the toast relations of system tables. But
 I see your point, the inner query fetching the toast OIDs should do a
 join on the existing info_rels and not try to do a join on a plain
 pg_index, so changed this way.

I'd also rather not introduce knowledge about FirstNormalObjectId into
client applications... But you fixed it already.


  	/* Clean up. */
  	heap_freetuple(reltup1);
  @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
  	if (OidIsValid(newrel->rd_rel->reltoastrelid))
  	{
  		Relation	toastrel;
  -		Oid			toastidx;
  		char		NewToastName[NAMEDATALEN];
  +		ListCell   *lc;
  +		int			count = 0;
 
  		toastrel = relation_open(newrel->rd_rel->reltoastrelid,
  								 AccessShareLock);
  -		toastidx = toastrel->rd_rel->reltoastidxid;
  +		RelationGetIndexList(toastrel);
  		relation_close(toastrel, AccessShareLock);
 
  		/* rename the toast table ... */
  @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
  		RenameRelationInternal(newrel->rd_rel->reltoastrelid,
  							   NewToastName, true);
 
  -		/* ... and its index too */
  -		snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
  -				 OIDOldHeap);
  -		RenameRelationInternal(toastidx,
  -							   NewToastName, true);
  +		/* ... and its indexes too */
  +		foreach(lc, toastrel->rd_indexlist)
  +		{
  +			/*
  +			 * The first index keeps the former toast name and the
  +			 * following entries have a suffix appended.
  +			 */
  +			if (count == 0)
  +				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
  +						 OIDOldHeap);
  +			else
  +				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
  +						 OIDOldHeap, count);
  +			RenameRelationInternal(lfirst_oid(lc),
  +

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Michael Paquier
On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
  	/* Clean up. */
  	heap_freetuple(reltup1);
  @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
  	if (OidIsValid(newrel->rd_rel->reltoastrelid))
  	{
  		Relation	toastrel;
  -		Oid			toastidx;
  		char		NewToastName[NAMEDATALEN];
  +		ListCell   *lc;
  +		int			count = 0;
 
  		toastrel = relation_open(newrel->rd_rel->reltoastrelid,
  								 AccessShareLock);
  -		toastidx = toastrel->rd_rel->reltoastidxid;
  +		RelationGetIndexList(toastrel);
  		relation_close(toastrel, AccessShareLock);
 
  		/* rename the toast table ... */
  @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
  		RenameRelationInternal(newrel->rd_rel->reltoastrelid,
  							   NewToastName, true);
 
  -		/* ... and its index too */
  -		snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
  -				 OIDOldHeap);
  -		RenameRelationInternal(toastidx,
  -							   NewToastName, true);
  +		/* ... and its indexes too */
  +		foreach(lc, toastrel->rd_indexlist)
  +		{
  +			/*
  +			 * The first index keeps the former toast name and the
  +			 * following entries have a suffix appended.
  +			 */
  +			if (count == 0)
  +				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
  +						 OIDOldHeap);
  +			else
  +				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
  +						 OIDOldHeap, count);
  +			RenameRelationInternal(lfirst_oid(lc),
  +								   NewToastName, true);
  +			count++;
  +		}
  	}
  	relation_close(newrel, NoLock);
  }
 
  Is it actually possible to get here with multiple toast indexes?
 Actually it is possible. finish_heap_swap is also called for example
 in ALTER TABLE where rewriting the table (phase 3), so I think it is
 better to protect this code path this way.

 But why would we copy invalid toast indexes over to the new relation?
 Shouldn't the new relation have been freshly built in the previous
 steps?
What do you think about that? Using only the first valid index would be enough?


  diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
  index 8ac2549..31309ed 100644
  --- a/src/include/utils/relcache.h
  +++ b/src/include/utils/relcache.h
  @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
   typedef Relation *RelationPtr;
 
   /*
  + * RelationGetIndexListIfValid
  + * Get index list of relation without recomputing it.
  + */
  +#define RelationGetIndexListIfValid(rel) \
  +do { \
  +	if (rel->rd_indexvalid == 0) \
  +		RelationGetIndexList(rel); \
  +} while(0)
 
  Isn't this function misnamed and should be
  RelationGetIndexListIfInValid?
 When naming that; I had more in mind: get the list of indexes if it
 is already there. It looks more intuitive to my mind.

 I can't follow. RelationGetIndexListIfValid() doesn't return
 anything. And it doesn't do anything if the list is already valid. It
 only does something iff the list currently is invalid.
In this case RelationGetIndexListIfInvalid?
--
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Andres Freund
On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
 On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
   @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
   Is it actually possible to get here with multiple toast indexes?
  Actually it is possible. finish_heap_swap is also called for example
  in ALTER TABLE where rewriting the table (phase 3), so I think it is
  better to protect this code path this way.
 
  But why would we copy invalid toast indexes over to the new relation?
  Shouldn't the new relation have been freshly built in the previous
  steps?
 What do you think about that? Using only the first valid index would be 
 enough?

What I am thinking about is the following: When we rewrite a relation,
we build a completely new toast relation. Which will only have one
index, right? So I don't see how this could be correct if we deal
with multiple indexes. In fact, the current patch's swap_relation_files
throws an error if there are multiple ones around.


   diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
   index 8ac2549..31309ed 100644
   --- a/src/include/utils/relcache.h
   +++ b/src/include/utils/relcache.h
   @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
  
/*
   + * RelationGetIndexListIfValid
   + * Get index list of relation without recomputing it.
   + */
   +#define RelationGetIndexListIfValid(rel) \
   +do { \
   +	if (rel->rd_indexvalid == 0) \
   +		RelationGetIndexList(rel); \
   +} while(0)
  
   Isn't this function misnamed and should be
   RelationGetIndexListIfInValid?
  When naming that; I had more in mind: get the list of indexes if it
  is already there. It looks more intuitive to my mind.
 
  I can't follow. RelationGetIndexListIfValid() doesn't return
  anything. And it doesn't do anything if the list is already valid. It
  only does something iff the list currently is invalid.
 In this case RelationGetIndexListIfInvalid?

Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?

Hm. Looking at how this is currently used - I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList directly and measure
whether that has a measurable impact?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Michael Paquier
OK, let's finalize this patch first. I'll try to send an updated patch
later today.

On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
 On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
   @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid 
   OIDNewHeap,
   Is it actually possible to get here with multiple toast indexes?
  Actually it is possible. finish_heap_swap is also called for example
  in ALTER TABLE where rewriting the table (phase 3), so I think it is
  better to protect this code path this way.
 
  But why would we copy invalid toast indexes over to the new relation?
  Shouldn't the new relation have been freshly built in the previous
  steps?
 What do you think about that? Using only the first valid index would be 
 enough?

 What I am thinking about is the following: When we rewrite a relation,
 we build a completely new toast relation. Which will only have one
 index, right? So I don't see how this could be correct if we deal
 with multiple indexes. In fact, the current patch's swap_relation_files
 throws an error if there are multiple ones around.
Yes, OK. Let me have a look at the code of CLUSTER in more detail
before giving a precise answer, but I'll try to remove that renaming
part.  Btw, I'd like to add an assertion in the code, at least to
prevent wrong use of this code path.

   diff --git a/src/include/utils/relcache.h 
   b/src/include/utils/relcache.h
   index 8ac2549..31309ed 100644
   --- a/src/include/utils/relcache.h
   +++ b/src/include/utils/relcache.h
   @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
  
/*
   + * RelationGetIndexListIfValid
   + * Get index list of relation without recomputing it.
   + */
   +#define RelationGetIndexListIfValid(rel) \
   +do { \
   +	if (rel->rd_indexvalid == 0) \
   +		RelationGetIndexList(rel); \
   +} while(0)
  
   Isn't this function misnamed and should be
   RelationGetIndexListIfInValid?
  When naming that; I had more in mind: get the list of indexes if it
  is already there. It looks more intuitive to my mind.
 
  I can't follow. RelationGetIndexListIfValid() doesn't return
  anything. And it doesn't do anything if the list is already valid. It
  only does something iff the list currently is invalid.
 In this case RelationGetIndexListIfInvalid?

 Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?

 Hm. Looking at how this is currently used - I am afraid it's not
 correct... the reason RelationGetIndexList() returns a copy is that
 cache invalidations will throw away that list. And you do index_open()
 while iterating over it which will accept invalidation messages.
 Maybe it's better to try using RelationGetIndexList directly and measure
 whether that has a measurable impact?
Yes, I was wondering about the potential memory leak that list_copy
could introduce in tuptoaster.c when doing a bulk insert; that's the
only reason I added this macro.
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-21 Thread Michael Paquier
On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hm. Looking at how this is currently used - I am afraid it's not
 correct... the reason RelationGetIndexList() returns a copy is that
 cache invalidations will throw away that list. And you do index_open()
 while iterating over it which will accept invalidation messages.
 Maybe it's better to try using RelationGetIndexList directly and measure
 whether that has a measurable impact?
Looking at the comments of RelationGetIndexList in relcache.c, the
method of the patch is actually correct: in the event of a shared cache
invalidation, rd_indexvalid is set to 0 when the index list is reset,
so the index list gets recomputed even in the case of a shared-memory
invalidation.
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-18 Thread Andres Freund
Hi,

On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
 diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
 index c381f11..3a6342c 100644
 --- a/contrib/pg_upgrade/info.c
 +++ b/contrib/pg_upgrade/info.c
 @@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
  						  "INSERT INTO info_rels "
  						  "SELECT reltoastrelid "
  						  "FROM info_rels i JOIN pg_catalog.pg_class c "
 -						  "ON i.reloid = c.oid"));
 +						  "ON i.reloid = c.oid "
 +						  "AND c.reltoastrelid != %u", InvalidOid));
  	PQclear(executeQueryOrDie(conn,
  						  "INSERT INTO info_rels "
 -						  "SELECT reltoastidxid "
 -						  "FROM info_rels i JOIN pg_catalog.pg_class c "
 -						  "ON i.reloid = c.oid"));
 +						  "SELECT indexrelid "
 +						  "FROM pg_index "
 +						  "WHERE indrelid IN (SELECT reltoastrelid "
 +						  "                   FROM pg_class "
 +						  "                   WHERE oid >= %u "
 +						  "                   AND reltoastrelid != %u)",
 +						  FirstNormalObjectId, InvalidOid));

What's the idea behind the >= here?

I think we should ignore the invalid indexes in that SELECT?


  @@ -1392,19 +1390,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
  	}
  
  	/*
 -	 * If we're swapping two toast tables by content, do the same for their
 -	 * indexes.
 +	 * If we're swapping two toast tables by content, do the same for all of
 +	 * their indexes. The swap can actually be safely done only if the
 +	 * relations have indexes.
  	 */
  	if (swap_toast_by_content &&
 -		relform1->reltoastidxid && relform2->reltoastidxid)
 -		swap_relation_files(relform1->reltoastidxid,
 -							relform2->reltoastidxid,
 -							target_is_pg_class,
 -							swap_toast_by_content,
 -							is_internal,
 -							InvalidTransactionId,
 -							InvalidMultiXactId,
 -							mapped_tables);
 +		relform1->reltoastrelid &&
 +		relform2->reltoastrelid)
 +	{
 +		Relation toastRel1, toastRel2;
 +
 +		/* Open relations */
 +		toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
 +		toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
 +
 +		/* Obtain index list */
 +		RelationGetIndexList(toastRel1);
 +		RelationGetIndexList(toastRel2);
 +
 +		/* Check if the swap is possible for all the toast indexes */
 +		if (list_length(toastRel1->rd_indexlist) == 1 &&
 +			list_length(toastRel2->rd_indexlist) == 1)
 +		{
 +			ListCell *lc1, *lc2;
 +
 +			/* Now swap each couple */
 +			lc2 = list_head(toastRel2->rd_indexlist);
 +			foreach(lc1, toastRel1->rd_indexlist)
 +			{
 +				Oid indexOid1 = lfirst_oid(lc1);
 +				Oid indexOid2 = lfirst_oid(lc2);
 +				swap_relation_files(indexOid1,
 +									indexOid2,
 +									target_is_pg_class,
 +									swap_toast_by_content,
 +									is_internal,
 +									InvalidTransactionId,
 +									InvalidMultiXactId,
 +									mapped_tables);
 +				lc2 = lnext(lc2);
 +			}

Why are you iterating over the indexlists after checking they are both
of length == 1? 

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-18 Thread Andres Freund
Hi,

On 2013-06-18 11:35:10 +0200, Andres Freund wrote:
 Going to do some performance tests now.

Ok, so I ran the worst-case load I could think of and didn't notice
any relevant performance changes.

The test I ran was:

CREATE TABLE test_toast(id serial primary key, data text);
ALTER TABLE test_toast ALTER COLUMN data SET STORAGE external;
INSERT INTO test_toast(data) SELECT repeat('a', 8000) FROM generate_series(1, 
20);
VACUUM FREEZE test_toast;

And then with that:
\setrandom id 1 20
SELECT id, substring(data, 1, 10) FROM test_toast WHERE id = :id;

Which should really stress the potentially added overhead since we're
doing many toast accesses, but always only fetch one chunk.


One other thing: Your latest patch forgot to adjust rules.out, so make
check didn't pass...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-18 Thread Fujii Masao
On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 An updated patch for the toast part is attached.

 On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Here are the review comments of the removal_of_reltoastidxid patch.
 I've not completed the review yet, but I'd like to post the current comments
 before going to bed ;)

 *** a/src/backend/catalog/system_views.sql
 -pg_stat_get_blocks_fetched(X.oid) -
 -pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
 -pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
 +pg_stat_get_blocks_fetched(X.indrelid) -
 +pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
 +pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

 ISTM that X.indrelid indicates the TOAST table not the TOAST index.
 Shouldn't we use X.indexrelid instead of X.indrelid?
 Indeed, good catch! We need the statistics on the index in this case,
 but here I used the table OID. Btw, I also noticed that as multiple
 indexes may be involved for a given toast relation, it makes sense to
 actually calculate tidx_blks_read and tidx_blks_hit as the sum of the
 stats of all the indexes.

Yep. You seem to need to change X.indexrelid to X.indrelid in GROUP clause.
Otherwise, you may get two rows of the same table from pg_statio_all_tables.

 doc/src/sgml/diskusage.sgml
 There will be one index on the
 <acronym>TOAST</acronym> table, if present.

+   table (see <xref linkend="storage-toast">).  There will be one valid index
+   on the <acronym>TOAST</acronym> table, if present. There also might be indexes

When I used gdb and tracked the code path of the concurrent reindex patch,
I found it's possible that more than one *valid* toast index appears. Those
multiple valid toast indexes are viewable, for example, from pg_indexes.
I'm not sure whether this is a bug of the concurrent reindex patch. But if
it's not, you seem to need to change the above description again.

 *** a/src/bin/pg_dump/pg_dump.c
 +						  "SELECT c.reltoastrelid, t.indexrelid "
 						  "FROM pg_catalog.pg_class c LEFT JOIN "
 -						  "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
 -						  "WHERE c.oid = '%u'::pg_catalog.oid;",
 +						  "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
 +						  "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
 +						  "LIMIT 1",

 Is there a case where a TOAST table has more than one *valid* index?
 I just rechecked the patch and the answer is no. The concurrent index
 is set as valid inside the same transaction as the swap. So only the
 backend performing the swap will be able to see two valid toast
 indexes at the same time.

According to my quick gdb testing, this seems not to be true.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-18 Thread Fujii Masao
On Tue, Jun 18, 2013 at 9:54 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 On 2013-06-18 11:35:10 +0200, Andres Freund wrote:
 Going to do some performance tests now.

 Ok, so I ran the worst-case load I could think of and didn't notice
 any relevant performance changes.

 The test I ran was:

 CREATE TABLE test_toast(id serial primary key, data text);
 ALTER TABLE test_toast ALTER COLUMN data SET STORAGE external;
 INSERT INTO test_toast(data) SELECT repeat('a', 8000) FROM generate_series(1, 
 20);
 VACUUM FREEZE test_toast;

 And then with that:
 \setrandom id 1 20
 SELECT id, substring(data, 1, 10) FROM test_toast WHERE id = :id;

 Which should really stress the potentially added overhead since we're
 doing many toast accesses, but always only fetch one chunk.

Sounds really good!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-18 Thread Michael Paquier
On Wed, Jun 19, 2013 at 12:36 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 An updated patch for the toast part is attached.

 On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Here are the review comments of the removal_of_reltoastidxid patch.
 I've not completed the review yet, but I'd like to post the current comments
 before going to bed ;)

 *** a/src/backend/catalog/system_views.sql
 -pg_stat_get_blocks_fetched(X.oid) -
 -pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
 -pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
 +pg_stat_get_blocks_fetched(X.indrelid) -
 +pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
 +pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

 ISTM that X.indrelid indicates the TOAST table not the TOAST index.
 Shouldn't we use X.indexrelid instead of X.indrelid?
 Indeed good catch! We need in this case the statistics on the index
 and here I used the table OID. Btw, I also noticed that as multiple
 indexes may be involved for a given toast relation, it makes sense to
 actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
 stats of the indexes.

 Yep. You seem to need to change X.indexrelid to X.indrelid in GROUP clause.
 Otherwise, you may get two rows of the same table from pg_statio_all_tables.
I changed it a little bit in a different way in my latest patch by
adding a sum on all the indexes when getting tidx_blks stats.

 doc/src/sgml/diskusage.sgml
 There will be one index on the
 <acronym>TOAST</acronym> table, if present.

 +   table (see <xref linkend="storage-toast">).  There will be one valid index
 +   on the <acronym>TOAST</acronym> table, if present. There also might be indexes

 When I used gdb and tracked the code path of concurrent reindex patch,
 I found it's possible that more than one *valid* toast indexes appear. Those
 multiple valid toast indexes are viewable, for example, from pg_indexes.
 I'm not sure whether this is the bug of concurrent reindex patch. But
 if it's not,
 you seem to need to change the above description again.
Not sure about that. The latest code is written such that only one valid
index is present on the toast relation at any given time.


 *** a/src/bin/pg_dump/pg_dump.c
 +						  "SELECT c.reltoastrelid, t.indexrelid "
 						  "FROM pg_catalog.pg_class c LEFT JOIN "
 -						  "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
 -						  "WHERE c.oid = '%u'::pg_catalog.oid;",
 +						  "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
 +						  "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
 +						  "LIMIT 1",

 Is there a case where a TOAST table has more than one *valid* index?
 I just rechecked the patch and the answer is no. The concurrent index
 is set as valid inside the same transaction as swap. So only the
 backend performing the swap will be able to see two valid toast
 indexes at the same time.

 According to my quick gdb testing, this seems not to be true
Well, I have to disagree. I am not able to reproduce it. Which version
did you use? Here is what I get with the latest version of the REINDEX
CONCURRENTLY patch... I checked with the following process:
1) Create this table:
CREATE TABLE aa (a int, b text);
ALTER TABLE aa ALTER COLUMN b SET STORAGE EXTERNAL;
2) Create session 1 and take a breakpoint on
ReindexRelationConcurrently:indexcmds.c
3) Launch REINDEX TABLE CONCURRENTLY aa
4) With a 2nd session, go through all the phases of the process and
scan the validity of the toast indexes with the following query:
ioltas=# select pg_class.relname, indisvalid, indisready from
pg_class, pg_index where pg_class.reltoastrelid = pg_index.indrelid
and pg_class.relname = 'aa';
 relname | indisvalid | indisready
---------+------------+------------
 aa      | t          | t
 aa      | f          | t
(2 rows)

When scanning all the phases with the 2nd psql session (the concurrent
index creation, build, validation, swap, and drop of the concurrent
index), at no moment did I see indisvalid set to true for the two
indexes at the same time. indisready was of course changed to prepare
the concurrent index to be ready for inserts, but that was all, and
this is part of the process.
--
Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Michael Paquier
On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
  On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   Hi all,
  
   Please find attached the latest versions of REINDEX CONCURRENTLY for
 the 1st
   commit fest of 9.4:
   - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to
 allow
   a toast relation to have multiple indexes running in parallel (extra
 indexes
   could be created by a REINDEX CONCURRENTLY processed)
   - 20130606_2_reindex_concurrently_v26.patch, correcting some comments
 and
   fixed a lock in index_concurrent_create on an index relation not
 released at
   the end of a transaction
 
  Could you let me know how this patch has something to do with MVCC
 catalog
  access patch? Should we wait for MVCC catalog access patch to be
 committed
  before starting to review this patch?

 I wondered the same. The MVCC catalog patch, if applied, would make it
 possible to make the actual relfilenode swap concurrently instead of
 requiring to take access exclusive locks which obviously is way nicer. On
 the other hand, that function is only a really small part of this patch,
 so it seems quite possible to make another pass at it before relying on
 mvcc catalog scans.

As mentioned by Andres, the only thing that the MVCC catalog patch can
improve here is the index swap phase (index_concurrent_swap:index.c),
where the relfilenodes of the old and new indexes are exchanged.
Currently an AccessExclusiveLock is taken on the 2 relations being
swapped; with MVCC catalog access I think we could relax that to a
ShareUpdateExclusiveLock.

Also, with the MVCC catalog patch in, we could add some isolation tests
for REINDEX CONCURRENTLY (there were some tests in one of the previous
versions), which is currently not possible due to the exclusive lock
taken at the swap phase.

Btw, those are minor things in the patch, so I think it would be better
not to wait for the MVCC catalog patch. Even if you think it would be
better to wait for it, you could begin with the 1st patch, which allows
a toast relation to have multiple indexes (removal of reltoastidxid)
and does not depend on it at all.

Thanks,
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Peter Eisentraut
On 6/17/13 8:23 AM, Michael Paquier wrote:
 As mentionned by Andres, the only thing that the MVCC catalog patch can
 improve here
 is the index swap phase (index_concurrent_swap:index.c) where the
 relfilenode of the
 old and new indexes are exchanged. Now an AccessExclusiveLock is taken
 on the 2 relations
 being swap, we could leverage that to ShareUpdateExclusiveLock with the
 MVCC catalog
 access I think.

Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
not really concurrent, at least not concurrent to the standard set by
CREATE and DROP INDEX CONCURRENTLY.





Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 09:12:12 -0400, Peter Eisentraut wrote:
 On 6/17/13 8:23 AM, Michael Paquier wrote:
  As mentionned by Andres, the only thing that the MVCC catalog patch can
  improve here
  is the index swap phase (index_concurrent_swap:index.c) where the
  relfilenode of the
  old and new indexes are exchanged. Now an AccessExclusiveLock is taken
  on the 2 relations
  being swap, we could leverage that to ShareUpdateExclusiveLock with the
  MVCC catalog
  access I think.
 
 Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
 not really concurrent, at least not concurrent to the standard set by
 CREATE and DROP INDEX CONCURRENTLY.

Well, it still does the main body of work in a concurrent fashion, so I
still don't see how that argument holds that much water. But anyway, the
argument was only whether we could continue reviewing before the mvcc
stuff goes in, not whether it can get committed before.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Peter Eisentraut
On 6/17/13 9:19 AM, Andres Freund wrote:
 Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
 not really concurrent, at least not concurrent to the standard set by
 CREATE and DROP INDEX CONCURRENTLY.
 
 Well, it still does the main body of work in a concurrent fashion, so I
 still don't see how that argument holds that much water.

The reason we added DROP INDEX CONCURRENTLY is so that you don't get
stuck in a lock situation like

long-running-transaction -> DROP INDEX -> everything else

If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
have the same problem.

I don't think we should accept a REINDEX CONCURRENTLY implementation
that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
DROP INDEX CONCURRENTLY combination.





Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 11:03:35 -0400, Peter Eisentraut wrote:
 On 6/17/13 9:19 AM, Andres Freund wrote:
  Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
  not really concurrent, at least not concurrent to the standard set by
  CREATE and DROP INDEX CONCURRENTLY.
  
  Well, it still does the main body of work in a concurrent fashion, so I
  still don't see how that argument holds that much water.
 
 The reason we added DROP INDEX CONCURRENTLY is so that you don't get
 stuck in a lock situation like
 
 long-running-transaction -> DROP INDEX -> everything else
 
 If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
 have the same problem.
 
 I don't think we should accept a REINDEX CONCURRENTLY implementation
 that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
 DROP INDEX CONCURRENTLY combination.

Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
* reindex primary keys
* reindex keys referenced by foreign keys
* reindex exclusion constraints
* reindex toast tables
* do all that for a whole database
so I don't think that comparison is fair. Having it would have made
several previous point releases far less painful (e.g. 9.1.6/9.2.1).

But anyway, as I said, the argument was only whether we could continue
reviewing before the mvcc stuff goes in, not whether it can get
committed before.

I don't think we have a need to decide whether REINDEX CONCURRENTLY can
go in with the short exclusive lock unless we find unresolvable
problems with the mvcc patch. Which I very, very much hope not to be the
case.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Fujii Masao
On Mon, Jun 17, 2013 at 9:23 PM, Michael Paquier
michael.paqu...@gmail.com wrote:



 On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund and...@2ndquadrant.com
 wrote:

 On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
  On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   Hi all,
  
   Please find attached the latest versions of REINDEX CONCURRENTLY for
   the 1st
   commit fest of 9.4:
   - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to
   allow
   a toast relation to have multiple indexes running in parallel (extra
   indexes
   could be created by a REINDEX CONCURRENTLY processed)
   - 20130606_2_reindex_concurrently_v26.patch, correcting some comments
   and
   fixed a lock in index_concurrent_create on an index relation not
   released at
   the end of a transaction
 
  Could you let me know how this patch has something to do with MVCC
  catalog
  access patch? Should we wait for MVCC catalog access patch to be
  committed
  before starting to review this patch?

 I wondered the same. The MVCC catalog patch, if applied, would make it
 possible to make the actual relfilenode swap concurrently instead of
 requiring to take access exclusive locks which obviously is way nicer. On
 the other hand, that function is only a really small part of this patch,
 so it seems quite possible to make another pass at it before relying on
 mvcc catalog scans.

 As mentionned by Andres, the only thing that the MVCC catalog patch can
 improve here
 is the index swap phase (index_concurrent_swap:index.c) where the
 relfilenode of the
 old and new indexes are exchanged. Now an AccessExclusiveLock is taken on
 the 2 relations
 being swap, we could leverage that to ShareUpdateExclusiveLock with the MVCC
 catalog
 access I think.

 Also, with the MVCC catalog patch in, we could add some isolation tests for
 REINDEX CONCURRENTLY (there were some tests in one of the previous
 versions),
 what is currently not possible due to the exclusive lock taken at swap
 phase.

 Btw, those are minor things in the patch, so I think that it would be better
 to not wait
 for the MVCC catalog patch. Even if you think that it would be better to
 wait for it,
 you could even begin with the 1st patch allowing a toast relation to have
 multiple
 indexes (removal of reltoastidxid) which does not depend at all on it.

Here are the review comments of the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post the current comments
before going to bed ;)

*** a/src/backend/catalog/system_views.sql
-pg_stat_get_blocks_fetched(X.oid) -
-pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
-pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+pg_stat_get_blocks_fetched(X.indrelid) -
+pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

ISTM that X.indrelid indicates the TOAST table not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?

You changed some SQL queries because of the removal of reltoastidxid.
Could you check again that the original SQL and the changed one return
the same value?

doc/src/sgml/diskusage.sgml
 There will be one index on the
 <acronym>TOAST</acronym> table, if present.

I'm not sure whether multiple indexes on a TOAST table are visible to a user.
If they are, we need to correct the above description.

doc/src/sgml/monitoring.sgml
 <entry><structfield>tidx_blks_read</></entry>
 <entry><type>bigint</></entry>
 <entry>Number of disk blocks read from this table's TOAST table index (if
 any)</entry>
</row>
<row>
 <entry><structfield>tidx_blks_hit</></entry>
 <entry><type>bigint</></entry>
 <entry>Number of buffer hits in this table's TOAST table index (if
 any)</entry>

For the same reason as above, don't we need to change index to indexes
in these descriptions?

*** a/src/bin/pg_dump/pg_dump.c
+ "SELECT c.reltoastrelid, t.indexrelid "
  "FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",

Is there a case where a TOAST table has more than one *valid* index?
If yes, is it really okay to choose just one index by using LIMIT 1?
If no, i.e., a TOAST table should have only one valid index, we should get rid
of LIMIT 1 and check that only one row is returned from this query.
Fortunately, ISTM this check has been already done by the subsequent
call of ExecuteSqlQueryForSingleRow(). Thought?
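As a side note, the invariant in question (at most one valid index per TOAST table) could be verified with a query along these lines (a sketch, not part of the patch):

```sql
-- List TOAST tables that have more than one valid index;
-- an empty result supports the assumption behind dropping LIMIT 1.
SELECT i.indrelid::regclass AS toast_table, count(*) AS valid_indexes
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
WHERE c.relkind = 't' AND i.indisvalid
GROUP BY i.indrelid
HAVING count(*) > 1;
```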

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Josh Berkus

 Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
 * reindex primary keys
 * reindex keys referenced by foreign keys
 * reindex exclusion constraints
 * reindex toast tables
 * do all that for a whole database
 so I don't think that comparison is fair. Having it would have made
 several previous point releases far less painful (e.g. 9.1.6/9.2.1).

FWIW, I have a client who needs this implementation enough that we're
backporting it to 9.1 for them.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 12:52:36 -0700, Josh Berkus wrote:
 
  Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
  * reindex primary keys
  * reindex keys referenced by foreign keys
  * reindex exclusion constraints
  * reindex toast tables
  * do all that for a whole database
  so I don't think that comparison is fair. Having it would have made
  several previous point releases far less painful (e.g. 9.1.6/9.2.1).
 
 FWIW, I have a client who needs this implementation enough that we're
 backporting it to 9.1 for them.

Wait. What? Unless you break catalog compatibility that's not safely
possible using this implementation.

Greetings,

Andres Freund

PS: Josh, minor thing, but could you please not trim the CC list, at
least when I am on it?

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Alvaro Herrera
Andres Freund wrote:

 PS: Josh, minor thing, but could you please not trim the CC list, at
 least when I am on it?

Yes, it's annoying.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Josh Berkus
On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
 Andres Freund wrote:
 
 PS: Josh, minor thing, but could you please not trim the CC list, at
 least when I am on it?
 
 Yes, it's annoying.

I also get private comments from people who don't want me to cc them
when they are already on the list.  I can't satisfy everyone.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-17 Thread Andres Freund
On 2013-06-17 13:46:07 -0700, Josh Berkus wrote:
 On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
  Andres Freund wrote:
  
  PS: Josh, minor thing, but could you please not trim the CC list, at
  least when I am on it?
  
  Yes, it's annoying.
 
 I also get private comments from people who don't want me to cc them
 when they are already on the list.  I can't satisfy everyone.

Given that nobody but you trims the CC list I don't find that a
convincing argument.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-16 Thread Fujii Masao
On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Hi all,

 Please find attached the latest versions of REINDEX CONCURRENTLY for the 1st
 commit fest of 9.4:
 - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to allow
 a toast relation to have multiple indexes running in parallel (extra indexes
 can be created while a REINDEX CONCURRENTLY is being processed)
 - 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
 fixing a lock in index_concurrent_create on an index relation that was not
 released at the end of a transaction

Could you let me know what this patch has to do with the MVCC catalog
access patch? Should we wait for the MVCC catalog access patch to be
committed before starting to review this patch?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-06-16 Thread Andres Freund
On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
 On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Hi all,
 
  Please find attached the latest versions of REINDEX CONCURRENTLY for the 1st
  commit fest of 9.4:
  - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to allow
  a toast relation to have multiple indexes running in parallel (extra indexes
  can be created while a REINDEX CONCURRENTLY is being processed)
  - 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
  fixing a lock in index_concurrent_create on an index relation that was not
  released at the end of a transaction
 
 Could you let me know what this patch has to do with the MVCC catalog
 access patch? Should we wait for the MVCC catalog access patch to be
 committed before starting to review this patch?

I wondered the same. The MVCC catalog patch, if applied, would make it
possible to make the actual relfilenode swap concurrently instead of
 requiring access exclusive locks to be taken, which obviously is way nicer. On
the other hand, that function is only a really small part of this patch,
so it seems quite possible to make another pass at it before relying on
mvcc catalog scans.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-31 Thread Michael Paquier
Hi,

I moved this patch to the next commit fest.
Thanks,
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-28 Thread Fujii Masao
On Thu, Mar 28, 2013 at 10:34 AM, Andres Freund and...@anarazel.de wrote:
 On 2013-03-28 10:18:45 +0900, Michael Paquier wrote:
 On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Since we call relation_open() with lockmode, ISTM that we should also call
  relation_close() with the same lockmode instead of NoLock. No?
 
 Agreed on that.

 That doesn't really hold true generally, it's often sensible to hold the
 lock till the end of the transaction, which is what not specifying a
 lock at close implies.

You're right. Even if we release the lock there, the lock is taken again soon
and held till the end of the transaction. There is no need to release the lock
there.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-27 Thread Fujii Masao
On Wed, Mar 27, 2013 at 8:26 AM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Wed, Mar 27, 2013 at 3:05 AM, Fujii Masao masao.fu...@gmail.com wrote:

 ISTM you failed to make the patches from your repository.
 20130323_1_toastindex_v7.patch contains all the changes of
 20130323_2_reindex_concurrently_v25.patch

 Oops, sorry I haven't noticed.
 Please find correct versions attached (realigned with latest head at the
 same time).

Thanks!

-   reltoastidxid = rel->rd_rel->reltoastidxid;
+   /* Fetch the list of indexes on toast relation if necessary */
+   if (OidIsValid(reltoastrelid))
+   {
+       Relation toastRel = relation_open(reltoastrelid, lockmode);
+       RelationGetIndexList(toastRel);
+       reltoastidxids = list_copy(toastRel->rd_indexlist);
+       relation_close(toastRel, NoLock);

list_copy() seems not to be required here. We can just set reltoastidxids to
the return list of RelationGetIndexList().

Since we call relation_open() with lockmode, ISTM that we should also call
relation_close() with the same lockmode instead of NoLock. No?

-   if (OidIsValid(reltoastidxid))
-   ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+   foreach(lc, reltoastidxids)
+   {
+   Oid idxid = lfirst_oid(lc);
+   if (OidIsValid(idxid))
+   ATExecSetTableSpace(idxid, newTableSpace, lockmode);

Since idxid is the pg_index.indexrelid, ISTM it should never be invalid.
If this is true, the check of OidIsValid(idxid) is not required.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-27 Thread Andres Freund
On 2013-03-28 10:18:45 +0900, Michael Paquier wrote:
 On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Since we call relation_open() with lockmode, ISTM that we should also call
  relation_close() with the same lockmode instead of NoLock. No?
 
 Agreed on that.

That doesn't really hold true generally, it's often sensible to hold the
lock till the end of the transaction, which is what not specifying a
lock at close implies.

Greetings,

Andres Freund




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-27 Thread Andres Freund
On 2013-03-19 08:57:31 +0900, Michael Paquier wrote:
 On Tue, Mar 19, 2013 at 3:24 AM, Fujii Masao masao.fu...@gmail.com wrote:
 
  On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   I have been working on improving the code of the 2 patches:
   1) reltoastidxid removal:
  snip
   - Fix a bug with pg_dump and binary upgrade. One valid index is necessary
   for a given toast relation.
 
  Is this bugfix related to the following?
 
  appendPQExpBuffer(upgrade_query,
  - "SELECT c.reltoastrelid, t.reltoastidxid "
  + "SELECT c.reltoastrelid, t.indexrelid "
    "FROM pg_catalog.pg_class c LEFT JOIN "
  - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
  - "WHERE c.oid = '%u'::pg_catalog.oid;",
  + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
  + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
  + "LIMIT 1",
 
 Yes.
 
 
  Don't indisready and indislive need to be checked?
 
 An index is valid if it is already ready and live. We could add such checks
 for safety but I don't think it is necessary.

Note that that's not true for 9.2: live && !ready represents isdead there, since
the need for that was only recognized after the release.

Greetings,

Andres Freund




Re: Remove invalid indexes from pg_dump Was: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-26 Thread Fujii Masao
On Tue, Mar 19, 2013 at 9:19 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 If failures happen with CREATE INDEX CONCURRENTLY, the system will be left
 with invalid indexes. I don't think that the user would like to see invalid
 indexes of an existing system being recreated as valid after a restore.
 So why not remove invalid indexes from a dump with something like the
 attached patch?

+1

The patch looks good to me.

 This should perhaps be applied in pg_dump for versions down to 8.2, where
 CREATE INDEX CONCURRENTLY was implemented?

I think so.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-26 Thread Fujii Masao
On Sun, Mar 24, 2013 at 12:37 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund and...@2ndquadrant.com
 wrote:

 On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
  Is someone planning to provide additional feedback about this patch at
  some
  point?

 Yes, now that I have returned from my holidays - or well, am returning
 from them, I do plan to. But it should probably get some implementation
  level review from somebody other than Fujii and me...

 Yeah, it would be good to have an extra pair of fresh eyes looking at those
 patches.

Probably I don't have enough time to review the patch thoroughly. It would be
quite helpful if someone became another reviewer of this patch.

 Please find new patches realigned with HEAD. There were conflicts with 
 commits done recently.

ISTM you failed to make the patches from your repository.
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-23 Thread Andres Freund
On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
 Is someone planning to provide additional feedback about this patch at some
 point?

Yes, now that I have returned from my holidays - or well, am returning
from them, I do plan to. But it should probably get some implementation
level review from somebody other than Fujii and me...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-23 Thread Michael Paquier
On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
  Is someone planning to provide additional feedback about this patch at
 some
  point?

 Yes, now that I have returned from my holidays - or well, am returning
 from them, I do plan to. But it should probably get some implementation
  level review from somebody other than Fujii and me...

Yeah, it would be good to have an extra pair of fresh eyes looking at those
patches.
Thanks,
-- 
Michael


Remove invalid indexes from pg_dump Was: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-21 Thread Michael Paquier
Hi,

On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Please find attached the patches wanted:
  - 20130317_dump_only_valid_index.patch, a 1-line patch that makes
pg_dump
  not take a dump of invalid indexes. This patch can be backpatched to
9.0.
 The patch seems to change pg_dump so that it ignores an invalid index only
 when the remote server version is >= 9.0. But why not when the remote server
 version is < 9.0?
 I think that you should start a new thread to get more attention for this
 patch if there is not enough feedback.

If failures happen with CREATE INDEX CONCURRENTLY, the system will be left
with invalid indexes. I don't think that the user would like to see invalid
indexes of an existing system being recreated as valid after a restore.
So why not remove invalid indexes from a dump with something like the
attached patch?
This should perhaps be applied in pg_dump for versions down to 8.2, where
CREATE INDEX CONCURRENTLY was implemented?

I noticed some recent discussions about that:
http://www.postgresql.org/message-id/20121207141236.gb4...@alvh.no-ip.org
In this case the problem has been fixed in pg_upgrade directly.

Regards,
-- 
Michael


20130317_dump_only_valid_index.patch
Description: Binary data



Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-21 Thread Michael Paquier
Is someone planning to provide additional feedback about this patch at some
point?
Thanks,
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-18 Thread Fujii Masao
On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Please find attached the patches wanted:
 - 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
 not take a dump of invalid indexes. This patch can be backpatched to 9.0.

Don't indisready and indislive need to be checked?

The patch seems to change pg_dump so that it ignores an invalid index only
when the remote server version is >= 9.0. But why not when the remote server
version is < 9.0?

I think that you should start a new thread to get more attention for this patch
if there is not enough feedback.

 Note that there have been some recent discussions about that. This *problem*
 also concerned pg_upgrade.
 http://www.postgresql.org/message-id/20121207141236.gb4...@alvh.no-ip.org

What's the conclusion of this discussion? Should pg_dump --binary-upgrade also
ignore an invalid index? Does pg_upgrade need to be changed together?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-18 Thread Fujii Masao
On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 I have been working on improving the code of the 2 patches:
 1) reltoastidxid removal:
snip
 - Fix a bug with pg_dump and binary upgrade. One valid index is necessary
 for a given toast relation.

Is this bugfix related to the following?

appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
  "FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",

Don't indisready and indislive need to be checked?

Why is LIMIT 1 required? Can the toast table have more than one toast index?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-18 Thread Michael Paquier
On Tue, Mar 19, 2013 at 3:03 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Please find attached the patches wanted:
  - 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
  not take a dump of invalid indexes. This patch can be backpatched to 9.0.

 Don't indisready and indislive need to be checked?

 The patch seems to change pg_dump so that it ignores an invalid index only
 when the remote server version is >= 9.0. But why not when the remote server
 version is < 9.0?

 I think that you should start a new thread to get more attention for this
 patch if there is not enough feedback.

Yeah... Will send a message about that...



  Note that there have been some recent discussions about that. This
 *problem*
  also concerned pg_upgrade.
 
 http://www.postgresql.org/message-id/20121207141236.gb4...@alvh.no-ip.org

  What's the conclusion of this discussion? Should pg_dump --binary-upgrade
  also ignore an invalid index? Does pg_upgrade need to be changed together?

The conclusion is that pg_dump should not need to include invalid indexes,
since it would create them as valid indexes during restore. However I haven't
seen any patch...
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-18 Thread Michael Paquier
On Tue, Mar 19, 2013 at 3:24 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  I have been working on improving the code of the 2 patches:
  1) reltoastidxid removal:
 snip
  - Fix a bug with pg_dump and binary upgrade. One valid index is necessary
  for a given toast relation.

 Is this bugfix related to the following?

 appendPQExpBuffer(upgrade_query,
 - "SELECT c.reltoastrelid, t.reltoastidxid "
 + "SELECT c.reltoastrelid, t.indexrelid "
   "FROM pg_catalog.pg_class c LEFT JOIN "
 - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
 - "WHERE c.oid = '%u'::pg_catalog.oid;",
 + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
 + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
 + "LIMIT 1",

Yes.


 Don't indisready and indislive need to be checked?

An index is valid if it is already ready and live. We could add such checks
for safety but I don't think it is necessary.

Why is LIMIT 1 required? Can the toast table have more than one toast
 index?

It cannot have more than one VALID index, so yes, as long as a check on
indisvalid is there, there is no need to worry about a LIMIT condition. I
only thought of that as a safeguard. The same thing applies to the addition
of a condition based on indislive and indisready.
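Put together, the fully safeguarded form of the query being discussed might look like this (a sketch only; the table name is a placeholder for the '%u' OID parameter in pg_dump, and the caveat from earlier in the thread applies: on 9.2 the flags are interpreted differently, with live && !ready meaning dead):

```sql
-- Sketch: pick the single valid, ready, live index of a table's TOAST
-- relation; LIMIT 1 is kept purely as a safeguard, since at most one
-- such index should exist.
SELECT c.reltoastrelid, t.indexrelid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid)
WHERE c.oid = 'my_table'::regclass
  AND t.indisvalid AND t.indisready AND t.indislive
LIMIT 1;
```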
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-18 Thread Michael Paquier
On Tue, Mar 19, 2013 at 8:54 AM, Michael Paquier
michael.paqu...@gmail.com wrote:



 On Tue, Mar 19, 2013 at 3:03 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Please find attached the patches wanted:
  - 20130317_dump_only_valid_index.patch, a 1-line patch that makes
 pg_dump
  not take a dump of invalid indexes. This patch can be backpatched to
 9.0.

 Don't indisready and indislive need to be checked?

  The patch seems to change pg_dump so that it ignores an invalid index only
  when the remote server version is >= 9.0. But why not when the remote server
  version is < 9.0?

  I think that you should start a new thread to get more attention for this
  patch if there is not enough feedback.

 Yeah... Will send a message about that...



  Note that there have been some recent discussions about that. This
 *problem*
  also concerned pg_upgrade.
 
 http://www.postgresql.org/message-id/20121207141236.gb4...@alvh.no-ip.org

  What's the conclusion of this discussion? Should pg_dump --binary-upgrade
  also ignore an invalid index? Does pg_upgrade need to be changed together?

  The conclusion is that pg_dump should not need to include invalid indexes,
  since it would create them as valid indexes during restore. However I
  haven't seen any patch...

The fix has been done inside pg_upgrade:
http://momjian.us/main/blogs/pgblog/2012.html#December_14_2012

Nothing has been done for pg_dump.
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-17 Thread Michael Paquier
Please find attached the patches wanted:
- 20130317_reindexdb_concurrently.patch, adding an option -c/--concurrently
to reindexdb
Note that I added an error inside reindexdb for options -s -c as REINDEX
CONCURRENTLY does not support SYSTEM.
- 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
not take a dump of invalid indexes. This patch can be backpatched to 9.0.

On Sun, Mar 17, 2013 at 3:31 AM, Michael Paquier
michael.paqu...@gmail.com wrote:

 On 2013/03/17, at 0:35, Fujii Masao masao.fu...@gmail.com wrote:

  On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
  I found pg_dump dumps even the invalid index. But pg_dump should
  ignore the invalid index?
  This problem exists even without REINDEX CONCURRENTLY patch. So we might
 need to
  implement the bugfix patch separately rather than including the bugfix
  code in your patches.
  Probably the backport would be required. Thought?
 Hum... Indeed, they shouldn't be included... Perhaps this is already known?

Note that there have been some recent discussions about that. This
*problem* also concerned pg_upgrade.
http://www.postgresql.org/message-id/20121207141236.gb4...@alvh.no-ip.org
-- 
Michael


20130317_reindexdb_concurrently.patch
Description: Binary data


20130317_dump_only_valid_index.patch
Description: Binary data



Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-16 Thread Fujii Masao
On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 I have been working on improving the code of the 2 patches:

I found pg_dump dumps even the invalid index. But pg_dump should
ignore the invalid index?
This problem exists even without REINDEX CONCURRENTLY patch. So we might need to
implement the bugfix patch separately rather than including the bugfix
code in your patches.
Probably the backport would be required. Thought?

Should we add the concurrent reindex option to the reindexdb command?
This can really be a separate patch, though.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-16 Thread Michael Paquier
On 2013/03/17, at 0:35, Fujii Masao masao.fu...@gmail.com wrote:

 On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I have been working on improving the code of the 2 patches:
 
 I found pg_dump dumps even the invalid index. But pg_dump should
 ignore the invalid index?
 This problem exists even without REINDEX CONCURRENTLY patch. So we might need 
 to
 implement the bugfix patch separately rather than including the bugfix
 code in your patches.
 Probably the backport would be required. Thought?
Hum... Indeed, they shouldn't be included... Perhaps this is already known?
 
  Should we add the concurrent reindex option to the reindexdb command?
  This can really be a separate patch, though.
Yes, they definitely should be separated for simplicity.
Btw, those patches seem trivial, I'll send them.

Michael




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-09 Thread Fujii Masao
On Sat, Mar 9, 2013 at 1:31 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Sat, Mar 9, 2013 at 1:37 AM, Fujii Masao masao.fu...@gmail.com wrote:

  + <para>
  +  Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
  +  EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE

 Typo: s/EXCLUSION/EXCLUDE

 Thanks. This is corrected.


 I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
 The test case to reproduce the segmentation fault is:

 1. Install btree_gist
 2. Run btree_gist's regression test (i.e., make installcheck)
 3. Log in contrib_regression database after the regression test
 4. Execute REINDEX TABLE CONCURRENTLY moneytmp

 Oops. I simply forgot to take into account the case of system attributes
 when building column names in index_concurrent_create. Fixed in new version
 attached.

Thanks for updating the patch!

I found a problem: the patch changed the behavior of ALTER TABLE SET
TABLESPACE so that it also moves the indexes on the specified table to the
new tablespace. Per the documentation of ALTER TABLE, this is not the right
behavior.

I think that it's worth adding a new option for concurrent rebuilding
to the reindexdb command. It's better to implement this separately
from the core patch, though.

You need to add the description of locking of REINDEX CONCURRENTLY
into mvcc.sgml, I think.

+   Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;

Obviously REINDEX cannot rebuild a table ;)

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-09 Thread Fujii Masao
On Fri, Mar 8, 2013 at 1:46 AM, Andres Freund and...@2ndquadrant.com wrote:
 Why do you want to temporarily mark it as valid? I don't see any
 requirement that it is set to that during validate_index() (which imo is
 badly named, but...).
 I'd just set it to valid in the same transaction that does the swap.

+1. I cannot see yet why the isprimary flag needs to be set even
on the invalid index. With the current patch, we can easily get into an
inconsistent situation, i.e., a table having more than one primary
key index.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-09 Thread Fujii Masao
On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Thanks for updating the patch!

- " SELECT reltoastidxid "
- " FROM info_rels i JOIN pg_catalog.pg_class c "
- "  ON i.reloid = c.oid"));
+ " SELECT indexrelid "
+ " FROM info_rels i "
+ "   JOIN pg_catalog.pg_class c "
+ "     ON i.reloid = c.oid "
+ "   JOIN pg_catalog.pg_index p "
+ "     ON i.reloid = p.indrelid "
+ " WHERE p.indexrelid >= %u ", FirstNormalObjectId));

This new SQL doesn't seem to be right. The old one doesn't pick up any indexes
other than the toast index, but the new one seems to.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-09 Thread Michael Paquier
On Sun, Mar 10, 2013 at 4:50 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao masao.fu...@gmail.com
 wrote:
  Thanks for updating the patch!

  - " SELECT reltoastidxid "
  - " FROM info_rels i JOIN pg_catalog.pg_class c "
  - "  ON i.reloid = c.oid"));
  + " SELECT indexrelid "
  + " FROM info_rels i "
  + "   JOIN pg_catalog.pg_class c "
  + "     ON i.reloid = c.oid "
  + "   JOIN pg_catalog.pg_index p "
  + "     ON i.reloid = p.indrelid "
  + " WHERE p.indexrelid >= %u ", FirstNormalObjectId));

 This new SQL doesn't seem to be right. Old one doesn't pick up any indexes
 other than toast index, but new one seems to do.

Indeed, it was selecting all indexes...
I replaced it with this query, restricting the selection to the indexes of
toast relations:
- SELECT reltoastidxid
- FROM info_rels i JOIN pg_catalog.pg_class c
-  ON i.reloid = c.oid));
+ SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid IN (SELECT reltoastrelid
+   FROM pg_class
+   WHERE oid >= %u
+  AND reltoastrelid != %u),
+ FirstNormalObjectId, InvalidOid));
Will send patch soon...
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-08 Thread Fujii Masao
On Fri, Mar 8, 2013 at 10:00 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Fri, Mar 8, 2013 at 1:41 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  The strange thing about hoge_pkey_cct_cct is that it seems to imply
  that an invalid index was reindexed concurrently?
 
  But I don't see how it could happen either. Fujii, can you reproduce it?

 Yes, I can even with the latest version of the patch. The test case to
 reproduce it is:

 (Session 1)
 CREATE TABLE hoge (i int primary key);
 INSERT INTO hoge VALUES (generate_series(1,10));

 (Session 2)
 BEGIN;
 SELECT * FROM hoge;
 (keep this session as it is)

 (Session 1)
 SET statement_timeout TO '1s';
 REINDEX TABLE CONCURRENTLY hoge;
 \d hoge
 REINDEX TABLE CONCURRENTLY hoge;
 \d hoge

 I fixed this problem in the patch attached. It was caused by 2 things:
 - The concurrent index was seen as valid from other backends between phases 3
 and 4. So the concurrent index is made valid at phase 4, then the swap is done
 and it is finally marked as invalid. So it remains invalid as seen from the other
 sessions.
 - index_set_state_flags used heap_inplace_update, which is not completely
 safe at the swap phase, so I had to extend it a bit to use a safe
 simple_heap_update there.

Thanks!

+ <para>
+  Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+  EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE

Typo: s/EXCLUSION/EXCLUDE

I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
The test case to reproduce the segmentation fault is:

1. Install btree_gist
2. Run btree_gist's regression test (i.e., make installcheck)
3. Log in contrib_regression database after the regression test
4. Execute REINDEX TABLE CONCURRENTLY moneytmp

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-07 Thread Fujii Masao
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.com wrote:
 The strange thing about hoge_pkey_cct_cct is that it seems to imply
 that an invalid index was reindexed concurrently?

 But I don't see how it could happen either. Fujii, can you reproduce it?

Yes, I can even with the latest version of the patch. The test case to
reproduce it is:

(Session 1)
CREATE TABLE hoge (i int primary key);
INSERT INTO hoge VALUES (generate_series(1,10));

(Session 2)
BEGIN;
SELECT * FROM hoge;
(keep this session as it is)

(Session 1)
SET statement_timeout TO '1s';
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
REINDEX TABLE CONCURRENTLY hoge;
\d hoge

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-07 Thread Andres Freund
On 2013-03-07 09:58:58 +0900, Michael Paquier wrote:
+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

 If an invalid index depends on a constraint like a primary key, dropping
 the concurrent index cannot actually drop the index. In this case, you
 need to issue alter table ... drop constraint ... to recover the
 situation. I think this information should be documented.

 I think we just shouldn't set ->isprimary on the temporary indexes. Now
 we switch only the relfilenodes and not the whole index, that should be
 perfectly fine.

Sounds good. But what about other constraint cases, like unique
constraints? Can those also be resolved by not setting ->isprimary?
   
   We should stick with the concurrent index being a twin of the index it
   rebuilds for consistency.
 
  I don't think it's legal. We cannot simply have two indexes with
  'indisprimary'. Especially not if both are valid.
  Also, there will be no pg_constraint row that refers to it which
  violates very valid expectations that both users and pg may have.
 
  So what to do with that?
  Mark the concurrent index as valid, then validate it and finally mark it
  as invalid inside the same transaction at phase 4?
  That's moving 2 lines of code...
 
 Sorry phase 4 is the swap phase. Validation happens at phase 3.

Why do you want to temporarily mark it as valid? I don't see any
requirement that it is set to that during validate_index() (which imo is
badly named, but...).
I'd just set it to valid in the same transaction that does the swap.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-06 13:21:27 +0900, Michael Paquier wrote:
 Please find attached updated patch realigned with your comments. You can
 find my answers inline...
 The only thing that needs clarification is the comment about
 UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. Except that all the other things are
 corrected or adapted to what you wanted. I am also including now tests for
 matviews.
 
 On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
 
+ for (count = 0; count < num_indexes; count++)
+     index_insert(toastidxs[count], t_values, t_isnull,
+                  &(toasttup->t_self),
+                  toastrel,
+                  toastidxs[count]->rd_index->indisunique ?
+                  UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
   
The indisunique check looks like a copy & pasto to me, albeit not
yours...
   
   Yes it is the same for all the indexes normally, but it looks more solid
  to
   me to do that as it is. So unchanged.
 
  Hm, if the toast indexes aren't unique anymore loads of stuff would be
  broken. Anyway, not your fault.
 
 I definitely cannot understand where you are going here. Could you be more
 explicit? Why could this be a problem? Without my patch a similar check is
 used for toast indexes.

There's no problem. I just dislike the pointless check which caters for
a situation that doesn't exist...
Forget it, sorry.

 + if (count == 0)
 +     snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
 +              OIDOldHeap);
 + else
 +     snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
 +              OIDOldHeap, count);
 + RenameRelationInternal(lfirst_oid(lc),
 +                        NewToastName);
 + count++;
 + }
   
Hm. It seems wrong that this layer needs to know about _cct.
   
Any other idea? For the time being I removed cct and added only a suffix
   based on the index number...
 
  Hm. It seems like throwing an error would be sufficient, that path is
  only entered for shared catalogs, right? Having multiple toast indexes
  would be a bug.
 
 Don't think so. Even if now those APIs are used only for catalog tables, I
 do not believe that this function has been designed to be used only with
 shared catalogs. Removing the cct suffix makes sense though...

Forget what I said.

 + /*
 +  * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
 +  */
 + isconstraint = indexRelation->rd_index->indisprimary ||
 +                indexRelation->rd_index->indisexclusion;
   
unique constraints aren't mattering here?
   
   No they are not. Unique indexes are not counted as constraints in the
  case
   of index_create. Previous versions of the patch did that but there are
   issues with unique indexes using expressions.
 
  Hm. index_create's comment says:
   * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION
  constraint
 
  There are unique indexes that are constraints and some that are
  not. Looking at ->indisunique is not sufficient to determine whether it's
  one or not.
 
 Hum... OK. I changed that using a method based on get_index_constraint for
 a given index. So if the constraint OID is invalid, it means that this
 index has no constraint, and no constraint will be created for its
 concurrent entry in consequence. It is more stable this way.

Sounds good. Just to make that clear:
To get a unique index without constraint:
CREATE TABLE table_u(id int, data int);
CREATE UNIQUE INDEX table_u__data ON table_u(data);
To get a constraint:
ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);
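To make the distinction concrete, the owning constraint (if any) can be looked up from SQL; the second query is a rough analog of what get_index_constraint() does through the index's internal pg_depend entry. This is only an illustrative sketch against the example tables just above, not part of the patch:

```sql
-- Only the constraint-backed index shows up here; table_u__data does not.
SELECT con.conname, con.conindid::regclass AS index_name
FROM pg_constraint con
WHERE con.conrelid = 'table_u'::regclass;

-- Rough SQL analog of get_index_constraint(): follow the internal
-- dependency (deptype 'i') from the index back to its owning constraint.
SELECT con.conname
FROM pg_depend d
JOIN pg_constraint con ON con.oid = d.refobjid
WHERE d.classid = 'pg_class'::regclass
  AND d.objid = 'table_u__id_unique'::regclass
  AND d.refclassid = 'pg_constraint'::regclass
  AND d.deptype = 'i';
```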

 + /*
 +  * Phase 3 of REINDEX CONCURRENTLY
 +  *
 +  * During this phase the concurrent indexes catch up with the
INSERT that
 +  * might have occurred in the parent table and are marked as
  valid
once done.
 +  *
 +  * We once again wait until no transaction can have the table
  open
with
 +  * the index marked as read-only for updates. Each index
validation is done
 +  * with a separate transaction to avoid opening transaction
  for an
 +  * unnecessary too long time.
 +  */
   
Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.
   
   Yes you said the contrary in a previous review. The purpose of this
   function is to first gather the locks and then 

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
 OK. Patches updated... Please see attached.
 With all the work done on those patches, I suppose this is close to being
 something clean...

Yes, it's looking good. There are loads of improvements possible but
those can very well be made incrementally.
  I have the feeling we are talking past each other. Unless I miss
  something *there is no* WaitForMultipleVirtualLocks between phase 2 and
  3. But one WaitForMultipleVirtualLocks for all would be totally
  sufficient.
 
 OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks
 also before phase 3.
 Honestly, I am still not very comfortable with the fact that the ShareLock
 wait on the parent relation is done outside each index transaction for build
 and validation... Changed as requested though...

Could you detail your concerns a bit? I tried to think it through
multiple times now and I still can't see a problem. The lock only
ensures that nobody has the relation open with the old index definition
in mind...

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Michael Paquier
On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
  OK. Patches updated... Please see attached.
  With all the work done on those patches, I suppose this is close to being
  something clean...

 Yes, its looking good. There are loads of improvements possible but
 those can very well be made incrementally.
   I have the feeling we are talking past each other. Unless I miss
   something *there is no* WaitForMultipleVirtualLocks between phase 2 and
   3. But one WaitForMultipleVirtualLocks for all would be totally
   sufficient.
  
  OK, sorry for the confusion. I added a call to
 WaitForMultipleVirtualLocks
  also before phase 3.
  Honestly, I am still not very comfortable with the fact that the
 ShareLock
  wait on parent relation is done outside each index transaction for build
  and validation... Changed as requested though...

 Could you detail your concerns a bit? I tried to think it through
 multiple times now and I still can't see a problem. The lock only
 ensures that nobody has the relation open with the old index definition
 in mind...

I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock
wait is made inside the build and validation transactions. Was there any
particular reason why CREATE INDEX CONCURRENTLY wait is done inside a
transaction block?
That's my only concern.
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-06 21:19:57 +0900, Michael Paquier wrote:
 On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
   OK. Patches updated... Please see attached.
   With all the work done on those patches, I suppose this is close to being
   something clean...
 
  Yes, its looking good. There are loads of improvements possible but
  those can very well be made incrementally.
I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.
   
   OK, sorry for the confusion. I added a call to
  WaitForMultipleVirtualLocks
   also before phase 3.
   Honestly, I am still not very comfortable with the fact that the
  ShareLock
   wait on parent relation is done outside each index transaction for build
   and validation... Changed as requested though...
 
  Could you detail your concerns a bit? I tried to think it through
  multiple times now and I still can't see a problem. The lock only
  ensures that nobody has the relation open with the old index definition
  in mind...
 
 I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock
 wait is made inside the build and validation transactions. Was there any
 particular reason why CREATE INDEX CONCURRENTLY wait is done inside a
 transaction block?
 That's my only concern.

Well, it needs to be executed in a transaction because it needs a valid
resource owner and a previous CommitTransactionCommand() will leave that
at NULL. And there is no reason in the single-index case of CREATE INDEX
CONCURRENTLY to do it in a separate transaction.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Fujii Masao
On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 OK. Patches updated... Please see attached.

I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
I found that the index which was not marked as INVALID remained unexpectedly.

=# CREATE TABLE hoge (i int primary key);
CREATE TABLE
=# INSERT INTO hoge VALUES (generate_series(1,10));
INSERT 0 10
=# SET statement_timeout TO '1s';
SET
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR:  canceling statement due to statement timeout
=# \d hoge
 Table public.hoge
 Column |  Type   | Modifiers
+-+---
 i  | integer | not null
Indexes:
hoge_pkey PRIMARY KEY, btree (i)
hoge_pkey_cct PRIMARY KEY, btree (i) INVALID

=# REINDEX TABLE CONCURRENTLY hoge;
ERROR:  canceling statement due to statement timeout
=# \d hoge
 Table public.hoge
 Column |  Type   | Modifiers
+-+---
 i  | integer | not null
Indexes:
hoge_pkey PRIMARY KEY, btree (i)
hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
hoge_pkey_cct_cct PRIMARY KEY, btree (i)


+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, dropping
the concurrent index cannot actually drop the index. In this case, you
need to issue alter table ... drop constraint ... to recover the
situation. I think this information should be documented.
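A sketch of that recovery, using the leftover index name from the example
above (hypothetical session; with the patch, the invalid index may be
attached to the PRIMARY KEY machinery, so a plain DROP INDEX is rejected):

```sql
-- Fails: the index is owned by a constraint, so DROP INDEX is rejected.
DROP INDEX hoge_pkey_cct;

-- Works: dropping the owning constraint drops the index with it.
ALTER TABLE hoge DROP CONSTRAINT hoge_pkey_cct;
```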

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-07 02:09:49 +0900, Fujii Masao wrote:
 On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  OK. Patches updated... Please see attached.
 
 I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
 I found that the index which was not marked as INVALID remained unexpectedly.

That's to be expected. Indexes need to be valid *before* we can drop the
old one. So if you abort at the right moment you will see those, and
that's imo fine.
 
 =# CREATE TABLE hoge (i int primary key);
 CREATE TABLE
 =# INSERT INTO hoge VALUES (generate_series(1,10));
 INSERT 0 10
 =# SET statement_timeout TO '1s';
 SET
 =# REINDEX TABLE CONCURRENTLY hoge;
 ERROR:  canceling statement due to statement timeout
 =# \d hoge
  Table public.hoge
  Column |  Type   | Modifiers
 +-+---
  i  | integer | not null
 Indexes:
 hoge_pkey PRIMARY KEY, btree (i)
 hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
 
 =# REINDEX TABLE CONCURRENTLY hoge;
 ERROR:  canceling statement due to statement timeout
 =# \d hoge
  Table public.hoge
  Column |  Type   | Modifiers
 +-+---
  i  | integer | not null
 Indexes:
 hoge_pkey PRIMARY KEY, btree (i)
 hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct_cct PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?
 
 +The recommended recovery method in such cases is to drop the concurrent
 +index and try again to perform <command>REINDEX CONCURRENTLY</>.
 
 If an invalid index depends on a constraint like a primary key, dropping
 the concurrent index cannot actually drop the index. In this case, you
 need to issue alter table ... drop constraint ... to recover the
 situation. I think this information should be documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.
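Since only the relfilenodes are switched, the index keeps its OID across the
rebuild, so pg_constraint entries and dependencies stay valid. The property
can be observed from SQL even with a plain REINDEX (a sketch using the hoge
table from earlier in the thread, not part of the patch):

```sql
-- Before: note the index OID and its relfilenode.
SELECT oid, relfilenode FROM pg_class WHERE relname = 'hoge_pkey';

REINDEX INDEX hoge_pkey;

-- After: same oid, different relfilenode -- only the storage was swapped.
SELECT oid, relfilenode FROM pg_class WHERE relname = 'hoge_pkey';
```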

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



