Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-03 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> It might work to build the new key in a context that's initially a
>> child of CurrentMemoryContext, then reparent it to be a child of
>> CacheMemoryContext when done. 

> That's another way (than the PG_TRY block), but I think it's more
> complicated with no gain.

I disagree: PG_TRY blocks are pretty expensive.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-03 Thread Alvaro Herrera
> Robert Haas  writes:

> > We could do that, but the motivation for the current system was to
> > avoid leaking memory in a long-lived context.

Yeah, my approach here is to use a CATCH block that deletes the memory
context just created, thus avoiding a long-lived leak.

Tom Lane wrote:

> Another key point is to avoid leaving a corrupted relcache entry behind
> if you fail partway through.

Sure ... in the code as I have it we only assign the local variable to
the relcache entry if everything is succesful.  So no relcache
corruption should result.

> It might work to build the new key in a context that's initially a
> child of CurrentMemoryContext, then reparent it to be a child of
> CacheMemoryContext when done. 

That's another way (than the PG_TRY block), but I think it's more
complicated with no gain.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-27 Thread Tom Lane
Robert Haas  writes:
> On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera  
> wrote:
>> I noticed that RelationBuildPartitionKey is generating a partition key
>> in a temp context, then creating a private context and copying the key
>> into that.  That seems leftover from some previous iteration of some
>> other patch; I think it's pretty reasonable to create the new context
>> right from the start and allocate the key there directly instead.  Then
>> there's no need for copy_partition_key at all.

> We could do that, but the motivation for the current system was to
> avoid leaking memory in a long-lived context.  I think the same
> technique is used elsewhere for similar reasons.  I admit I haven't
> checked whether there would actually be a leak here if we did it as
> you propose, but I wouldn't find it at all surprising.

Another key point is to avoid leaving a corrupted relcache entry behind
if you fail partway through.  I think that the current coding is
RelationBuildPartitionKey is safe, although it's far too undercommented
for my taste.  (The ordering of the last few steps is *critical*.)

It might work to build the new key in a context that's initially a
child of CurrentMemoryContext, then reparent it to be a child of
CacheMemoryContext when done.  But you'd have to look at whether that
would end up wasting long-lived memory, if the build process generates
any cruft in the same context.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-27 Thread Robert Haas
On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera  wrote:
> I noticed that RelationBuildPartitionKey is generating a partition key
> in a temp context, then creating a private context and copying the key
> into that.  That seems leftover from some previous iteration of some
> other patch; I think it's pretty reasonable to create the new context
> right from the start and allocate the key there directly instead.  Then
> there's no need for copy_partition_key at all.

We could do that, but the motivation for the current system was to
avoid leaking memory in a long-lived context.  I think the same
technique is used elsewhere for similar reasons.  I admit I haven't
checked whether there would actually be a leak here if we did it as
you propose, but I wouldn't find it at all surprising.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-27 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I'm now working on the ability to build unique indexes (and unique
> constraints) on top of this.

So I think there's not a lot of additional code required to support
unique indexes with the restrictions mentioned; proof-of-concept (with
several holes still) attached.

As before, this is not finished, as there a few things that are wrong
(such as generateClonedIndexStmt taking a RangeVar), and others that I
don't understand (such as why is rd_partkey NULL for partitioned
partitions when DefineIndex cascades to them and the columns are
checked).

I noticed that RelationBuildPartitionKey is generating a partition key
in a temp context, then creating a private context and copying the key
into that.  That seems leftover from some previous iteration of some
other patch; I think it's pretty reasonable to create the new context
right from the start and allocate the key there directly instead.  Then
there's no need for copy_partition_key at all.

Anyway, here is a preliminary submission before I close shop for the
week.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 37e683e352df5f3e6c110d94881abe888e36953e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 16 Oct 2017 12:01:12 +0200
Subject: [PATCH v2 1/7] Tweak index_create/index_constraint_create APIs

I was annoyed by index_create and index_constraint_create respective
APIs.  It's too easy to get confused when adding a new behavioral flag
given the large number of boolean flags they already have.  Turning them
into a flags bitmask makes that code easier to read, as in the attached
patch.

I split the flags in two -- one set for index_create directly and
another related to constraints.  index_create() itself receives both,
and then passes down the constraint one to index_constraint_create.  It
is shorter in LOC terms to create a single set mixing all the values,
but it seemed to make more sense this way.  Also, I chose not to turn
the booleans 'allow_sytem_table_mods' and 'is_internal' into flag bits
because of the different way they interact with this code.
---
 src/backend/catalog/index.c  | 101 +++
 src/backend/catalog/toasting.c   |   3 +-
 src/backend/commands/indexcmds.c |  33 +
 src/backend/commands/tablecmds.c |  13 +++--
 src/include/catalog/index.h  |  29 ++-
 5 files changed, 100 insertions(+), 79 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c7b2f031f0..17faeffada 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -680,19 +680,25 @@ UpdateIndexRelation(Oid indexoid,
  * classObjectId: array of index opclass OIDs, one per index column
  * coloptions: array of per-index-column indoption settings
  * reloptions: AM-specific options
- * isprimary: index is a PRIMARY KEY
- * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint
- * deferrable: constraint is DEFERRABLE
- * initdeferred: constraint is INITIALLY DEFERRED
+ * flags: bitmask that can include any combination of these bits:
+ * INDEX_CREATE_IS_PRIMARY
+ * the index is a primary key
+ * INDEX_CREATE_ADD_CONSTRAINT:
+ * invoke index_constraint_create also
+ * INDEX_CREATE_SKIP_BUILD:
+ * skip the index_build() step for the moment; caller must 
do it
+ * later (typically via reindex_index())
+ * INDEX_CREATE_CONCURRENT:
+ * do not lock the table against writers.  The index will 
be
+ * marked "invalid" and the caller must take additional 
steps
+ * to fix it up.
+ * INDEX_CREATE_IF_NOT_EXISTS:
+ * do not throw an error if a relation with the same name
+ * already exists.
+ * constr_flags: flags passed to index_constraint_create
+ * (only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
- * skip_build: true to skip the index_build() step for the moment; caller
- * must do it later (typically via reindex_index())
- * concurrent: if true, do not lock the table against writers.  The index
- * will be marked "invalid" and the caller must take additional 
steps
- * to fix it up.
  * is_internal: if true, post creation hook for new index
- * if_not_exists: if true, do not throw an error if a relation with
- * the same name already exists.
  *
  * Returns the OID of the created index.
  */
@@ -709,15 +715,10 @@ index_create(Relation heapRelation,
 Oid *classObjectId,
 int16 *coloptions,
 Datum reloptions,
-bool isprimary,
-bool 

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Amit Langote
On 2017/10/24 1:15, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
>>  wrote:
>>> I started with Maksim's submitted code, and developed according to the
>>> ideas discussed in this thread.  Attached is a very WIP patch series for
>>> this feature.

Nice!

>>> Many things remain to be done before this is committable:  pg_dump
>>> support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
>>> implemented.  No REINDEX support yet.  Docs not updated (but see the
>>> regression test as a guide for how this is supposed to work; see patch
>>> 0005).  CREATE INDEX CONCURRENTLY not done yet.
>>>
>>> I'm now working on the ability to build unique indexes (and unique
>>> constraints) on top of this.
>>
>> Cool.  Are you planning to do that by (a) only allowing the special
>> case where the partition key columns/expressions are included in the
>> indexed columns/expressions, (b) trying to make every insert to any
>> index check all of the indexes for uniqueness conflicts, or (c)
>> implementing global indexes?  Because (b) sounds complex - think about
>> attach operations, for example - and (c) sounds super-hard.  I'd
>> suggest doing (a) first, just on the basis of complexity.
> 
> Yes, I think (a) is a valuable thing to have -- not planning on doing
> (c) at all because I fear it'll be a huge time sink.  I'm not sure about
> (b), but it's not currently on my plan.

+1 to proceeding with (a) first.

>> I hope that you don't get so involved in making this unique index
>> stuff work that we don't get the cascading index feature, at least,
>> committed to v11.  That's already a considerable step forward in terms
>> of ease of use, and I'd really like to have it.
> 
> Absolutely -- I do plan to get this one finished regardless of unique
> indexes.

+1

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
>  wrote:
> > I started with Maksim's submitted code, and developed according to the
> > ideas discussed in this thread.  Attached is a very WIP patch series for
> > this feature.
> >
> > Many things remain to be done before this is committable:  pg_dump
> > support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
> > implemented.  No REINDEX support yet.  Docs not updated (but see the
> > regression test as a guide for how this is supposed to work; see patch
> > 0005).  CREATE INDEX CONCURRENTLY not done yet.
> >
> > I'm now working on the ability to build unique indexes (and unique
> > constraints) on top of this.
> 
> Cool.  Are you planning to do that by (a) only allowing the special
> case where the partition key columns/expressions are included in the
> indexed columns/expressions, (b) trying to make every insert to any
> index check all of the indexes for uniqueness conflicts, or (c)
> implementing global indexes?  Because (b) sounds complex - think about
> attach operations, for example - and (c) sounds super-hard.  I'd
> suggest doing (a) first, just on the basis of complexity.

Yes, I think (a) is a valuable thing to have -- not planning on doing
(c) at all because I fear it'll be a huge time sink.  I'm not sure about
(b), but it's not currently on my plan.

> I hope that you don't get so involved in making this unique index
> stuff work that we don't get the cascading index feature, at least,
> committed to v11.  That's already a considerable step forward in terms
> of ease of use, and I'd really like to have it.

Absolutely -- I do plan to get this one finished regardless of unique
indexes.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Robert Haas
On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
 wrote:
> I started with Maksim's submitted code, and developed according to the
> ideas discussed in this thread.  Attached is a very WIP patch series for
> this feature.
>
> Many things remain to be done before this is committable:  pg_dump
> support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
> implemented.  No REINDEX support yet.  Docs not updated (but see the
> regression test as a guide for how this is supposed to work; see patch
> 0005).  CREATE INDEX CONCURRENTLY not done yet.
>
> I'm now working on the ability to build unique indexes (and unique
> constraints) on top of this.

Cool.  Are you planning to do that by (a) only allowing the special
case where the partition key columns/expressions are included in the
indexed columns/expressions, (b) trying to make every insert to any
index check all of the indexes for uniqueness conflicts, or (c)
implementing global indexes?  Because (b) sounds complex - think about
attach operations, for example - and (c) sounds super-hard.  I'd
suggest doing (a) first, just on the basis of complexity.

I hope that you don't get so involved in making this unique index
stuff work that we don't get the cascading index feature, at least,
committed to v11.  That's already a considerable step forward in terms
of ease of use, and I'd really like to have it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Alvaro Herrera
Hello

I started with Maksim's submitted code, and developed according to the
ideas discussed in this thread.  Attached is a very WIP patch series for
this feature.

Many things remain to be done before this is committable:  pg_dump
support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
implemented.  No REINDEX support yet.  Docs not updated (but see the
regression test as a guide for how this is supposed to work; see patch
0005).  CREATE INDEX CONCURRENTLY not done yet.

I'm now working on the ability to build unique indexes (and unique
constraints) on top of this.

The docs have not been updated yet, but the new regression test file
illustrates how this works.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 8e8bf2f587188859eba1f61a44ee9ee08d960f51 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 16 Oct 2017 12:01:12 +0200
Subject: [PATCH v1 1/5] Tweak index_create/index_constraint_create APIs

I was annoyed by index_create and index_constraint_create respective
APIs.  It's too easy to get confused when adding a new behavioral flag
given the large number of boolean flags they already have.  Turning them
into a flags bitmask makes that code easier to read, as in the attached
patch.

I split the flags in two -- one set for index_create directly and
another related to constraints.  index_create() itself receives both,
and then passes down the constraint one to index_constraint_create.  It
is shorter in LOC terms to create a single set mixing all the values,
but it seemed to make more sense this way.  Also, I chose not to turn
the booleans 'allow_sytem_table_mods' and 'is_internal' into flag bits
because of the different way they interact with this code.
---
 src/backend/catalog/index.c  | 101 +++
 src/backend/catalog/toasting.c   |   3 +-
 src/backend/commands/indexcmds.c |  33 +
 src/backend/commands/tablecmds.c |  13 +++--
 src/include/catalog/index.h  |  29 ++-
 5 files changed, 100 insertions(+), 79 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c7b2f031f0..17faeffada 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -680,19 +680,25 @@ UpdateIndexRelation(Oid indexoid,
  * classObjectId: array of index opclass OIDs, one per index column
  * coloptions: array of per-index-column indoption settings
  * reloptions: AM-specific options
- * isprimary: index is a PRIMARY KEY
- * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint
- * deferrable: constraint is DEFERRABLE
- * initdeferred: constraint is INITIALLY DEFERRED
+ * flags: bitmask that can include any combination of these bits:
+ * INDEX_CREATE_IS_PRIMARY
+ * the index is a primary key
+ * INDEX_CREATE_ADD_CONSTRAINT:
+ * invoke index_constraint_create also
+ * INDEX_CREATE_SKIP_BUILD:
+ * skip the index_build() step for the moment; caller must 
do it
+ * later (typically via reindex_index())
+ * INDEX_CREATE_CONCURRENT:
+ * do not lock the table against writers.  The index will 
be
+ * marked "invalid" and the caller must take additional 
steps
+ * to fix it up.
+ * INDEX_CREATE_IF_NOT_EXISTS:
+ * do not throw an error if a relation with the same name
+ * already exists.
+ * constr_flags: flags passed to index_constraint_create
+ * (only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
- * skip_build: true to skip the index_build() step for the moment; caller
- * must do it later (typically via reindex_index())
- * concurrent: if true, do not lock the table against writers.  The index
- * will be marked "invalid" and the caller must take additional 
steps
- * to fix it up.
  * is_internal: if true, post creation hook for new index
- * if_not_exists: if true, do not throw an error if a relation with
- * the same name already exists.
  *
  * Returns the OID of the created index.
  */
@@ -709,15 +715,10 @@ index_create(Relation heapRelation,
 Oid *classObjectId,
 int16 *coloptions,
 Datum reloptions,
-bool isprimary,
-bool isconstraint,
-bool deferrable,
-bool initdeferred,
+uint16 flags,
+uint16 constr_flags,
 bool allow_system_table_mods,
-bool skip_build,
-bool concurrent,
-bool is_internal,
-   

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-11 Thread Robert Haas
On Tue, Oct 10, 2017 at 11:22 AM, Alvaro Herrera
 wrote:
> Hmm ... yeah, ATTACH and DETACH sound acceptable to me.  On DETACH, the
> abstract index should be marked indisvalid=false unless a substitute
> index already exists; and on ATTACH when indisvalid=false we verify that
> all local indexes exist, and if so we can flip indisvalid.  That way, we
> can continue to rely on the parent index always having all its children
> when the flag is set.

True.  I don't see an immediate use for that guarantee, actually,
because the index can't be "scanned" either way. But it might be
useful someday, if for example we wanted to try plan large numbers of
children symmetrically rather than (as we do currently) planning each
one individually, or if we've got an idea about making foreign keys
work.  I think you could ignore this for now, though, if you want.

> I'm not clear on a syntax that creates the main index and hopes to later
> have the sub-indexes created.  Another approach is to do it the other
> way around, i.e. create the children first, then once they're all in
> place create the main one normally, which merely verifies that all the
> requisite children exist.  This is related to what I proposed to occur
> when a regular table is joined as a partition of the partitioned table:
> we run a verification that an index matching the parent's abstract
> indexes exists, and if not we raise an error.  (Alternatively we could
> allow the case, and mark the abstract index as indisvalid=false, but
> that seems to violate POLA).

It's not much different than
pg_catalog.binary_upgrade_create_empty_extension but I don't really
care which way pg_dump emits it.

>> Another thing that would let you do is CREATE INDEX CONCURRENTLY
>> replacement_concrete_index; ALTER INDEX abstract_index DETACH
>> PARTITION old_concrete_index, ATTACH PARTITION
>> replacement_concrete_index; DROP INDEX CONCURRENTLY
>> old_concrete_index, which seems like a thing someone might want to do.
>
> Yeah, this is a point I explicitly mentioned, and this proposal seems
> like a good way.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-10 Thread Alvaro Herrera
Robert Haas wrote:
> On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera  
> wrote:
> > 2. create one index for each existing partition.  These would be
> >identical to what would happen if you created the index directly on
> >each partition, except that there is an additional dependency to the
> >parent's abstract index.
> 
> One thing I'm a bit worried about is how to name these subordinate
> indexes.  They have to have names because that's how pg_class works,
> and those names can't all be the same, again because that's how
> pg_class works.  There's no problem right away when you first create
> the partitioned index, because you can just pick names out of a hat
> using whatever name-generating algorithm seems best.  However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names.  If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.
> 
> These are pretty annoying corner-case bugs because they're not likely
> to come up very often.  Most people won't notice or care if the index
> name changes.  But I don't think it's acceptable to just ignore the
> problem.

I agree it's a problem that needs to be addressed directly.

> An idea I had was to treat the abstract index - to use your
> term - sort of the way we treat an extension.  Normally, when you
> create an index on a partitioned table, it cascades, but for dump and
> restore purpose, we tag on some syntax that says "well, don't actually
> create the subordinate indexes, i'll tell you about those later".
> Then for each subordinate index we issue a separate CREATE INDEX
> command followed by ALTER INDEX abstract_index ATTACH PARTITION
> concrete_index or something of that sort.  That means you can't
> absolutely count on the parent index to have all of the children it's
> supposed to have but maybe that's OK.

Hmm ... yeah, ATTACH and DETACH sound acceptable to me.  On DETACH, the
abstract index should be marked indisvalid=false unless a substitute
index already exists; and on ATTACH when indisvalid=false we verify that
all local indexes exist, and if so we can flip indisvalid.  That way, we
can continue to rely on the parent index always having all its children
when the flag is set.

I'm not clear on a syntax that creates the main index and hopes to later
have the sub-indexes created.  Another approach is to do it the other
way around, i.e. create the children first, then once they're all in
place create the main one normally, which merely verifies that all the
requisite children exist.  This is related to what I proposed to occur
when a regular table is joined as a partition of the partitioned table:
we run a verification that an index matching the parent's abstract
indexes exists, and if not we raise an error.  (Alternatively we could
allow the case, and mark the abstract index as indisvalid=false, but
that seems to violate POLA).

> Another thing that would let you do is CREATE INDEX CONCURRENTLY
> replacement_concrete_index; ALTER INDEX abstract_index DETACH
> PARTITION old_concrete_index, ATTACH PARTITION
> replacement_concrete_index; DROP INDEX CONCURRENTLY
> old_concrete_index, which seems like a thing someone might want to do.

Yeah, this is a point I explicitly mentioned, and this proposal seems
like a good way.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-09 Thread Ashutosh Bapat
On Sat, Oct 7, 2017 at 7:04 PM, Robert Haas  wrote:
> On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera  
> wrote:
>> 2. create one index for each existing partition.  These would be
>>identical to what would happen if you created the index directly on
>>each partition, except that there is an additional dependency to the
>>parent's abstract index.
>
> One thing I'm a bit worried about is how to name these subordinate
> indexes.  They have to have names because that's how pg_class works,
> and those names can't all be the same, again because that's how
> pg_class works.  There's no problem right away when you first create
> the partitioned index, because you can just pick names out of a hat
> using whatever name-generating algorithm seems best.  However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names.  If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.
>
> These are pretty annoying corner-case bugs because they're not likely
> to come up very often.  Most people won't notice or care if the index
> name changes.  But I don't think it's acceptable to just ignore the
> problem.  An idea I had was to treat the abstract index - to use your
> term - sort of the way we treat an extension.  Normally, when you
> create an index on a partitioned table, it cascades, but for dump and
> restore purpose, we tag on some syntax that says "well, don't actually
> create the subordinate indexes, i'll tell you about those later".
> Then for each subordinate index we issue a separate CREATE INDEX
> command followed by ALTER INDEX abstract_index ATTACH PARTITION
> concrete_index or something of that sort.  That means you can't
> absolutely count on the parent index to have all of the children it's
> supposed to have but maybe that's OK.

+1.

How about CREATE INDEX ... PARTITION OF ... FOR TABLE ...? to create
the index and attach it?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-07 Thread Maksim Milyutin

07.10.17 16:34, Robert Haas wrote:


On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera  wrote:
One thing I'm a bit worried about is how to name these subordinate
indexes.  They have to have names because that's how pg_class works,
and those names can't all be the same, again because that's how
pg_class works.  There's no problem right away when you first create
the partitioned index, because you can just pick names out of a hat
using whatever name-generating algorithm seems best.  However, when
you dump-and-restore (including but not limited to the pg_upgrade
case) you've got to preserve those names.  If you just generate a new
name that may or may not be the same as the old one, then it may
collide with a user-specified name that only occurs later in the dump.
Also, you'll have trouble if the user has applied a COMMENT or a
SECURITY LABEL to the index because that command works by name, or if
the user has a reference to the index name inside a function or
whatever.

These are pretty annoying corner-case bugs because they're not likely
to come up very often.  Most people won't notice or care if the index
name changes.  But I don't think it's acceptable to just ignore the
problem.  An idea I had was to treat the abstract index - to use your
term - sort of the way we treat an extension.  Normally, when you
create an index on a partitioned table, it cascades, but for dump and
restore purpose, we tag on some syntax that says "well, don't actually
create the subordinate indexes, i'll tell you about those later".
Then for each subordinate index we issue a separate CREATE INDEX
command followed by ALTER INDEX abstract_index ATTACH PARTITION
concrete_index or something of that sort.  That means you can't
absolutely count on the parent index to have all of the children it's
supposed to have but maybe that's OK.


AFAICS, the main problem with naming is generating new unique names for 
subordinate indexes on the stage of migrating data scheme (pg_dump, 
pg_upgrade, etc). And we cannot specify these names in the 'CREATE INDEX 
partitioned_index' statement therefore we have to regenerate their.


In this case I propose to restore index names' hierarchy *bottom-up*, 
i.e. first of all create indexes for the leaf partitions and then create 
ones for parents up to root explicitly specifying names. When creating 
index on parent table we have to check is there exist any index on child 
table that could be child index (identical criteria). If so, not 
generate new index but implicitly attach that index into parent one.
If we have incomplete index hierarchy, e.g. we dropped some indexes of 
partitions previously, then recreating of parent's index would 
regenerate (not attach) indexes for those partitions. We could drop 
those odd generated indexes after building of parent's index. This 
decision is not straightforward but provides to consider 'CREATE INDEX 
paritioned_table' statement as a cascade operation.
As a result, we can specify name for each concrete index while 
recreating a whole hierarchy.


--
Regards,
Maksim Milyutin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-07 Thread Robert Haas
On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera  wrote:
> 2. create one index for each existing partition.  These would be
>identical to what would happen if you created the index directly on
>each partition, except that there is an additional dependency to the
>parent's abstract index.

One thing I'm a bit worried about is how to name these subordinate
indexes.  They have to have names because that's how pg_class works,
and those names can't all be the same, again because that's how
pg_class works.  There's no problem right away when you first create
the partitioned index, because you can just pick names out of a hat
using whatever name-generating algorithm seems best.  However, when
you dump-and-restore (including but not limited to the pg_upgrade
case) you've got to preserve those names.  If you just generate a new
name that may or may not be the same as the old one, then it may
collide with a user-specified name that only occurs later in the dump.
Also, you'll have trouble if the user has applied a COMMENT or a
SECURITY LABEL to the index because that command works by name, or if
the user has a reference to the index name inside a function or
whatever.

These are pretty annoying corner-case bugs because they're not likely
to come up very often.  Most people won't notice or care if the index
name changes.  But I don't think it's acceptable to just ignore the
problem.  An idea I had was to treat the abstract index - to use your
term - sort of the way we treat an extension.  Normally, when you
create an index on a partitioned table, it cascades, but for dump and
restore purpose, we tag on some syntax that says "well, don't actually
create the subordinate indexes, i'll tell you about those later".
Then for each subordinate index we issue a separate CREATE INDEX
command followed by ALTER INDEX abstract_index ATTACH PARTITION
concrete_index or something of that sort.  That means you can't
absolutely count on the parent index to have all of the children it's
supposed to have but maybe that's OK.

Another thing that would let you do is CREATE INDEX CONCURRENTLY
replacement_concrete_index; ALTER INDEX abstract_index DETACH
PARTITION old_concrete_index, ATTACH PARTITION
replacement_concrete_index; DROP INDEX CONCURRENTLY
old_concrete_index, which seems like a thing someone might want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-06 Thread Maksim Milyutin

Hi!


On 06.10.2017 19:37, Alvaro Herrera wrote:

As you propose, IMO this new feature would use the standard index
creation syntax:
CREATE [UNIQUE] INDEX someindex ON parted_table (a, b);

This command currently throws an error.  We'd have it do two things:

1. create catalog rows in pg_class and pg_index for the main table,
indicating the existance of this partitioned index.  These would not
point to an actual index (since the main table itself is empty), but
instead to an "abstract" index.  This abstract index can be used by
various operations; see below.


Robert Haas proposed[1] to use the name "partitioned index" (instead of 
abstract) that have to be reflected in 'relkind' field of pg_class as 
the RELKIND_PARTITIONED_INDEX value.



I propose that an index declared UNIQUE throws an error for now.  We can
implement uniqueness (for the case where the indexed columns match the
partitioning key) later, once we sort out all the issues here first.  I
think unique indexes would be very useful even with that limitation, but
let's have it as a separate project.


Yes, global uniqueness through local unique indexes causes further work 
related with foreign keys on partitioned table, full support of INSERT 
OF CONFLICT, etc. It make sense to implement after the current stage of 
work.



I think using pg_depend as the mechanism to link partition indexes to
parent is a bad idea.  How about pg_inherits instead?  Seems more
appropriate.


Greg Stark proposed[2] to use new pg_index field of oid type that refers 
to the parent pg_index item. AFAIC pg_inherits also makes sense but 
semantically it deals with inheriting tables. IMHO the using of this 
catalog table to define relation between partitioned table and 
partitions looks like a hack to make use of constraint exclusion logic 
for partition pruning.



Creating hierachy-wide indexes for existing partitioned tables is likely
to take a long time, so we must include CONCURRENTLY as an option.  This
will need some transaction machinery support in order to ensure that
each partition gets its index created at some point in a long chain of
transactions, and that the whole thing is marked valid only at the end.
Also, if the creation is interrupted (because of a crash or a regular
shutdown), it'll be useful to be able to continue rather than being
forced to start from scratch.


This option was very difficult for me. I would be interested to see the 
implementation.



During ALTER TABLE ... ATTACH PARTITION, we check that an indexing
satisfying the abstract index exist (and we create a pg_inherit link).
If not, the command is aborted.


We could create necessary index for partition, not abort ALTER TABLE ... 
ATTACH PARTITION statement.



We need to come up with some way to generate names for each partition
index.


I think the calling 'ChooseIndexName(RelationGetRelationName(childrel), 
namespaceId, indexColNames, ...)' resolves this problem.



I am going to work on this now.


It will be great! I think this project is difficult for me on the part 
of integration described above functionality with the legacy postgres 
code. Also IMO this project is very important because it opens the way 
for such feature as global uniqueness of fields of partitioned tables. 
And any protraction in implementation is bad.


I would like to review and test your intermediate results.


1. 
https://www.postgresql.org/message-id/CA%2BTgmoY5UOUnW%3DMcwT7xUB_2W5dAkvOg5kD20Spx5gF-Ad47cA%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAM-w4HOVftuv5RVi3a%2BsRV6nBpg204w7%3DL8MwPXVvYBFo1uM1Q%40mail.gmail.com


--
Regards,
Maksim Milyutin



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-06 Thread Alvaro Herrera
Hello

I've been thinking about this issue too.  I think your patch is not
ambitious enough.  Here's my brain dump on the issue, to revive
discussion.

As you propose, IMO this new feature would use the standard index
creation syntax:
   CREATE [UNIQUE] INDEX someindex ON parted_table (a, b);

This command currently throws an error.  We'd have it do two things:

1. create catalog rows in pg_class and pg_index for the main table,
   indicating the existance of this partitioned index.  These would not
   point to an actual index (since the main table itself is empty), but
   instead to an "abstract" index.  This abstract index can be used by
   various operations; see below.

2. create one index for each existing partition.  These would be
   identical to what would happen if you created the index directly on
   each partition, except that there is an additional dependency to the
   parent's abstract index.

If partitions are themselves partitioned, we would recursively apply the
indexes to the sub-partitions by doing (1) above for the partitioned
partition.

Once the index has been created for all existing partitions, the
hierarchy-wide index becomes valid and can be used normally by the
planner/executor.  I think we could use the pg_index.indisvalid property
for the abstract index for this.

I propose that an index declared UNIQUE throws an error for now.  We can
implement uniqueness (for the case where the indexed columns match the
partitioning key) later, once we sort out all the issues here first.  I
think unique indexes would be very useful even with that limitation, but
let's have it as a separate project.

I think using pg_depend as the mechanism to link partition indexes to
parent is a bad idea.  How about pg_inherits instead?  Seems more
appropriate.


Creating hierachy-wide indexes for existing partitioned tables is likely
to take a long time, so we must include CONCURRENTLY as an option.  This
will need some transaction machinery support in order to ensure that
each partition gets its index created at some point in a long chain of
transactions, and that the whole thing is marked valid only at the end.
Also, if the creation is interrupted (because of a crash or a regular
shutdown), it'll be useful to be able to continue rather than being
forced to start from scratch.

When a new partition is added, indexes satisfying the partitioned
table's abstract indexes are created automatically.

During ALTER TABLE ... ATTACH PARTITION, we check that an indexing
satisfying the abstract index exist (and we create a pg_inherit link).
If not, the command is aborted.

REINDEX is easily supported (just reindex each partition's index
individually), but that command is blocking, which is not good.  For
concurrent operation for tables not partitioned, a typical pattern to
avoid blocking the table is to suggest creation of an identical index
using CREATE INDEX CONCURRENTLY, then drop the original one.  That's not
going to work with these partitioned indexes, which is going to be a
problem.  I don't have any great ideas about this part yet.

We need to come up with some way to generate names for each partition
index.


I am going to work on this now.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-08-10 Thread Maksim Milyutin

10.08.17 23:01, Robert Haas wrote:


On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin
 wrote:

Ok, thanks for the feedback. Then I'll use a new relkind for local
partitioned index in further development.

Any update on this?



I'll continue to work soon. Sorry for so long delay.

--
Regards,
Maksim Milyutin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-08-10 Thread Robert Haas
On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin
 wrote:
> Ok, thanks for the feedback. Then I'll use a new relkind for local
> partitioned index in further development.

Any update on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-19 Thread Maksim Milyutin

On 19.04.2017 11:42, Ashutosh Bapat wrote:

On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin
 wrote:


Local partitioned indexes can be recognized through the check on the relkind
of table to which the index refers. Something like this:

heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
/* indexid is local index on partitioned table */


An index on partitioned table can be global index (yet to be
implemented) or a local index. We can not differentiate between those
just by looking at the relation on which they are built.



We could to refine the criteria for the local partitioned index later 
encapsulating it in a macro, e.g., adding a new flag from pg_index that 
differentiate the type of index on partitioned table.




Thеsе cases must be caught. But as much as partitioned tables doesn't
participate in query plans their indexes are unaccessible by executor.
Reindex operation is overloaded with my patch.



A global index would have storage for a partitioned table whereas a
local index wouldn't have any storage for a partitioned table.

I agree with Amit that we need new relkinds for local as well as global indexes.



Ok, thanks for the feedback. Then I'll use a new relkind for local 
partitioned index in further development.




--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-19 Thread Ashutosh Bapat
On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin
 wrote:
> On 18.04.2017 13:08, Amit Langote wrote:
>>
>> Hi,
>>
>
> Hi, Amit!
>
>> On 2017/04/17 23:00, Maksim Milyutin wrote:
>>>
>>>
>>> Ok, thanks for the note.
>>>
>>> But I want to discuss the relevancy of introduction of a new relkind for
>>> partitioned index. I could to change the control flow in partitioned
>>> index
>>> creation (specify conditional statement in the 'index_create' routine in
>>> attached patch) and not enter to the 'heap_create' routine. This case
>>> releases us from integrating new relkind into different places of
>>> Postgres
>>> code. But we have to copy-paste some specific code from 'heap_create'
>>> function, e.g., definition of relfilenode and tablespaceid for the new
>>> index and perhaps something more when 'heap_create' routine will be
>>> extended.
>>
>>
>> I may be missing something, but isn't it that a new relkind will be needed
>> anyway?  How does the rest of the code distinguish such index objects once
>> they are created?
>
>
> Local partitioned indexes can be recognized through the check on the relkind
> of table to which the index refers. Something like this:
>
> heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
> if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> /* indexid is local index on partitioned table */

An index on partitioned table can be global index (yet to be
implemented) or a local index. We can not differentiate between those
just by looking at the relation on which they are built.

>
>> Is it possible that some other code may try to access
>> the storage for an index whose indrelid is a partitioned table?
>>
>
> Thеsе cases must be caught. But as much as partitioned tables doesn't
> participate in query plans their indexes are unaccessible by executor.
> Reindex operation is overloaded with my patch.
>

A global index would have storage for a partitioned table whereas a
local index wouldn't have any storage for a partitioned table.

I agree with Amit that we need new relkinds for local as well as global indexes.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-18 Thread Maksim Milyutin

On 18.04.2017 13:08, Amit Langote wrote:

Hi,



Hi, Amit!


On 2017/04/17 23:00, Maksim Milyutin wrote:


Ok, thanks for the note.

But I want to discuss the relevancy of introduction of a new relkind for
partitioned index. I could to change the control flow in partitioned index
creation (specify conditional statement in the 'index_create' routine in
attached patch) and not enter to the 'heap_create' routine. This case
releases us from integrating new relkind into different places of Postgres
code. But we have to copy-paste some specific code from 'heap_create'
function, e.g., definition of relfilenode and tablespaceid for the new
index and perhaps something more when 'heap_create' routine will be extended.


I may be missing something, but isn't it that a new relkind will be needed
anyway?  How does the rest of the code distinguish such index objects once
they are created?


Local partitioned indexes can be recognized through the check on the 
relkind of table to which the index refers. Something like this:


heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
/* indexid is local index on partitioned table */


Is it possible that some other code may try to access
the storage for an index whose indrelid is a partitioned table?



Thеsе cases must be caught. But as much as partitioned tables doesn't 
participate in query plans their indexes are unaccessible by executor. 
Reindex operation is overloaded with my patch.



--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-18 Thread Amit Langote
Hi,

On 2017/04/17 23:00, Maksim Milyutin wrote:
> On 10.04.2017 14:20, Robert Haas wrote:
>> On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin
>>  wrote:
>>> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX
>>> (literal 'l').
>>
>> Seems like it should maybe be RELKIND_PARTITIONED_INDEX.  There's
>> nothing particularly "local" about it.  I suppose what you're going
>> for is that it's not global, but in a way it *is* global to the
>> partitioning hierarchy.  That's the point.  It's just that it's
>> partitioned.
>>
> 
> Ok, thanks for the note.
> 
> But I want to discuss the relevancy of introduction of a new relkind for
> partitioned index. I could to change the control flow in partitioned index
> creation (specify conditional statement in the 'index_create' routine in
> attached patch) and not enter to the 'heap_create' routine. This case
> releases us from integrating new relkind into different places of Postgres
> code. But we have to copy-paste some specific code from 'heap_create'
> function, e.g., definition of relfilenode and tablespaceid for the new
> index and perhaps something more when 'heap_create' routine will be extended.

I may be missing something, but isn't it that a new relkind will be needed
anyway?  How does the rest of the code distinguish such index objects once
they are created?  Is it possible that some other code may try to access
the storage for an index whose indrelid is a partitioned table?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-17 Thread Maksim Milyutin

On 10.04.2017 14:20, Robert Haas wrote:

On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin
 wrote:

1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX
(literal 'l').


Seems like it should maybe be RELKIND_PARTITIONED_INDEX.  There's
nothing particularly "local" about it.  I suppose what you're going
for is that it's not global, but in a way it *is* global to the
partitioning hierarchy.  That's the point.  It's just that it's
partitioned.



Ok, thanks for the note.

But I want to discuss the relevancy of introduction of a new relkind for 
partitioned index. I could to change the control flow in partitioned 
index creation (specify conditional statement in the 'index_create' 
routine in attached patch) and not enter to the 'heap_create' routine. 
This case releases us from integrating new relkind into different places 
of Postgres code. But we have to copy-paste some specific code from 
'heap_create' function, e.g., definition of relfilenode and tablespaceid 
for the new index and perhaps something more when 'heap_create' routine 
will be extended.


What do you think about this way?


--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2328b92..9c15bc9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_constraint_fn.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_tablespace.h"
@@ -849,17 +850,33 @@ index_create(Relation heapRelation,
 	 * we fail further down, it's the smgr's responsibility to remove the disk
 	 * file again.)
 	 */
-	indexRelation = heap_create(indexRelationName,
-namespaceId,
-tableSpaceId,
-indexRelationId,
-relFileNode,
-indexTupDesc,
-RELKIND_INDEX,
-relpersistence,
-shared_relation,
-mapped_relation,
-allow_system_table_mods);
+	if (heapRelation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		indexRelation = heap_create(indexRelationName,
+	namespaceId,
+	tableSpaceId,
+	indexRelationId,
+	relFileNode,
+	indexTupDesc,
+	RELKIND_INDEX,
+	relpersistence,
+	shared_relation,
+	mapped_relation,
+	allow_system_table_mods);
+	else
+		indexRelation =
+			RelationBuildLocalRelation(indexRelationName,
+	   namespaceId,
+	   indexTupDesc,
+	   indexRelationId,
+	   (OidIsValid(relFileNode)) ?
+		relFileNode : indexRelationId,
+	   (tableSpaceId == MyDatabaseTableSpace) ?
+		InvalidOid : tableSpaceId,
+	   shared_relation,
+	   mapped_relation,
+	   relpersistence,
+	   RELKIND_INDEX);
+
 
 	Assert(indexRelationId == RelationGetRelid(indexRelation));
 
@@ -1549,10 +1566,14 @@ index_drop(Oid indexId, bool concurrent)
 		TransferPredicateLocksToHeapRelation(userIndexRelation);
 	}
 
-	/*
-	 * Schedule physical removal of the files
-	 */
-	RelationDropStorage(userIndexRelation);
+	if (userHeapRelation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+
+		/*
+		 * Schedule physical removal of the files
+		 */
+		RelationDropStorage(userIndexRelation);
+	}
 
 	/*
 	 * Close and flush the index's relcache entry, to ensure relcache doesn't
@@ -3294,6 +3315,111 @@ IndexGetRelation(Oid indexId, bool missing_ok)
 }
 
 /*
+ * Find all leaf indexes included into local index with 'indexId' oid and lock
+ * all dependent indexes and respective relations.
+ *
+ * Search is performed in pg_depend table since all indexes belonging to child
+ * tables depends on index from parent table.
+ *
+ *	indexId: the oid of local index whose leaf indexes need to find
+ *	result: list of result leaf indexes
+ *	depRel: already opened pg_depend relation
+ *	indexLockmode: lockmode for indexes' locks
+ *	heapLockmode: lockmode for relations' locks
+ */
+static void
+findDepedentLeafIndexes(Oid indexId, List **result, Relation depRel,
+		LOCKMODE indexLockmode, LOCKMODE heapLockmode)
+{
+	ScanKeyData 		key[3];
+	int	nkeys;
+	SysScanDesc 		scan;
+	HeapTuple			tup;
+	List			   *localSubIndexIds = NIL;
+	ListCell		   *lc;
+
+	ScanKeyInit([0],
+Anum_pg_depend_refclassid,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(RelationRelationId));
+	ScanKeyInit([1],
+Anum_pg_depend_refobjid,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(indexId));
+	nkeys = 2;
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+			  NULL, nkeys, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend 	foundDep = (Form_pg_depend) GETSTRUCT(tup);
+		Relation		index,
+		heap;
+
+		if (foundDep->classid != RelationRelationId)
+			continue;
+
+	

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-10 Thread Maksim Milyutin

On 10.04.2017 13:46, Greg Stark wrote:

On 4 April 2017 at 17:10, Maksim Milyutin  wrote:


3. As I noticed early pg_depend table is used for cascade deleting indexes
on partitioned table and its children. I also use pg_depend to determine
relationship between parent and child indexes when reindex executes
recursively on child indexes.

Perhaps, it's not good way to use pg_depend to determine the relationship
between parent and child indexes because the kind of this relationship is
not defined. I could propose to add into pg_index table specific field of
'oidvector' type that specify oids of dependent indexes for the current
local index.



Alternately you could have an single oid in pg_index on each of the
children that specifies which local index is its parent. That would
probably require a new index on that column so you could look up all
the children efficiently.

I think it would behave more sensibly when you're adding or removing a
partition, especially if you want to add many partitions in parallel
using multiple transactions. An oidvector of children would
effectively mean you could only be doing one partition creation or
deletion at a time.



Thanks for your comment. Your approach sounds better than mine. I'll try it.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-10 Thread Robert Haas
On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin
 wrote:
> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX
> (literal 'l').

Seems like it should maybe be RELKIND_PARTITIONED_INDEX.  There's
nothing particularly "local" about it.  I suppose what you're going
for is that it's not global, but in a way it *is* global to the
partitioning hierarchy.  That's the point.  It's just that it's
partitioned.

> Perhaps, it's not good way to use pg_depend to determine the relationship
> between parent and child indexes because the kind of this relationship is
> not defined. I could propose to add into pg_index table specific field of
> 'oidvector' type that specify oids of dependent indexes for the current
> local index.

I agree with Greg's comment on this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-10 Thread Greg Stark
On 4 April 2017 at 17:10, Maksim Milyutin  wrote:
>
> 3. As I noticed early pg_depend table is used for cascade deleting indexes
> on partitioned table and its children. I also use pg_depend to determine
> relationship between parent and child indexes when reindex executes
> recursively on child indexes.
>
> Perhaps, it's not good way to use pg_depend to determine the relationship
> between parent and child indexes because the kind of this relationship is
> not defined. I could propose to add into pg_index table specific field of
> 'oidvector' type that specify oids of dependent indexes for the current
> local index.


Alternately you could have an single oid in pg_index on each of the
children that specifies which local index is its parent. That would
probably require a new index on that column so you could look up all
the children efficiently.

I think it would behave more sensibly when you're adding or removing a
partition, especially if you want to add many partitions in parallel
using multiple transactions. An oidvector of children would
effectively mean you could only be doing one partition creation or
deletion at a time.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-04 Thread Maksim Milyutin

On 01.03.2017 13:53, Maksim Milyutin wrote:

Hi hackers!

As I've understood from thread [1] the main issue of creating local
indexes for partitions is supporting REINDEX and DROP INDEX operations
on parent partitioned tables. Furthermore Robert Haas mentioned the
problem of creating index on key that is represented in partitions with
single value (or primitive interval) [1] i.e. under the
list-partitioning or range-partitioning with unit interval.

I would like to propose the following solution:

1. Create index for hierarchy of partitioned tables and partitions
recursively. Don't create relfilenode for indexes on parents, only
entries in catalog (much like the partitioned table's storage
elimination in [2]). Abstract index for partitioned tables is only for
the reference on indexes of child tables to perform REINDEX and DROP
INDEX operations.

2. Specify created indexes in pg_depend table so that indexes of child
tables depend on corresponding indexes of parent tables with type of
dependency DEPENDENCY_NORMAL so that index could be removed separately
for partitions and recursively/separately for partitioned tables.

3. REINDEX on index of partitioned table would perform this operation on
existing indexes of corresponding partitions. In this case it is
necessary to consider such operations as REINDEX SCHEMA | DATABASE |
SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times
in a row.

Any thoughts?

1.
https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com

2.
https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp




I want to present the first version of patches that implement local 
indexes for partitioned tables and discuss some technical details of 
that implementation.



1. I have added a new relkind for local indexes named 
RELKIND_LOCAL_INDEX (literal 'l').


This was done because physical storage is created in the 'heap_create' 
function and we need to revoke the creating storage as with partitioned 
tables. Since information that this index belongs to partitioned tables 
is not available in 'heap_create' function (pg_index entry on the index 
is not created yet) I chose the least painful way - added a specific 
relkind for index on partitioned table.
I suppose that this act will require the integrating new relkind to 
different places of source code so I'm ready to consider another 
proposals on this point.


2. My implementation doesn't support the concurrent creating of local 
index (CREATE INDEX CONCURRENTLY). As I understand, this operation 
involves nontrivial manipulation with snapshots and I don't know how to 
implement concurrent creating of multiple indexes. In this point I ask 
help from community.


3. As I noticed early pg_depend table is used for cascade deleting 
indexes on partitioned table and its children. I also use pg_depend to 
determine relationship between parent and child indexes when reindex 
executes recursively on child indexes.


Perhaps, it's not good way to use pg_depend to determine the 
relationship between parent and child indexes because the kind of this 
relationship is not defined. I could propose to add into pg_index table 
specific field of 'oidvector' type that specify oids of dependent 
indexes for the current local index.



On this stage I want to discuss only technical details of local indexes' 
implementation. The problems related to merging existing indexes of 
partitions within local index tree, determination uniqueness of field in 
global sense through local index and syntax notes I want to arise later.



CC welcome!

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index cc5ac8b..bec3983 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -154,7 +154,8 @@ index_open(Oid relationId, LOCKMODE lockmode)
 
 	r = relation_open(relationId, lockmode);
 
-	if (r->rd_rel->relkind != RELKIND_INDEX)
+	if (r->rd_rel->relkind != RELKIND_INDEX &&
+		r->rd_rel->relkind != RELKIND_LOCAL_INDEX)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("\"%s\" is not an index",
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fc088b2..26a10e9 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1107,7 +1107,8 @@ doDeletion(const ObjectAddress *object, int flags)
 			{
 char		relKind = get_rel_relkind(object->objectId);
 
-if (relKind == RELKIND_INDEX)
+if (relKind == RELKIND_INDEX ||
+	relKind == RELKIND_LOCAL_INDEX)
 {
 	bool		concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY) != 0);
 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 36917c8..91ac740 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -293,6 +293,7 @@ 

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-03-02 Thread Maksim Milyutin

On 02.03.2017 11:41, Robert Haas wrote:

Sounds generally good.  One thing to keep in mind is that - in this
system - a UNIQUE index on the parent doesn't actually guarantee
uniqueness across the whole partitioning hierarchy unless it so
happens that the index columns or expressions are the same as the
partitioning columns or expressions.  That's a little a
counterintuitive, and people have already been complaining that a
partitioned table + partitions doesn't look enough like a plain table.
However, I'm not sure there's a better alternative, because somebody
might want partition-wise unique indexes even though that doesn't
guarantee global uniqueness.  So I think if someday we have global
indexes, then we can plan to use some other syntax for that, like
CREATE GLOBAL [ UNIQUE ] INDEX.


Yes, I absolutely agree with your message that cross-partition 
uniqueness is guaranteed through global index on partitioned table apart 
from the case when the index key are the same as partitioning key (or 
index comprises partitioning key in general).


Thanks for your comment. I'll try to propose the first patches as soon 
as possible.


--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-03-02 Thread Robert Haas
On Wed, Mar 1, 2017 at 4:23 PM, Maksim Milyutin
 wrote:
> As I've understood from thread [1] the main issue of creating local indexes
> for partitions is supporting REINDEX and DROP INDEX operations on parent
> partitioned tables. Furthermore Robert Haas mentioned the problem of
> creating index on key that is represented in partitions with single value
> (or primitive interval) [1] i.e. under the list-partitioning or
> range-partitioning with unit interval.
>
> I would like to propose the following solution:
>
> 1. Create index for hierarchy of partitioned tables and partitions
> recursively. Don't create relfilenode for indexes on parents, only entries
> in catalog (much like the partitioned table's storage elimination in [2]).
> Abstract index for partitioned tables is only for the reference on indexes
> of child tables to perform REINDEX and DROP INDEX operations.
>
> 2. Specify created indexes in pg_depend table so that indexes of child
> tables depend on corresponding indexes of parent tables with type of
> dependency DEPENDENCY_NORMAL so that index could be removed separately for
> partitions and recursively/separately for partitioned tables.
>
> 3. REINDEX on index of partitioned table would perform this operation on
> existing indexes of corresponding partitions. In this case it is necessary
> to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that
> partitions' indexes wouldn't be re-indexed multiple times in a row.
>
> Any thoughts?

Sounds generally good.  One thing to keep in mind is that - in this
system - a UNIQUE index on the parent doesn't actually guarantee
uniqueness across the whole partitioning hierarchy unless it so
happens that the index columns or expressions are the same as the
partitioning columns or expressions.  That's a little a
counterintuitive, and people have already been complaining that a
partitioned table + partitions doesn't look enough like a plain table.
However, I'm not sure there's a better alternative, because somebody
might want partition-wise unique indexes even though that doesn't
guarantee global uniqueness.  So I think if someday we have global
indexes, then we can plan to use some other syntax for that, like
CREATE GLOBAL [ UNIQUE ] INDEX.

But, of course, that's just my opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: Local indexes for partitioned table

2017-03-01 Thread Maksim Milyutin

Hi hackers!

As I've understood from thread [1] the main issue of creating local 
indexes for partitions is supporting REINDEX and DROP INDEX operations 
on parent partitioned tables. Furthermore Robert Haas mentioned the 
problem of creating index on key that is represented in partitions with 
single value (or primitive interval) [1] i.e. under the 
list-partitioning or range-partitioning with unit interval.


I would like to propose the following solution:

1. Create index for hierarchy of partitioned tables and partitions 
recursively. Don't create relfilenode for indexes on parents, only 
entries in catalog (much like the partitioned table's storage 
elimination in [2]). Abstract index for partitioned tables is only for 
the reference on indexes of child tables to perform REINDEX and DROP 
INDEX operations.


2. Specify created indexes in pg_depend table so that indexes of child 
tables depend on corresponding indexes of parent tables with type of 
dependency DEPENDENCY_NORMAL so that index could be removed separately 
for partitions and recursively/separately for partitioned tables.


3. REINDEX on index of partitioned table would perform this operation on 
existing indexes of corresponding partitions. In this case it is 
necessary to consider such operations as REINDEX SCHEMA | DATABASE | 
SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times 
in a row.


Any thoughts?

1. 
https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com
2. 
https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp


--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers