Re: [HACKERS] Syntax for partitioning

2011-11-16 Thread Dimitri Fontaine
Martijn van Oosterhout klep...@svana.org writes:
 That said, I still don't see how you can enforce a unique index over
 multiple segments over something other than the partition key while
 still allowing quick dropping of segments.  If you can fix that you can
 make it work for the current inheritence-style partitioning.

Well the Primary Key and the Physical Map Index do not need to be on the
same set of columns.

 If you happen to drop a part of the data that fits in one or more
 segments (and with a decent fillfactor you need less than 1GB to get
 there), then you can unlink() whole files at a time.  That would be the
 goal here.

 I feel uncomfortable with the happen to. You can add the magic too,
 but for scripting purposes I'd feel better if it could be done via DDL
 also. That way typos don't end up being 5 day queries all of a sudden.

If the data fills less than a segment then you can't unlink() the file,
you have to mark the tuples / pages as free space.  If you have a
partial index matching the whole portion of data you're removing, you
can still drop it before hand — or maybe the system can be instructed to
do so?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-14 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 to match the desired granularity of data removal.  I don't really see
 any way that the database can be expected to know what that is, unless
 it's told in advance.  So AFAICS you really have to have a declarative
 way of telling it how to do the partitioning --- it's not going to be
 able to infer that automatically.

Yes, I'm taking that back. Declarative is not the same thing as explicit
partitioning though, that index like physical map is declarative too,
e.g.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-14 Thread Dimitri Fontaine
Martijn van Oosterhout klep...@svana.org writes:
 While I agree that explicit partitioning is somewhat of a hack, it's a
 really useful hack.  But for me the most important use of partitioning
 is dropping a billion rows efficiently and getting the disk space
 back.  And the biggest problem is always that dropping blocks of a
 table requires fixing all the indexes.

The problem with partitions that are in fact table is that the index are
separated and you can't enforce unique globally in the partition set.

Even with that physical map idea (segment based partitioning, but
allowing a finer control than segments), you could still maintain any
number of partial indexes, but still use a single primary key e.g.

 However, in the very special case where the drop boundaries explicitly
 match the dataset, you can simply drop all the indexes.

That's the idea with partial indexes too, right?

 Now, if someone cames up with an efficient way to drop a huge number of
 rows quickly, then I admit one of the major issues is fixed.  But
 recovering the disk space is much harder.  Yes, recent versions of
 Linux come with ways to punch holes in existing files, but that doesn't
 make it quick or efficient.

If you happen to drop a part of the data that fits in one or more
segments (and with a decent fillfactor you need less than 1GB to get
there), then you can unlink() whole files at a time.  That would be the
goal here.

 I hope so, but I'm not sure I'd like partitioning support to wait on
 someone hitting on the right idea.

I would think that's exactly what's been happening to us for several
years already.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-14 Thread Martijn van Oosterhout
On Mon, Nov 14, 2011 at 10:33:36AM +0100, Dimitri Fontaine wrote:
 Martijn van Oosterhout klep...@svana.org writes:
  While I agree that explicit partitioning is somewhat of a hack, it's a
  really useful hack.  But for me the most important use of partitioning
  is dropping a billion rows efficiently and getting the disk space
  back.  And the biggest problem is always that dropping blocks of a
  table requires fixing all the indexes.
 
 The problem with partitions that are in fact table is that the index are
 separated and you can't enforce unique globally in the partition set.
 
 Even with that physical map idea (segment based partitioning, but
 allowing a finer control than segments), you could still maintain any
 number of partial indexes, but still use a single primary key e.g.

Ah, well, if you can come up with a way to get the advantages of
partition while still being able to enforce primary keys over
partitions, that would be A Really Cool Idea.

That said, I still don't see how you can enforce a unique index over
multiple segments over something other than the partition key while
still allowing quick dropping of segments.  If you can fix that you can
make it work for the current inheritence-style partitioning.

 If you happen to drop a part of the data that fits in one or more
 segments (and with a decent fillfactor you need less than 1GB to get
 there), then you can unlink() whole files at a time.  That would be the
 goal here.

I feel uncomfortable with the happen to. You can add the magic too,
but for scripting purposes I'd feel better if it could be done via DDL
also. That way typos don't end up being 5 day queries all of a sudden.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] Syntax for partitioning

2011-11-12 Thread Martijn van Oosterhout
On Thu, Nov 10, 2011 at 10:19:02PM +0100, Dimitri Fontaine wrote:
 Now the aim would be to be able to implement the operation you describe
 by using the new segment map, which is an index pointing to sequential
 ranges of on-disk blocks where the data is known to share a common key
 range over the columns you're segmenting on.  I would imagine this SQL:
 
   TRUNCATE foo WHERE year  2009;
 
 As the on-disk location of the data that qualify this WHERE clause is
 known, it could be possible to (predicate) lock it and bulk remove it,
 unlinking whole segments (1GB) at a time when relevant.

While I agree that explicit partitioning is somewhat of a hack, it's a
really useful hack.  But for me the most important use of partitioning
is dropping a billion rows efficiently and getting the disk space
back.  And the biggest problem is always that dropping blocks of a
table requires fixing all the indexes.

For fixing the index of the partition key it's a simpler problem, you
could probably prune the btree relatively efficiently.  But for all
other indexes there's no better solution than walk the entire index.

However, in the very special case where the drop boundaries explicitly
match the dataset, you can simply drop all the indexes.

Now, if someone cames up with an efficient way to drop a huge number of
rows quickly, then I admit one of the major issues is fixed.  But
recovering the disk space is much harder.  Yes, recent versions of
Linux come with ways to punch holes in existing files, but that doesn't
make it quick or efficient.

  While automatic clustering would be nice, it isn't the same thing as
  partitioning.
 
 That has been my initial reaction to that kind of ideas too.  After some
 more time brewing the ideas, I'm not convinced that the use cases that
 usually drives you to the latter can't be solved with the former.

I hope so, but I'm not sure I'd like partitioning support to wait on
someone hitting on the right idea.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] Syntax for partitioning

2011-11-12 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 While I agree that explicit partitioning is somewhat of a hack, it's a
 really useful hack.  But for me the most important use of partitioning
 is dropping a billion rows efficiently and getting the disk space
 back.

Right.  The only way to make that speedy is for the partition boundaries
to match the desired granularity of data removal.  I don't really see
any way that the database can be expected to know what that is, unless
it's told in advance.  So AFAICS you really have to have a declarative
way of telling it how to do the partitioning --- it's not going to be
able to infer that automatically.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-10 Thread Dimitri Fontaine
Jeff Janes jeff.ja...@gmail.com writes:
 shouldn't it need a DBA to declare it?  How is the system supposed to
 anticipate that at some point years in the future I will want to run
 the command sequence create foo_archive as select from foo where
 year2009; delete from foo where year2009, or its partition-based
 equivalent, and have it operate on several billion rows cleanly and
 quickly?  I don't think we can expect the system to anticipate what it
 has never before experienced.  This is the DBA's job.

Well, the not-fully spelled out proposal would be to still work it out
from a list of columns picked by the DBA.  I though that an existing
index would be best, but maybe just columns would be good.

I guess it's already time to play loose and invent some SQL syntax to
make it easier talking about the same thing:

  ALTER TABLE foo SEGMENT ON (year, stamp);

Now the aim would be to be able to implement the operation you describe
by using the new segment map, which is an index pointing to sequential
ranges of on-disk blocks where the data is known to share a common key
range over the columns you're segmenting on.  I would imagine this SQL:

  TRUNCATE foo WHERE year  2009;

As the on-disk location of the data that qualify this WHERE clause is
known, it could be possible to (predicate) lock it and bulk remove it,
unlinking whole segments (1GB) at a time when relevant.

 While automatic clustering would be nice, it isn't the same thing as
 partitioning.

That has been my initial reaction to that kind of ideas too.  After some
more time brewing the ideas, I'm not convinced that the use cases that
usually drives you to the latter can't be solved with the former.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-10 Thread Daniel Farina
On Thu, Nov 10, 2011 at 1:19 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Now the aim would be to be able to implement the operation you describe
 by using the new segment map, which is an index pointing to sequential
 ranges of on-disk blocks where the data is known to share a common key
 range over the columns you're segmenting on.  I would imagine this SQL:

  TRUNCATE foo WHERE year  2009;

 As the on-disk location of the data that qualify this WHERE clause is
 known, it could be possible to (predicate) lock it and bulk remove it,
 unlinking whole segments (1GB) at a time when relevant.

I am basically in agreement with you.  After wanting better
partitioning (Oracle-style) in Postgres for some time just to be free
of the mechanically painful table-inheritance version, I have come
around to thinking it's basically a bad idea, but one that with a
little bit of finessing can be made a good idea.

The reason I have started to think this is because of an old feature
that works very well: CREATE INDEX.  In spite of what people might
think, I think it's pretty clear that CREATE INDEX is not DDL: it's
actually physical advice to the system.  I have seen the
fourth-generation-language promise delivered upon quite a few times in
production, now: we witness an access pattern that becomes
problematic, we run CREATE INDEX CONCURRENTLY, the problem is solved
without any change to the application, and the index definition is
backported to our application bootstrapping process.  It would be hard
for me to understate how valuable this has been to avoid both
premature optimization and excessive panic when dealing with change.

Similar to the overall project stance on query hints, I don't think
Postgres should retreat on its ground from being a 4GL system.  I
think both indexes and a hypothetical partitioning feature should be
clearly isolated as directives to the system about how to physically
organize and access data, and any partitioning feature that creates
new relation namespace entries and expects you to manipulate them to
gain the benefits seems like extra, non-desirable surface area to me.

I think this becomes especially apparent once one considers on-line
repartitioning (I am exposing a bias here, but any feature in Postgres
that cannot be done concurrently -- like VACUUM FULL -- is very
dangerous to both me and my customers, whereas it may not be useless
or dangerous to a build-your-own data warehouse).  It feels like it
would be desirable to have the physical partitions exist in an
inconsistent-state whereby they are being brought into alignment with
the newly desired physical description.

Finally, I think a legitimate objection to this inclination is that it
can be really easy to issue a DELETE that is usually fast, but when
any mistake or change creeps in becomes very slow: I have heard from
some friends making heavy use of table partitioning via inheritance
that one of the problems is not quite exactly matching the table
constraint, and then hosing their hardware. As a result, they mangle
partitions explicitly in the application to prevent foot-gunning.
That's clearly lame (and they know it), but I think may indicate a
need to instead allow for some kind of physical-access-method
assertion checking quite apart from the logical content of the query
that can deliver a clear, crisp error to application developers if a
preferred access pattern is not usable.  My experience suggests that
while solving problems is good, turning problems into flat-out errors
is *nearly* as good, and worth some more investigation.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Thom Brown
On 15 December 2009 02:31, Jaime Casanova jcasa...@systemguards.com.ec wrote:
 On Mon, Dec 14, 2009 at 7:29 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, 2009-12-04 at 09:00 +, Simon Riggs wrote:
 On Fri, 2009-12-04 at 11:54 +0900, Itagaki Takahiro wrote:
  Here is an update partitioning syntax patch.
 
  A bug reported by Marko is fixed.

 I will review and eventually commit this, if appropriate, though it is
 3rd in my queue and will probably not be done for at least 2 weeks,
 possibly 4 weeks.

 I'll have to go back on this unfortunately, sorry about that.


 the next patch for this will arrive in the next commitfest so maybe
 you have more time then

So will this be revived any time soon?  Were there any subsequent
proposals which were posted?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Jaime Casanova
On Wed, Nov 9, 2011 at 8:26 AM, Thom Brown t...@linux.com wrote:

 So will this be revived any time soon?  Were there any subsequent
 proposals which were posted?


there was an updated patch, you can find in this thread:
http://archives.postgresql.org/message-id/20100114181323.9a33.52131...@oss.ntt.co.jp

not sure what happens after that.

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Robert Haas
On Wed, Nov 9, 2011 at 9:35 AM, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Wed, Nov 9, 2011 at 8:26 AM, Thom Brown t...@linux.com wrote:

 So will this be revived any time soon?  Were there any subsequent
 proposals which were posted?


 there was an updated patch, you can find in this thread:
 http://archives.postgresql.org/message-id/20100114181323.9a33.52131...@oss.ntt.co.jp

 not sure what happens after that.

I reviewed a later version here:

http://archives.postgresql.org/pgsql-hackers/2010-07/msg00183.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Thom Brown
On 9 November 2011 15:15, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Nov 9, 2011 at 9:35 AM, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Wed, Nov 9, 2011 at 8:26 AM, Thom Brown t...@linux.com wrote:

 So will this be revived any time soon?  Were there any subsequent
 proposals which were posted?


 there was an updated patch, you can find in this thread:
 http://archives.postgresql.org/message-id/20100114181323.9a33.52131...@oss.ntt.co.jp

 not sure what happens after that.

 I reviewed a later version here:

 http://archives.postgresql.org/pgsql-hackers/2010-07/msg00183.php

Ah yes, I've located a reference to this on the wiki now.  No wiki
updates needed.

I guess it's a matter of whether Takahiro-san has the time and desire
to pick this up again any time soon.  Whenever I cross the topic of
partitioning in PostgreSQL, it's always a tad embarrassing to explain
that it's still hacky compared to other database systems (and this
came up again last week), so this is of particular interest to me.  At
the moment there's no testing required as it's returned with feedback,
but I'm very willing to help assist in testing it should this return
to the fore again.  The idea of getting both this and materialised
views in time for 9.3 is extremely appealing; a performance release
(9.2) followed by a usability release.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Dimitri Fontaine
Thom Brown t...@linux.com writes:
 Whenever I cross the topic of
 partitioning in PostgreSQL, it's always a tad embarrassing to explain
 that it's still hacky compared to other database systems (and this
 came up again last week), so this is of particular interest to me.  At

The more I think about this problem, the more I think that the reason
why we still don't have declarative partitioning is that it basically
sucks.  Other vendors offer it because they couldn't do better, but they
are just asking the DBA to implement a service the system should be able
to care for itself.

Who knows better than PostgreSQL which part of the data are the most
often queried and how to best lay them out on disk to ensure QoS?  If
you think that's the DBA, go ask Tom to implement query hints…

More seriously, partitioning in PostgreSQL could be mostly transparent
to the users and just working: it's all about data storage locality
and we already have a sub-relation storage model. By using segment
exclusion and some level of automatic clustering (physical organization)
of data, we could have all the benefits of partitioning without the
hassle of maintaining yet another explicit level of data definition.

  In particular, what part of the declarative partitioning system is
  intended to take care about creating the right partitions before new
  data are sent to them?

In a first implementation, we could decide to partition the data over
an index that the DBA has to pick, and then maintain a segment index
which is a map of the data distribution in the physical segments, for
the indexed columns. The granularity could be different and maybe
dynamic so that you start at a block level map index for very little
tables and grow up to a segment map index for huge tables that require
thousands of segments, 1GB each.

Then the system have to organize data modifications so that it optimizes
the ranges to be the smallest possible on each map entry. And the
executor then has to be able to consult that map at query time and
exclude whole segments of data (or blocks for little tables) when the
segment indexing is able to exclude data. With some tricks because we
realize that depending on the size of the portions you skip you might
not benefit from moving the head on the platter in another way than what
the ongoing seqscan does, but we already have GUCs about that.

We might also need some internal facilities to lock out per segment (or
rather map entries) rather than per table so that we have something
like a WHERE clause support for TRUNCATE.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Thom Brown
On 9 November 2011 21:05, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Thom Brown t...@linux.com writes:
 Whenever I cross the topic of
 partitioning in PostgreSQL, it's always a tad embarrassing to explain
 that it's still hacky compared to other database systems (and this
 came up again last week), so this is of particular interest to me.  At

 The more I think about this problem, the more I think that the reason
 why we still don't have declarative partitioning is that it basically
 sucks.  Other vendors offer it because they couldn't do better, but they
 are just asking the DBA to implement a service the system should be able
 to care for itself.

 Who knows better than PostgreSQL which part of the data are the most
 often queried and how to best lay them out on disk to ensure QoS?  If
 you think that's the DBA, go ask Tom to implement query hints…

That also sounds like an argument against indexes.  Since the system
knows which parts of data are most often queried, surely it should be
managing indexes, not the DBA?

I imagine structuring data after the fact would involve rewriting data
whereas planning for upcoming data is something DBAs are expected to
do anyway using constraints, triggers, indexes etc.

And as it stands, what the DBA has to do at the moment is to exploit
table inheritance, apply constraints to each of the child tables
(mainly for constraint exclusion), then create a trigger to support
various types of update/insert/delete.  So what we have now is very
un-user-friendly, and tends to surprise end users.  The flow of data
isn't part of the table definition, it's custom-programmed into an
event on the parent table.

And partitioning may not just be about performance, it can be about
organising data and making it more manageable.  Although I agree that
if it came to partitioning across nodes for parallel access,
PostgreSQL could be in a position to make decisions about how that is
distributed.

 More seriously, partitioning in PostgreSQL could be mostly transparent
 to the users and just working: it's all about data storage locality
 and we already have a sub-relation storage model. By using segment
 exclusion and some level of automatic clustering (physical organization)
 of data, we could have all the benefits of partitioning without the
 hassle of maintaining yet another explicit level of data definition.

That could be unworkable in a high-load OLTP environment.

  In particular, what part of the declarative partitioning system is
  intended to take care about creating the right partitions before new
  data are sent to them?

 In a first implementation, we could decide to partition the data over
 an index that the DBA has to pick, and then maintain a segment index
 which is a map of the data distribution in the physical segments, for
 the indexed columns. The granularity could be different and maybe
 dynamic so that you start at a block level map index for very little
 tables and grow up to a segment map index for huge tables that require
 thousands of segments, 1GB each.

 Then the system have to organize data modifications so that it optimizes
 the ranges to be the smallest possible on each map entry. And the
 executor then has to be able to consult that map at query time and
 exclude whole segments of data (or blocks for little tables) when the
 segment indexing is able to exclude data. With some tricks because we
 realize that depending on the size of the portions you skip you might
 not benefit from moving the head on the platter in another way than what
 the ongoing seqscan does, but we already have GUCs about that.

 We might also need some internal facilities to lock out per segment (or
 rather map entries) rather than per table so that we have something
 like a WHERE clause support for TRUNCATE.

Would this solve the same set of problems that partitioning attempts
to address?  And what about the example case of quarterly data?  In
your proposed design could you drop an entire set of data without a
DELETE?

And maybe I'm not looking at it from the right angle. (probably)
Although I appreciate some thought is needed about how useful
partitioning implementations in other database systems really are.

And now to demonstrate some additional ignorance on my part... does
the standard cover this?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Greg Smith

On 11/09/2011 04:05 PM, Dimitri Fontaine wrote:

The more I think about this problem, the more I think that the reason
why we still don't have declarative partitioning is that it basically
sucks.


I think that we don't have it because no one has ever dumped the much 
larger than might be expected amount of time into pulling all the pieces 
together and smoothing out the rough parts.  I don't think there's any 
design thinking leap needed over what's already been worked out.  Just a 
lot of work to get all the edge cases right on the simplest possible 
thing that is useful.


The path to reach something that could be considered for commit includes 
something like this set of things:


1) Add partitioning catalog support
2) Create new syntax for partitioning that writes to the catalog
3) Decide how to represent partition data in memory
4) Route new INSERTed data into the right place
5) Support moving UPDATEd data into a new partition
6) Handle COPY usefully

The last rev of this submitted was still working through (1) here, i.e. 
this review from Robert:  
http://archives.postgresql.org/message-id/aanlktikp-1_8b04eyik0sdf8ua5kmo64o8sorfbze...@mail.gmail.com  
And there's a whole pile of issues I don't think have been fully 
explored about even the most basic case.  How to handle ALTER to these 
structures cleanly, locking, etc..  I don't think it's possible to 
design such that you skip a large portion of these details; someone 
needs to put some number of spend weeks+ getting them all right instead.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2011-11-09 Thread Jeff Janes
On Wed, Nov 9, 2011 at 1:05 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Thom Brown t...@linux.com writes:
 Whenever I cross the topic of
 partitioning in PostgreSQL, it's always a tad embarrassing to explain
 that it's still hacky compared to other database systems (and this
 came up again last week), so this is of particular interest to me.  At

 The more I think about this problem, the more I think that the reason
 why we still don't have declarative partitioning is that it basically
 sucks.  Other vendors offer it because they couldn't do better, but they
 are just asking the DBA to implement a service the system should be able
 to care for itself.

Your last sentence seems to be a better description of PostgreSQL's
current implementation of partitioning via inheritance and constraints
and triggers.  Partitioning is a service the system should be able to
care for itself, even if it does need the DBA to declare it.  And why
shouldn't it need a DBA to declare it?  How is the system supposed to
anticipate that at some point years in the future I will want to run
the command sequence create foo_archive as select from foo where
year2009; delete from foo where year2009, or its partition-based
equivalent, and have it operate on several billion rows cleanly and
quickly?  I don't think we can expect the system to anticipate what it
has never before experienced.  This is the DBA's job.


 Who knows better than PostgreSQL which part of the data are the most
 often queried and how to best lay them out on disk to ensure QoS?  If
 you think that's the DBA, go ask Tom to implement query hints…

 More seriously, partitioning in PostgreSQL could be mostly transparent
 to the users and just working: it's all about data storage locality
 and we already have a sub-relation storage model. By using segment
 exclusion and some level of automatic clustering (physical organization)
 of data, we could have all the benefits of partitioning without the
 hassle of maintaining yet another explicit level of data definition.

While automatic clustering would be nice, it isn't the same thing as
partitioning.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-12-14 Thread Simon Riggs
On Fri, 2009-12-04 at 09:00 +, Simon Riggs wrote:
 On Fri, 2009-12-04 at 11:54 +0900, Itagaki Takahiro wrote:
  Here is an update partitioning syntax patch.
  
  A bug reported by Marko is fixed.
 
 I will review and eventually commit this, if appropriate, though it is
 3rd in my queue and will probably not be done for at least 2 weeks,
 possibly 4 weeks.

I'll have to go back on this unfortunately, sorry about that.

I have enough items emerging from HS to keep me busy much longer than I
thought. I'll run with VF if that's OK, since I have some other related
changes in that area and it makes sense to understand that code also, if
OK with you.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-12-14 Thread Jaime Casanova
On Mon, Dec 14, 2009 at 7:29 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, 2009-12-04 at 09:00 +, Simon Riggs wrote:
 On Fri, 2009-12-04 at 11:54 +0900, Itagaki Takahiro wrote:
  Here is an update partitioning syntax patch.
 
  A bug reported by Marko is fixed.

 I will review and eventually commit this, if appropriate, though it is
 3rd in my queue and will probably not be done for at least 2 weeks,
 possibly 4 weeks.

 I'll have to go back on this unfortunately, sorry about that.


the next patch for this will arrive in the next commitfest so maybe
you have more time then

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-12-06 Thread Greg Smith

Simon Riggs wrote:

I will review and eventually commit this, if appropriate, though it is
3rd in my queue and will probably not be done for at least 2 weeks,
possibly 4 weeks.
  
I've marked Simon as the next reviewer and expected committer on this 
patch and have updated it to Returned with Feedback.  That's not 
saying work is going to stop on it.  It just looks like that is going to 
extend beyond when we want this CommitFest to finish, and I want to pull 
it off the list of things I'm monitoring as part of that.  Everyone 
should keep hammering away at nailing this fundamental bit down, so that 
the rest of the partitioning patch ideas floating around finally have a 
firm place to start attaching to.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-12-06 Thread Itagaki Takahiro

Greg Smith g...@2ndquadrant.com wrote:

 I've marked Simon as the next reviewer and expected committer on this 
 patch and have updated it to Returned with Feedback.

OK. I'll re-submit improved patches in the next commit fest.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-12-04 Thread Simon Riggs
On Fri, 2009-12-04 at 11:54 +0900, Itagaki Takahiro wrote:
 Here is an update partitioning syntax patch.
 
 A bug reported by Marko is fixed.

I will review and eventually commit this, if appropriate, though it is
3rd in my queue and will probably not be done for at least 2 weeks,
possibly 4 weeks.

Some brief review comments

* SQL:2008 contains PARTITION BY syntax, though in completely different
context. A possible alternative would be to use PARTITIONED BY. Please
justify either way. Possibly add short section to docs to explain this.

* There are relatively few comments in-line. Please can you provide a
README section for the code that explains how partitioning works? A
reviewer's guide would also be helpful to explain some of the design
decisions in particular places.

* All of the tests use about 4 partitions, which is the kind of thing
that makes me think the test coverage isn't wide enough. More tests
please. This would include operations on 0?, 1 and many partitions. We
also need more test failures, covering all the dumbass things people
will attempt. Also need partitioning by strange datatypes, arrays,
timestamps with timezones and stupidly long list values. Read Rob
Treat's humorous dissection of earlier partitioning features at PGcon to
see what needs to be covered.

* Docs. This is looking fairly solid, so please begin working on docs. I
won't hold you to this in next few weeks, but we know it needs doing.

* It is essential that we have large real-world(ish) performance test
results that proves this patch will work in the areas for which it is
intended. We need a test with 500 partitions, using at least 10MB
partitions to see if there are any scale-related issues. This test case
will help set targets for later developments because it will highlight
performance issues in planning, DDL and other areas. This doesn't have
to be by the patch author, but we need to see evidence that this patch
operates in its primary use case.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-24 Thread Emmanuel Cecchet

Hi,

Sorry for commenting only now but I think that we need to be able to 
store the partitions in different tablespaces. Even if originally the 
create table creates all partitions in the same tablespace, individual 
partitions should be allowed to be moved in different tablespaces using 
alter table or alter partition.  I think that other databases allows the 
user to define a tablespace for each partition in the create table 
statement.
In a warehouse, you might want to split your partitions on different 
volumes and over time, move older partitions to storage with higher 
compression if that data is not to be accessed frequently anymore. 
Altering tablespaces for partitions is important in that context.


Are you also planning to provide partitioning extensions to 'create 
table as'?


Thanks
Emmanuel


Here is a WIP partitioning patch. The new syntax are:
  1. CREATE TABLE parent (...);
  2. ALTER TABLE parent PARTITION BY { RANGE | LIST } ( key );
  3. CREATE TABLE child (...);
  4. ALTER TABLE child INHERIT parent AS PARTITION VALUES ...;

We can also use CREATE TABLE PARTITION BY as 1+2+3+4 and
CREATE PARTITION as 3+4. I think INHERIT AS PARTITION is rarely
used typically, but such orthogonality seems to be cleaner.

The most complex logic of the patch is in ATExecAddInherit(). It scans
existing partitions and generate CHECK constraint for the new partition.

Any comments to the design?  If no objections, I'd like to stop adding
features in this CommitFest and go for remaining auxiliary works
-- pg_dump, object dependency checking, documentation, etc.

  

-
 Catalog changes
-


In addition to pg_partition, I added pg_inherits.inhvalues field.
The type of field is anyarray and store partition values.
For range partition, an upper bound value is stored in the array.
For list partition, list values are stored in it. These separated
value fields will be useful to implement partition triggers in the
future. In contrast, reverse engineering of check constraints is messy.

CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS
{
Oid inhrelid;
Oid inhparent;
int4inhseqno;
anyarrayinhvalues;  /* values for partition */
} FormData_pg_inherits;

  

CREATE TABLE pg_partition (
partrelid oid REFERENCES oid ON pg_class,-- partitioned table oid
partopr   oid REFERENCES oid ON pg_operator, -- operator to compare keys
partkind  char, -- kind of partition: 'R' (range) or 'L' (list)
partkey   text,   -- expression tree of partition key
PRIMARY KEY (partrelid)
) WITHOUT OIDS;



--
 Limitations and Restrictions
--
* We can create a new partition as long as partitioning keys
  are not conflicted with existing partitions. Especially,
  we cannot add any partitions if we have overflow partitions
  because a new partition always split the overflow partition.

* We cannot reuse an existing check constraint as a partition
  constraint. ALTER TABLE INHERIT AS PARTITION brings on
  a table scan to add a new CHECK constraint.

* No partition triggers nor planner and executor improvements.
  It would come in the future development.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

  




  



--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-24 Thread Itagaki Takahiro

Emmanuel Cecchet m...@asterdata.com wrote:

 I think that other databases allows the 
 user to define a tablespace for each partition in the create table 
 statement.

WITH and TABLESPACE clause are supported for each partition.

 =# CREATE TABLE parent (...) PARTITION BY (key)
(
  PARTITION child_1 VALUES LESS THAN 10 WITH (...) TABLESPACE tbs_1
);
 =# CREATE PARTITION child_2 ON parent
VALUES LESS THAN 20 WITH (...) TABLESPACE tbl_2;

 Are you also planning to provide partitioning extensions to 'create 
 table as'?

Ah, I forgot that. It would be possible to have the feature.
There are no syntax issues. But it would be done after we support
automatic INSERT routing. We can create the table will partitions,
but tuples are not divided into child partitions because we have
no insert-triggers at the time of CREATE TABLE AS.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-24 Thread Greg Smith
I just made a few updates to 
http://wiki.postgresql.org/wiki/Table_partitioning , merging in the 
stuff that had been on the ToDo page and expanding the links to 
discussion on this list a bit.  The number of submitted patches over the 
last couple of years that handle some subset of the desired feature set 
here is really remarkable when you see them all together.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-24 Thread Emmanuel Cecchet

Greg Smith wrote:
I just made a few updates to 
http://wiki.postgresql.org/wiki/Table_partitioning , merging in the 
stuff that had been on the ToDo page and expanding the links to 
discussion on this list a bit.  The number of submitted patches over 
the last couple of years that handle some subset of the desired 
feature set here is really remarkable when you see them all together.


Should we add the 'WITH (...) TABLESPACE tbs' options to the syntax 
since they are supported?

Do we support ALTER ... SET TABLESPACE?

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-24 Thread Itagaki Takahiro

Emmanuel Cecchet m...@asterdata.com wrote:

 Should we add the 'WITH (...) TABLESPACE tbs' options to the syntax 
 since they are supported?

Added the description.

 Do we support ALTER ... SET TABLESPACE?

DROP/ALTER PARTITION are synonyms for DROP/ALTER TABLE.
SET TABLESPACE is also supported. Added the description.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-20 Thread Robert Haas
On Fri, Nov 20, 2009 at 2:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2009-11-19 at 10:53 -0500, Robert Haas wrote:
 On Thu, Nov 19, 2009 at 9:58 AM, Markus Wanner mar...@bluegap.ch wrote:
  Hi,
 
  Robert Haas wrote:
 
  Settling on a syntax, and an internal representation for that syntax,
 
  I've been under the impression that this was only about syntax. What are 
  the
  internal additions?

 I haven't looked at it in detail, but it adds a new pg_partition
 table.  Whether that table is suitably structured for use by the
 optimizer is not clear to me.

 If it does, then my review comments to Kedar still apply:

 * why do we want another catalog table? what's wrong with pg_inherits?
 It might need additional columns, and it certainly needs another index.

That might work, I haven't looked at it enough to be sure one way or the other.

 * We need an internal data structure (discussed on this thread also).
 Leaving stuff in various catalog tables would not be the same thing at
 all.

Ultimately I'm guessing that for query optimization we'll need to
include the relevant info in the relcache.  But I think that can wait
until we're ready to actually make the optimizer changes - not much
point in caching data that is never used.  Right now I think it's
enough to verify (which I haven't) that the schema of the catalog
table is suitable for straightforward construction of the data that
will eventually need to be cached.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Markus Wanner

Hi,

Robert Haas wrote:
Settling on a syntax, and an internal representation for that syntax, 


I've been under the impression that this was only about syntax. What are 
the internal additions?


Generally speaking, I'd agree with Simon or even vote for doing the 
internals first and add the syntactic sugar only later on.



seems like it will make subsequent
discussions about those projects considerably more straightforward,


..or subsequent implementations more complicated, because you have to 
support an awkward syntax.



and it has some value in and of itself since similar notation is used
by other databases.


That point is well taken, but it would be more compelling if it were the 
same or at least a compatible syntax.


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Robert Haas
On Thu, Nov 19, 2009 at 9:58 AM, Markus Wanner mar...@bluegap.ch wrote:
 Hi,

 Robert Haas wrote:

 Settling on a syntax, and an internal representation for that syntax,

 I've been under the impression that this was only about syntax. What are the
 internal additions?

I haven't looked at it in detail, but it adds a new pg_partition
table.  Whether that table is suitably structured for use by the
optimizer is not clear to me.

 Generally speaking, I'd agree with Simon or even vote for doing the
 internals first and add the syntactic sugar only later on.

That's not really possible in this case.  The internals consist of
taking advantage of the fact that we have explicit knowledge of how
the partitions are defined vs. just relying on the (slow) constraint
exclusion logic.  We can't do that unless, in fact, we have that
explicit knowledge, and that requires inventing syntax.

 That point is well taken, but it would be more compelling if it were the
 same or at least a compatible syntax.

There's been an effort to make it close, but I haven't followed it in
enough detail to know how close.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Peter Eisentraut
On ons, 2009-11-18 at 13:52 +0900, Itagaki Takahiro wrote:
  partinfo = (PartitionInfo *) malloc(ntups * sizeof(PartitionInfo));
 
 Oops, it should be palloc. Thanks.

A very low-level comment:

1) Please stop casting the results of palloc and malloc.  We are not
writing C++ here.

2) I would prefer that you apply sizeof on the variable, not on the
type.  That way, the expression is independent of any type changes of
the variable, and can be reviewed without having to scroll around for
the variable definition.

So how about,

partinfo = palloc(ntups * sizeof(*partinfo));


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Nikhil Sontakke
Hi,

  partinfo = (PartitionInfo *) malloc(ntups * sizeof(PartitionInfo));

 1) Please stop casting the results of palloc and malloc.  We are not
 writing C++ here.


I thought it was/is a good C programming practice to typecast (void *)
always to the returning structure type!!

Regards,
Nikhils

 2) I would prefer that you apply sizeof on the variable, not on the
 type.  That way, the expression is independent of any type changes of
 the variable, and can be reviewed without having to scroll around for
 the variable definition.

 So how about,

 partinfo = palloc(ntups * sizeof(*partinfo));


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Peter Eisentraut
On fre, 2009-11-20 at 11:14 +0530, Nikhil Sontakke wrote:
 Hi,
 
   partinfo = (PartitionInfo *) malloc(ntups * sizeof(PartitionInfo));
 
  1) Please stop casting the results of palloc and malloc.  We are not
  writing C++ here.
 
 
 I thought it was/is a good C programming practice to typecast (void *)
 always to the returning structure type!!

This could be preferable if you use sizeof on the type, so that you have
an additional check that the receiving variable actually has that type.
But if you use sizeof on the variable itself, it's unnecessary: You just
declare the variable to be of some type earlier, and then the expression
allocates ntups of it, without having to repeat the type information.

 
 Regards,
 Nikhils
 
  2) I would prefer that you apply sizeof on the variable, not on the
  type.  That way, the expression is independent of any type changes of
  the variable, and can be reviewed without having to scroll around for
  the variable definition.
 
  So how about,
 
  partinfo = palloc(ntups * sizeof(*partinfo));
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 
 
 
 
 -- 
 http://www.enterprisedb.com
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Tom Lane
Nikhil Sontakke nikhil.sonta...@enterprisedb.com writes:
 partinfo = (PartitionInfo *) malloc(ntups * sizeof(PartitionInfo));
 
 1) Please stop casting the results of palloc and malloc.  We are not
 writing C++ here.

 I thought it was/is a good C programming practice to typecast (void *)
 always to the returning structure type!!

Yes.  The above is good style because it ensures that the variable
you're assigning the pointer to is the right type to match the sizeof
computation.  In C++ you'd use operator new instead and still have that
type-check without the cast, but indeed we are not writing C++ here.

The *real* bug in the quoted code is that it's using malloc.  There are
a few places in PG where it's appropriate to use malloc not palloc, but
pretty darn few.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 2) I would prefer that you apply sizeof on the variable, not on the
 type.  That way, the expression is independent of any type changes of
 the variable, and can be reviewed without having to scroll around for
 the variable definition.

FWIW, I think the general project style has been the other way.
Yes, it means you write the type name three times not once, but
the other side of that coin is that it makes it more obvious what
is happening (and gives you an extra chance to realize that the
type you wrote is wrong ...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-19 Thread Simon Riggs
On Thu, 2009-11-19 at 10:53 -0500, Robert Haas wrote:
 On Thu, Nov 19, 2009 at 9:58 AM, Markus Wanner mar...@bluegap.ch wrote:
  Hi,
 
  Robert Haas wrote:
 
  Settling on a syntax, and an internal representation for that syntax,
 
  I've been under the impression that this was only about syntax. What are the
  internal additions?
 
 I haven't looked at it in detail, but it adds a new pg_partition
 table.  Whether that table is suitably structured for use by the
 optimizer is not clear to me.

If it does, then my review comments to Kedar still apply: 

* why do we want another catalog table? what's wrong with pg_inherits?
It might need additional columns, and it certainly needs another index.

* We need an internal data structure (discussed on this thread also).
Leaving stuff in various catalog tables would not be the same thing at
all.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-18 Thread Simon Riggs
On Wed, 2009-11-18 at 13:24 +0900, Itagaki Takahiro wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 
  Why not just wait until we have a whole patch and then apply?
 
 A whole patch can be written by many contributers instead of only
 one person, no?  I think we need to split works for partitioning
 into serveral parts to encourage developing it. I just did one of
 the parts, syntax. Anothe patch Partitioning option for COPY
 will do a good job in the field of INSERT. 

If we can agree the parts that are required, I would at least be
confident that we have understood this enough to allow one part to
proceed ahead of the others.

For partitioning the parts are these

1. Syntax for explicit partitioning
2. Internal data representations
3. Optimizations
   many and various
4. Data Routing
   a) Data routing on INSERT/COPY
   b) UPDATE handling when the UPDATE causes partition migration

If this patch puts forward a solution for (2) also, then it is
potentially worthwhile. That is the real blocking point here. Once we
have that other people will quickly fill in the later parts.

I foresee a data structure that is a sorted list of boundary-values,
cached on the parent-relation. This should be accessible to allow
bsearch of particular values during both planning and execution. Same
rules apply as btree operator classes. For multi-level hierarchies the
parent level should have the union of all sub-hierarchies. I think we
need an index on pg_inherits also.

So please do (1) and (2), not just (1) in isolation.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Marko Tiikkaja

Hi,

I'm reviewing your patch.  The patch applies without problems and the
feature works as advertised.  I have yet to look at the code in detail,
but it looks sane and seems to work.  However, this looks like a mistake:

partinfo = (PartitionInfo *) malloc(ntups * sizeof(PartitionInfo));

or am I missing something?


The syntax itself seems a bit weird in some cases.  Say you have:
PARTITION BY RANGE ( foo USING  )
(
PARTITION bar VALUES LESS THAN 0
);

which translates to CHECK (bar  0).  That doesn't sound at all like
LESS THAN to me.  This syntax seems to be the same Oracle uses, and I
think it's nice for the general case, but I think the reversed operator
weirdness is a bit too much.  Maybe we should use something like

PARTITION bar VALUES OPERATOR 0

when the user specifies the operator?


Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Jaime Casanova
On Thu, Nov 12, 2009 at 5:54 AM, Itagaki Takahiro
itagaki.takah...@oss.ntt.co.jp wrote:
 I added psql and pg_dump support to Partitioning Syntax patch.
 Paritioning information can be retrieved with a new system function
 pg_get_partitiondef(parentRelid). Both psql and pg_dump use it.


i haven't seen the patch but:


  * If a table with the same name already exists when a partition
    is created, the table is re-used as partition. This behavior
    is required for pg_dump to be simple.


i guess the table must be empty, if not we should be throw an error...
and i actually prefer some more explicit syntax for this not just
reusing a table

      PARTITION name VALUES LESS THAN { range_upper | MAXVALUE }
    | PARTITION name VALUES IN ( list_value [,...] | DEFAULT )


i remember someone making a comment about actually using operators
instead of LESS THEN and family

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Marko Tiikkaja

Jaime Casanova wrote:

 * If a table with the same name already exists when a partition
   is created, the table is re-used as partition. This behavior
   is required for pg_dump to be simple.



i guess the table must be empty, if not we should be throw an error...
and i actually prefer some more explicit syntax for this not just
reusing a table


I'd be OK with only a notification - even if the table wasn't empty -,
similar to how inheritance combines rows currently.  The patch currently
silently reuses the table unless it has rows which don't satisfy the
CHECK constraint, in which case it gives you the default CHECK
constraint error.


 PARTITION name VALUES LESS THAN { range_upper | MAXVALUE }
   | PARTITION name VALUES IN ( list_value [,...] | DEFAULT )



i remember someone making a comment about actually using operators
instead of LESS THEN and family


That doesn't sound like a bad idea..


Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Simon Riggs
On Thu, 2009-10-29 at 11:15 +0900, Itagaki Takahiro wrote:

 I think syntax support is a good start.

I don't see a syntax-only patch as being any use at all to this
community.

We go to enormous lengths in other areas to never allow patches with
restrictions. Why would we allow a patch that is essentially 100%
restriction? i.e. It does nothing at all. Worse than that, it will
encourage people to believe it exists in full, when that isn't the case.

The syntax has never really been in question, so it doesn't really move
us forwards in any direction. This is exactly the kind of shallow
feature we have always shied away from and that other databases have
encouraged.

The only reason I can see is that it allows people to develop non-open
source code that matches how Postgres will work when we get our act
together. That seems likely to discourage, rather than encourage the
funding of this work for open source. It may even muddy the water for
people that don't understand that the real magic happens in the
internals, not in the syntax.

Why not just wait until we have a whole patch and then apply?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Robert Haas
On Tue, Nov 17, 2009 at 4:31 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2009-10-29 at 11:15 +0900, Itagaki Takahiro wrote:

 I think syntax support is a good start.

 I don't see a syntax-only patch as being any use at all to this
 community.

 We go to enormous lengths in other areas to never allow patches with
 restrictions. Why would we allow a patch that is essentially 100%
 restriction? i.e. It does nothing at all. Worse than that, it will
 encourage people to believe it exists in full, when that isn't the case.

 The syntax has never really been in question, so it doesn't really move
 us forwards in any direction. This is exactly the kind of shallow
 feature we have always shied away from and that other databases have
 encouraged.

 The only reason I can see is that it allows people to develop non-open
 source code that matches how Postgres will work when we get our act
 together. That seems likely to discourage, rather than encourage the
 funding of this work for open source. It may even muddy the water for
 people that don't understand that the real magic happens in the
 internals, not in the syntax.

 Why not just wait until we have a whole patch and then apply?

Because big patches are really hard to get applied.  Personally, I
think a syntax-only patch makes a lot of sense, as long as the design
is carefully thought about so that it can serve as a foundation for
future work in this area.  I don't think the whole patch is even
necessarily a well-defined concept in this instance: different people
could have very different ideas about what would constitute a complete
solution, or which aspects of a complete solution are most important
or should be pursued first.  Settling on a syntax, and an internal
representation for that syntax, seems like it will make subsequent
discussions about those projects considerably more straightforward,
and it has some value in and of itself since similar notation is used
by other databases.

At least, that's MHO.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Itagaki Takahiro

Marko Tiikkaja marko.tiikk...@cs.helsinki.fi wrote:

 Jaime Casanova wrote:
   PARTITION name VALUES LESS THAN { range_upper | MAXVALUE }
 | PARTITION name VALUES IN ( list_value [,...] | DEFAULT )
  
  i remember someone making a comment about actually using operators
  instead of LESS THEN and family
 
 That doesn't sound like a bad idea..

I prefer to use widely-used syntax instead of postgres original one.
Oracle and MySQL already use LESS THAN and IN for partitioning.
I assume almost all user only use the default operators.
I don't want to break de facto standard for small utilization area.

I think truly what we want is a new partition kind in addition to
RANGE and LIST. If we want to split geometric data into paritions,
we need to treat the the partition key with gist-list operation.
I agree with a plan to add some additional parition kinds,
but want to keep RANGE and LIST partitions in the current syntax.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Itagaki Takahiro

Simon Riggs si...@2ndquadrant.com wrote:

 Why not just wait until we have a whole patch and then apply?

A whole patch can be written by many contributers instead of only
one person, no?  I think we need to split works for partitioning
into serveral parts to encourage developing it. I just did one of
the parts, syntax. Anothe patch Partitioning option for COPY
will do a good job in the field of INSERT. 

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Itagaki Takahiro

Jaime Casanova jcasa...@systemguards.com.ec wrote:

  * If a table with the same name already exists when a partition
is created, the table is re-used as partition. This behavior
is required for pg_dump to be simple.
 
 i guess the table must be empty, if not we should be throw an error...
 and i actually prefer some more explicit syntax for this not just
 reusing a table

Yeah, an explicit syntax is better.
I've researched other syntax, but I cannot find any good ones.

 * ALTER TABLE child INHERIT parent AS PARTITION
= implemenation PARTITION is an INHERIT is revealed to user.
 * ALTER PARTITION child ATTACH TO parent
= child is not a partition yet at that point.
 * ALTER TABLE parent ADD PARTITION child
= partition need to be a full-reserved word.

Are there better idea?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-17 Thread Itagaki Takahiro
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi wrote:

 this looks like a mistake:
 partinfo = (PartitionInfo *) malloc(ntups * sizeof(PartitionInfo));

Oops, it should be palloc. Thanks.

 Maybe we should use something like
 PARTITION bar VALUES OPERATOR 0
 when the user specifies the operator?

I think we could have reasonable restrictions to the operator
for future optimization. Is the VALUES OPERATOR syntax too freedom?

For the same reason, USING operator also might be too freedom.
RANGE (and maybe also LIST) partition keys should be sortable,
operator class name might be better to the option instead of
any operators. i.e.,
  PARTITION BY RANGE ( foo [ USING operator ] )
should be: 
  PARTITION BY RANGE ( foo [ btree_ops_name ] )

If we do so, there will be no inconsistency in LESS THAN syntax
because btree_ops always have  operator.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-01 Thread Itagaki Takahiro

Devrim GNDZ dev...@gunduz.org wrote:

 Is this the same as / similar to Oracle's syntax? 

Yes.

 IIRC Nikhil's patch was Oracle's syntax

No. See:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_7002.htm#i2125922

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-11-01 Thread Nikhil Sontakke
Hi,

 Is this the same as / similar to Oracle's syntax?

 Yes.

 IIRC Nikhil's patch was Oracle's syntax

 No. See:
 http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_7002.htm#i2125922


Any decent prevalent partitioning syntax should be ok IMHO. IIRC,
MySQL paritioning syntax is also pretty similar to Oracle's.

Regards,
Nikhils
-- 
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Peter Eisentraut
On Thu, 2009-10-29 at 15:19 -0700, Jeff Davis wrote:
 I can't help but wonder if the PERIOD type might be better for
 representing a partition range. It would make it easier to express and
 enforce the constraint that no two partition ranges overlap ;)

I can't help but wonder if the period type might better be a generic
container for pairs of scalar, totally-ordered types.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Devrim GÜNDÜZ
On Thu, 2009-10-29 at 11:15 +0900, Itagaki Takahiro wrote:
 I'd like to improve partitioning feature in 8.5.

Nice.

 Here is syntax I propose:

snip

Is this the same as / similar to Oracle's syntax? 

IIRC Nikhil's patch was Oracle's syntax, and I prefer having that one
instead of inventing our own wheel.

Regards,
-- 
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com 
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Jeff Davis
On Fri, 2009-10-30 at 10:03 +0200, Peter Eisentraut wrote:
 I can't help but wonder if the period type might better be a generic
 container for pairs of scalar, totally-ordered types.

That would be ideal. However, it doesn't really look like our type
system was built to handle that kind of thing.

We could use typmod, I suppose, but even that won't hold a full Oid. Any
ideas/suggestions?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread pg
  PARTITION BY RANGE ( a_expr )
 ...
 PARTITION BY HASH ( a_expr )
 PARTITIONS num_partitions;

 Unless someone comes up with a maintenance plan for stable hashfunctions, we 
 should probably not dare look into this yet.

What would cover the common use case of per-day quals and drops over an 
extended history period, say six or nine months? You don't get quite the same 
locality of reference, generally, with an unpartitioned table, due to slop in 
the arrival of rows. Ideally, you don't want to depend on an administrator, or 
even an administrative script, to continually intervene in the structure of a 
table, as would be the case with partitioning by range, and you don't want to 
coalesce multiple dates, as an arbitrary hash might do. What the administrator 
would want would be to decide what rows were too old to keep, then process 
(e.g. archive, summarize, filter) and delete them.

Suppose that the number of partitions were taken as a hint rather than as a 
naming modulus, and that any quasi-hash function had to be specified explicitly 
(although storage assignment could be based on a hash of the quasi-hash 
output). If a_expr were allowed to include a to-date conversion of a timestamp, 
day-by-day partitioning would fall out naturally. If, in addition, 
single-parameter (?) functions were characterized as range-preserving and 
order-preserving, plan generation could be improved for time ranges on 
quasi-hash-partitioned tables, without a formal indexing requirement.

There are cases where additional partition dimensions would be useful, for 
eventual parallelized operation on large databases, and randomizing quasi-hash 
functions would help. IMHO stability is not needed, except to the extent that 
hash functions have properties that lend themselves to plan generation and/or 
table maintenance.

It is not clear to me what purpose there would be in dropping a partition. This 
would be tantamount to deleting all of the rows in a partition, if it were 
analogous to dropping a table, and would require some sort of compensatory 
aggregation of existing partitions (in effect, a second partitioning 
dimension), if it were merely structural.

Perhaps I'm missing something here.

David Hudson





Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Heikki Linnakangas
Jeff Davis wrote:
 On Fri, 2009-10-30 at 10:03 +0200, Peter Eisentraut wrote:
 I can't help but wonder if the period type might better be a generic
 container for pairs of scalar, totally-ordered types.
 
 That would be ideal. However, it doesn't really look like our type
 system was built to handle that kind of thing.
 
 We could use typmod, I suppose, but even that won't hold a full Oid. Any
 ideas/suggestions?

Wait, it doesn't? A typmod is a 32-bit integer, like Oids. Am I missing
something?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Jeff Davis
On Fri, 2009-10-30 at 19:12 +0200, Heikki Linnakangas wrote:
 Wait, it doesn't? A typmod is a 32-bit integer, like Oids. Am I missing
 something?

Oid is unsigned, typmod is signed. We might be able to get away with it,
but -1 is treated specially in some places outside of the type-specific
functions, e.g. exprTypmod().

I haven't looked at all of these places yet, so maybe a few simple
changes would allow us to treat typmod as a full 32 bits. Or perhaps it
could just be expanded to a signed 64-bit int. What do you think?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Stephen Frost
* Jeff Davis (pg...@j-davis.com) wrote:
 Oid is unsigned, typmod is signed. We might be able to get away with it,
 but -1 is treated specially in some places outside of the type-specific
 functions, e.g. exprTypmod().

Any negative result returned from the input handling function is
considered an error, as I recall.  It's more than just '-1'..

 I haven't looked at all of these places yet, so maybe a few simple
 changes would allow us to treat typmod as a full 32 bits. Or perhaps it
 could just be expanded to a signed 64-bit int. What do you think?

That was shot down previously due to the way typmods are passed around
currently..  Not that it wouldn't be really nice..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 I haven't looked at all of these places yet, so maybe a few simple
 changes would allow us to treat typmod as a full 32 bits. Or perhaps it
 could just be expanded to a signed 64-bit int. What do you think?

Neither is likely to happen, and even disregarding that, I doubt people
would be real happy with a design like this.  Where are you going to
put the typmod for the contained type?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Robert Haas
On Fri, Oct 30, 2009 at 5:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Davis pg...@j-davis.com writes:
 I haven't looked at all of these places yet, so maybe a few simple
 changes would allow us to treat typmod as a full 32 bits. Or perhaps it
 could just be expanded to a signed 64-bit int. What do you think?

 Neither is likely to happen, and even disregarding that, I doubt people
 would be real happy with a design like this.  Where are you going to
 put the typmod for the contained type?

IMO, the real problem is that the type interface is poorly
encapsulated.  There's way too much code that knows about the internal
details of a type - namely, that it's a 32-bit integer modified by a
second 32-bit integer.  I think there are still places where the code
doesn't even know about typmod.  If we're going to go to the trouble
of changing anything, I think it should probably involve inserting an
abstraction layer that will make future extensions easier.  But I have
a feeling that's going to be a tough sell.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Jeff Davis
On Fri, 2009-10-30 at 17:39 -0400, Robert Haas wrote:
 IMO, the real problem is that the type interface is poorly
 encapsulated.  There's way too much code that knows about the internal
 details of a type - namely, that it's a 32-bit integer modified by a
 second 32-bit integer.  I think there are still places where the code
 doesn't even know about typmod.  If we're going to go to the trouble
 of changing anything, I think it should probably involve inserting an
 abstraction layer that will make future extensions easier.  But I have
 a feeling that's going to be a tough sell.

Yeah. We're way off topic for partitioning, so I think it's best to just
table this discussion until someone comes up with a good idea.

It's not the end of the world to write some generic C code, and have
multiple types make use of it, e.g. PERIOD, PERIODTZ, INT4RANGE,
FLOAT8RANGE, etc. It's a little redundant and creates some catalog
bloat, but I'm not too concerned about it right now. Certainly not
enough to rewrite the type system.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 IMO, the real problem is that the type interface is poorly
 encapsulated.  There's way too much code that knows about the internal
 details of a type - namely, that it's a 32-bit integer modified by a
 second 32-bit integer.  I think there are still places where the code
 doesn't even know about typmod.  If we're going to go to the trouble
 of changing anything, I think it should probably involve inserting an
 abstraction layer that will make future extensions easier.  But I have
 a feeling that's going to be a tough sell.

Yup, you're right.  It would be an enormous amount of work and break a
lot of third-party code, for largely hypothetical future benefits.
We've got better places to invest our limited manpower.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Nikhil Sontakke
Hi,

 So, I'll take over the work if there are no ones to do it.
 I'm thinking to add syntax support first. Table partitioning was
 proposed many times, but it is still not applied into core.
 The reason is it is too difficult to make perfect partitioning
 feature at once. I think syntax support is a good start.

Guess we are back to square one again on Partitioning :), but as long
as someone is willing to walk the whole nine yards with it, that would
be just great!

I had proposed Oracle style syntax a while back and had also submitted
a WIP patch then. Again then my motive was to move forward in a
piece-meal fashion on this feature. First solidify the syntax, keep
using the existing inheritance mechanism and go one step at a time. I
think a feature like Partitioning needs this kind of an approach,
because it might turn out to be a lot of work with a lot of very many
sub items.

So +1 on solidifying the syntax first and then sorting out the other
minute, intricate details later..

Regards,
Nikhils


 First, I will add syntax for CREATE TABLE, ALTER TABLE ADD/DROP PARTITION.
 The syntax is borrowed from from Oracle and MySQL. Their characteristics
 are using LESS THAN in range partitioning. The keyword PARTITION is
 added to the full-reserved keyword list to support ADD/DROP PARTITION.

 Those syntax is merely a syntax sugar for INHERITS with CHECK. Declarations
 are translated into CHECK constraints. I have a plan to adjust pg_dump to
 dump definitions of partitioning in the correct format, but the actual
 implementation will be still based on constraint exclusion. In addition,
 hash partitioning is not implemented; syntax is parsed but not implemented
 error are raised for now.

 Here is syntax I propose:
 
 ALTER TABLE table_name ADD PARTITION name ...;
 ALTER TABLE table_name DROP PARTITION [IF EXISTS] name [CASCADE | RESTRICT];

 Range partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY RANGE ( a_expr )
    (
      PARTITION name VALUES LESS THAN [(] const [)],
      PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow partition
    );

 List partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY LIST ( a_expr )
    (
      PARTITION name VALUES [IN] ( const [, ...] ),
      PARTITION name VALUES [IN] [(] DEFAULT [)]       -- overflow partition
    );

 Hash partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY HASH ( a_expr )
    PARTITIONS num_partitions;

  CREATE TABLE table_name ( columns )
    PARTITION BY HASH ( a_expr )
    (
      PARTITION name,
      ...
    );

 Note:
  * Each partition can have optional WITH (...) and TABLESPACE clauses.
  * '(' and ')' are optional to support both Oracle and MySQL syntax.
 

 Comments welcome.

 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Grzegorz Jaskiewicz


On 29 Oct 2009, at 02:15, Itagaki Takahiro wrote:


I'd like to improve partitioning feature in 8.5.
Kedar-san's previous work is wonderful, but I cannot see any updated  
patch.

http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com

So, I'll take over the work if there are no ones to do it.
I'm thinking to add syntax support first. Table partitioning was
proposed many times, but it is still not applied into core.
The reason is it is too difficult to make perfect partitioning
feature at once. I think syntax support is a good start.

First, I will add syntax for CREATE TABLE, ALTER TABLE ADD/DROP  
PARTITION.
The syntax is borrowed from from Oracle and MySQL. Their  
characteristics
are using LESS THAN in range partitioning. The keyword PARTITION  
is

added to the full-reserved keyword list to support ADD/DROP PARTITION.

Those syntax is merely a syntax sugar for INHERITS with CHECK.  
Declarations
are translated into CHECK constraints. I have a plan to adjust  
pg_dump to

dump definitions of partitioning in the correct format, but the actual
implementation will be still based on constraint exclusion. In  
addition,
hash partitioning is not implemented; syntax is parsed but not  
implemented

error are raised for now.

Here is syntax I propose:

ALTER TABLE table_name ADD PARTITION name ...;
ALTER TABLE table_name DROP PARTITION [IF EXISTS] name [CASCADE |  
RESTRICT];


Range partitioning:
 CREATE TABLE table_name ( columns )
   PARTITION BY RANGE ( a_expr )
   (
 PARTITION name VALUES LESS THAN [(] const [)],
 PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow  
partition

   );

List partitioning:
 CREATE TABLE table_name ( columns )
   PARTITION BY LIST ( a_expr )
   (
 PARTITION name VALUES [IN] ( const [, ...] ),
 PARTITION name VALUES [IN] [(] DEFAULT [)]   -- overflow  
partition

   );

Hash partitioning:
 CREATE TABLE table_name ( columns )
   PARTITION BY HASH ( a_expr )
   PARTITIONS num_partitions;

 CREATE TABLE table_name ( columns )
   PARTITION BY HASH ( a_expr )
   (
 PARTITION name,
 ...
   );

Note:
 * Each partition can have optional WITH (...) and TABLESPACE clauses.
 * '(' and ')' are optional to support both Oracle and MySQL syntax.


Comments welcome.


+1000

Thanks !

(most anticipated feature for 8.5, here, next to replication [well, I  
am interested in multi master, but that's not going to happen :P ] )



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Heikki Linnakangas
Itagaki Takahiro wrote:
 The keyword PARTITION is
 added to the full-reserved keyword list to support ADD/DROP PARTITION.

Any chance to avoid that? PARTITION seems like something people might
well use as a column or variable name. OTOH, it is reserved in SQL2008
and SQL2003.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Greg Stark
On Thu, Oct 29, 2009 at 3:35 AM, Nikhil Sontakke
nikhil.sonta...@enterprisedb.com wrote:
 So +1 on solidifying the syntax first and then sorting out the other
 minute, intricate details later..

I like that idea as well but I have a concern. What will we do with
pg_dump. If the PARTITION commands are just syntactic sugar for
creating constraints and inherited tables then pg_dump will have to
generate the more generic commands for those objects. When we
eventually have real partitioning then restoring such a dump will not
create real partitions, just inherited tables. Perhaps we need some
kind of option to reverse-engineer partitioning commands from the
inheritance structure,  but I fear having pg_dump reverse engineer
inherited tables to produce partitioning commands will be too hard and
error-prone. Hopefully that's too pessimistic though, if they were
produced by PARTITION commands they should be pretty regular.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Andres Freund
On Thursday 29 October 2009 18:33:22 Greg Stark wrote:
 On Thu, Oct 29, 2009 at 3:35 AM, Nikhil Sontakke
 
 nikhil.sonta...@enterprisedb.com wrote:
  So +1 on solidifying the syntax first and then sorting out the other
  minute, intricate details later..
 
 I like that idea as well but I have a concern. What will we do with
 pg_dump. If the PARTITION commands are just syntactic sugar for
 creating constraints and inherited tables then pg_dump will have to
 generate the more generic commands for those objects. When we
 eventually have real partitioning then restoring such a dump will not
 create real partitions, just inherited tables. Perhaps we need some
 kind of option to reverse-engineer partitioning commands from the
 inheritance structure,  but I fear having pg_dump reverse engineer
 inherited tables to produce partitioning commands will be too hard and
 error-prone. Hopefully that's too pessimistic though, if they were
 produced by PARTITION commands they should be pretty regular.
One could have a system catalog containing the partitioning information and 
generate the constraints et al. from that and mark them in pg_depend...


Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Peter Eisentraut
On tor, 2009-10-29 at 11:15 +0900, Itagaki Takahiro wrote:
 Range partitioning:
   CREATE TABLE table_name ( columns )
 PARTITION BY RANGE ( a_expr )
 (
   PARTITION name VALUES LESS THAN [(] const [)],
   PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow partition
 );

Maybe this needs to mention the actual operator name instead of LESS
THAN, in case the operator is not named  or the user wants to use a
different one.

 Hash partitioning:
   CREATE TABLE table_name ( columns )
 PARTITION BY HASH ( a_expr )
 PARTITIONS num_partitions;
 
   CREATE TABLE table_name ( columns )
 PARTITION BY HASH ( a_expr )
 (
   PARTITION name,
   ...
 );

Unless someone comes up with a maintenance plan for stable hash
functions, we should probably not dare look into this yet.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Jeff Davis
On Fri, 2009-10-30 at 00:10 +0200, Peter Eisentraut wrote:
 On tor, 2009-10-29 at 11:15 +0900, Itagaki Takahiro wrote:
  Range partitioning:
CREATE TABLE table_name ( columns )
  PARTITION BY RANGE ( a_expr )
  (
PARTITION name VALUES LESS THAN [(] const [)],
PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow partition
  );
 
 Maybe this needs to mention the actual operator name instead of LESS
 THAN, in case the operator is not named  or the user wants to use a
 different one.

I can't help but wonder if the PERIOD type might be better for
representing a partition range. It would make it easier to express and
enforce the constraint that no two partition ranges overlap ;)

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Itagaki Takahiro

Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

  The keyword PARTITION is
  added to the full-reserved keyword list to support ADD/DROP PARTITION.
 
 Any chance to avoid that? PARTITION seems like something people might
 well use as a column or variable name. OTOH, it is reserved in SQL2008
 and SQL2003.

CREATE TABLE does not require PARTITION to be a reserved keyword,
but there are conflicts in ALTER TABLE ADD/DROP PARTITION:

  * ALTER TABLE ... DROP [COLUMN]  name [CASCADE | RESTRICT]
  * ALTER TABLE ... DROP PARTITION name [CASCADE | RESTRICT]

There are some solutions:

 1. Change COLUMN not to an optional word (unlikely)
 2. Change syntax of DROP PARTITION to DROP TABLE PARITION or so
 3. Change ALTER TABLE ADD/DROP PARTITION to top level 
  = CREATE/DROP PARTITION name ON table_name

Any better ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Robert Haas
On Thu, Oct 29, 2009 at 9:51 PM, Itagaki Takahiro
itagaki.takah...@oss.ntt.co.jp wrote:

 Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

  The keyword PARTITION is
  added to the full-reserved keyword list to support ADD/DROP PARTITION.

 Any chance to avoid that? PARTITION seems like something people might
 well use as a column or variable name. OTOH, it is reserved in SQL2008
 and SQL2003.

 CREATE TABLE does not require PARTITION to be a reserved keyword,
 but there are conflicts in ALTER TABLE ADD/DROP PARTITION:

  * ALTER TABLE ... DROP [COLUMN]  name [CASCADE | RESTRICT]
  * ALTER TABLE ... DROP PARTITION name [CASCADE | RESTRICT]

 There are some solutions:

  1. Change COLUMN not to an optional word (unlikely)
  2. Change syntax of DROP PARTITION to DROP TABLE PARITION or so
  3. Change ALTER TABLE ADD/DROP PARTITION to top level
      = CREATE/DROP PARTITION name ON table_name

 Any better ideas?

I'm not sure if this is better, but what about:

CREATE PARTITION name ON TABLE name
DROP PARTITION name

Since partitions will live in pg_class and are in some sense top
level objects, it seems like it would make sense to use a syntax that
is similar to the one we use for indices...  we can't say DROP COLUMN
name, because the table must be specified.  But a partition name must
be unambiguous, so making the user write it out explicitly doesn't
seem friendly.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Itagaki Takahiro

Peter Eisentraut pete...@gmx.net wrote:

 On tor, 2009-10-29 at 11:15 +0900, Itagaki Takahiro wrote:
  Range partitioning:
CREATE TABLE table_name ( columns )
  PARTITION BY RANGE ( a_expr )
  (
PARTITION name VALUES LESS THAN [(] const [)],
PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow partition
  );
 
 Maybe this needs to mention the actual operator name instead of LESS
 THAN, in case the operator is not named  or the user wants to use a
 different one.

How about to use sortby or index_elem here?

PARTITION BY RANGE '(' sortby-or-index_elem ')' '(' RangePartitions ')'

sortby:
a_expr USING qual_all_Op opt_nulls_order
  | a_expr opt_asc_desc  opt_nulls_order

index_elem:
ColId  opt_class opt_asc_desc opt_nulls_order
  | func_expr  opt_class opt_asc_desc opt_nulls_order
  | '(' a_expr ')' opt_class opt_asc_desc opt_nulls_order

We should allow only btree operator class here because we need to
extract GREATER-THAN-OR-EQUAL operator from LESS THAN. In addition,
we will be able to optimize parition search in the future if we
restrict a range partition key should be comparable scalar value.

Multidimensional partitioning will be implemented with another
approach, something like PARTITION BY GIST, because it would
require different oprimization from range partitioning.
BTW, PARTITION BY pg_am.amname crossed my mind here,
but it is not well-investigated yet.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Greg Stark
On Thu, Oct 29, 2009 at 7:14 PM, Robert Haas robertmh...@gmail.com wrote:
 CREATE TABLE does not require PARTITION to be a reserved keyword,
 but there are conflicts in ALTER TABLE ADD/DROP PARTITION:

  * ALTER TABLE ... DROP [COLUMN]  name [CASCADE | RESTRICT]
  * ALTER TABLE ... DROP PARTITION name [CASCADE | RESTRICT]

 There are some solutions:

Do we need a DROP PARTITION command at all? What would it even do?
Drop the partition from the parent table and throw it away in one
step? I think in actual practice people usually remove the partition
from the parent table first, then do things like archive it before
actually throwing it away.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-29 Thread Itagaki Takahiro

Greg Stark gsst...@mit.edu wrote:

  * ALTER TABLE ... DROP [COLUMN]  name [CASCADE | RESTRICT]
  * ALTER TABLE ... DROP PARTITION name [CASCADE | RESTRICT]
 
 Do we need a DROP PARTITION command at all? What would it even do?

Currently no. So, it would be good to treat PARTITION as
just a synonym of TABLE. Not only DROP PARTITION but also
ALTER PARTITION will work.

 CREATE PARTITION name ON table_name
 DROP PARTITION name
  ALTER PARTITION name ...

We might need to specify partition keys with another syntax.
ALTER TABLE will have only one new command PARTITION BY.
and we reuse TABLE command for PARTITION in other operations.

  ALTER TABLE table_name PARTITION BY RANGE (expr) (...)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syntax for partitioning

2009-10-28 Thread Pavel Stehule
2009/10/29 Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp:
 I'd like to improve partitioning feature in 8.5.
 Kedar-san's previous work is wonderful, but I cannot see any updated patch.
 http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com

 So, I'll take over the work if there are no ones to do it.
 I'm thinking to add syntax support first. Table partitioning was
 proposed many times, but it is still not applied into core.
 The reason is it is too difficult to make perfect partitioning
 feature at once. I think syntax support is a good start.

 First, I will add syntax for CREATE TABLE, ALTER TABLE ADD/DROP PARTITION.
 The syntax is borrowed from from Oracle and MySQL. Their characteristics
 are using LESS THAN in range partitioning. The keyword PARTITION is
 added to the full-reserved keyword list to support ADD/DROP PARTITION.

 Those syntax is merely a syntax sugar for INHERITS with CHECK. Declarations
 are translated into CHECK constraints. I have a plan to adjust pg_dump to
 dump definitions of partitioning in the correct format, but the actual
 implementation will be still based on constraint exclusion. In addition,
 hash partitioning is not implemented; syntax is parsed but not implemented
 error are raised for now.

 Here is syntax I propose:
 
 ALTER TABLE table_name ADD PARTITION name ...;
 ALTER TABLE table_name DROP PARTITION [IF EXISTS] name [CASCADE | RESTRICT];

 Range partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY RANGE ( a_expr )
    (
      PARTITION name VALUES LESS THAN [(] const [)],
      PARTITION name VALUES LESS THAN [(] MAXVALUE [)] -- overflow partition
    );

 List partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY LIST ( a_expr )
    (
      PARTITION name VALUES [IN] ( const [, ...] ),
      PARTITION name VALUES [IN] [(] DEFAULT [)]       -- overflow partition
    );

 Hash partitioning:
  CREATE TABLE table_name ( columns )
    PARTITION BY HASH ( a_expr )
    PARTITIONS num_partitions;

  CREATE TABLE table_name ( columns )
    PARTITION BY HASH ( a_expr )
    (
      PARTITION name,
      ...
    );

 Note:
  * Each partition can have optional WITH (...) and TABLESPACE clauses.
  * '(' and ')' are optional to support both Oracle and MySQL syntax.
 

 Comments welcome.

+1

Pavel


 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers