subject:"Re\: \[HACKERS\] Partitioning option for COPY"

Re: [HACKERS] Partitioning option for COPY

2009-11-26 Thread Emmanuel Cecchet


Hannu Krosing wrote:

After triggers can't change tuple, thus cant change routing.
  
An after trigger can always issue an update of the tuple but that should 
be trapped by the regular mechanism that will deal with updates (when we 
have it available).
Also the description omits the execution of before and after statement 
triggers. While those can apply to the parent table (but the same 
question about what happens if the after statement modifies routing 
decision still applies), what does it mean in the case of COPY to have 
statement triggers on the child tables? 



What statement triggers do you mean ?

I don't think we have ON COPY triggers ? 
  

I mean

CREATE TRIGGER /name/ { BEFORE | AFTER } /event/
   ON /table/ FOR EACH STATEMENT


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Hannu Krosing

On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:
 Itagaki Takahiro wrote:
  I just edited a wiki page for this discussion.
  I hope it can be a help.
  http://wiki.postgresql.org/wiki/Table_partitioning

 I guess the problem of handling user triggers is still open.
 If we allow triggers on partitions, badly written logic could lead to 
 infinite loops in routing. In the case of COPY, an after statement 
 trigger could change all the routing decisions taken for each row.

A simple update to the row can cause it to move between partitions, no ?

 I am not sure what the semantic should be if you have triggers defined on the 
 parent and child tables. Which triggers do you fire if the insert is on 
 the parent table but the tuple ends up in a child table?

I'd propose that triggers on both parent table and selected child are
executed.

1. first you execute before triggers on parent table, which may 
   change which partition the row belongs to

2.  then you execute before triggers on selected child table

   2.1 if this changes the child table selection repeat from 2.

3. save the tuple in child table

4. execute after triggers of the final selected child table

5. execute after triggers of parent table

order of 4. and 5. is selected arbitrarily, others are determined by
flow.

 If the new implementation hides the child tables, 

If you hide child tables, you suddenly need a lot of new syntax to
unhide them, so that partitions can be manipulated. Currently it is
easy to do it with INHERIT / NO INHERIT.

 it might be safer to 
 not allow triggers on child tables altogether and use the parent table 
 as the single point of entry to access the partition (and define 
 triggers). With the current proposed implementation, would it be 
 possible to define a view using child tables?

the child tables are there, and they _are_ defined, either implicitly
(using constraints, which constraint exclusion resolves to a set of
child tables) or explicitly, using child table names directly.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Emmanuel Cecchet


Hannu Krosing wrote:

On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:
  

Itagaki Takahiro wrote:


I just edited a wiki page for this discussion.
I hope it can be a help.
http://wiki.postgresql.org/wiki/Table_partitioning
  
  

I guess the problem of handling user triggers is still open.
If we allow triggers on partitions, badly written logic could lead to 
infinite loops in routing. In the case of COPY, an after statement 
trigger could change all the routing decisions taken for each row.



A simple update to the row can cause it to move between partitions, no ?
  

Yes.
I am not sure what the semantic should be if you have triggers defined on the 
parent and child tables. Which triggers do you fire if the insert is on 
the parent table but the tuple ends up in a child table?



I'd propose that triggers on both parent table and selected child are
executed.

1. first you execute before triggers on parent table, which may 
   change which partition the row belongs to


2.  then you execute before triggers on selected child table

   2.1 if this changes the child table selection repeat from 2.

3. save the tuple in child table

4. execute after triggers of the final selected child table
  

What if that trigger changes again the child table selection?

5. execute after triggers of parent table
  
Same here, what if the trigger changes the child table selection. Do we 
re-execute triggers on the new child table?
Also it is debatable whether we should execute an after trigger on a 
table where nothing was really inserted.

order of 4. and 5. is selected arbitrarily, others are determined by
flow.
  
Also the description omits the execution of before and after statement 
triggers. While those can apply to the parent table (but the same 
question about what happens if the after statement modifies routing 
decision still applies), what does it mean in the case of COPY to have 
statement triggers on the child tables? You cannot know in advance where 
the tuples are going to go and fire the before statement triggers. If 
you had to fire after statement triggers, in which order would you fire 
them?
If the new implementation hides the child tables, 



If you hide child tables, you suddenly need a lot of new syntax to
unhide them, so that partitions can be manipulated. Currently it is
easy to do it with INHERIT / NO INHERIT.
  
Agreed, but I think that we will discover some restrictions that will 
apply to child tables.


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Robert Haas

On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 I'd propose that triggers on both parent table and selected child are
 executed.

I was thinking we should make the partitioning decision FIRST, before
any triggers are fired, and then fire only those triggers relevant to
the selected partition.  If the BEFORE triggers on the partition
modify the tuple in a way that makes it incompatible with the table
constraints on that partition, the insert (or update) fails.

Firing triggers on more than one table is pretty substantially
incompatible with what we do elsewhere and I'm not clear what we get
in exchange.  What is the use case for this?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Emmanuel Cecchet


Robert Haas wrote:

On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
  

I'd propose that triggers on both parent table and selected child are
executed.



I was thinking we should make the partitioning decision FIRST, before
any triggers are fired, and then fire only those triggers relevant to
the selected partition.  If the BEFORE triggers on the partition
modify the tuple in a way that makes it incompatible with the table
constraints on that partition, the insert (or update) fails.

Firing triggers on more than one table is pretty substantially
incompatible with what we do elsewhere and I'm not clear what we get
in exchange.  What is the use case for this?
  
I don't have a use case for this but I was puzzled with that when I had 
to implement trigger support in COPY with partitioning.
I came to the same conclusion as you and made the operation fail if the 
trigger was trying to move the tuple to another partition. However, I 
had a problem with after row triggers that I had to call synchronously 
to be able to detect the change. We will need something to tell us that 
an after row trigger did not mess with the routing decision.


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Robert Haas

On Wed, Nov 25, 2009 at 9:21 AM, Emmanuel Cecchet m...@asterdata.com wrote:
 Robert Haas wrote:
 On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing ha...@2ndquadrant.com
 wrote:

 I'd propose that triggers on both parent table and selected child are
 executed.

 I was thinking we should make the partitioning decision FIRST, before
 any triggers are fired, and then fire only those triggers relevant to
 the selected partition.  If the BEFORE triggers on the partition
 modify the tuple in a way that makes it incompatible with the table
 constraints on that partition, the insert (or update) fails.

 Firing triggers on more than one table is pretty substantially
 incompatible with what we do elsewhere and I'm not clear what we get
 in exchange.  What is the use case for this?

 I don't have a use case for this but I was puzzled with that when I had to
 implement trigger support in COPY with partitioning.
 I came to the same conclusion as you and made the operation fail if the
 trigger was trying to move the tuple to another partition. However, I had a
 problem with after row triggers that I had to call synchronously to be able
 to detect the change. We will need something to tell us that an after row
 trigger did not mess with the routing decision.

*scratches head*

I'm confused.  Only a BEFORE ROW trigger can possibly change
anything...  the return value of an AFTER ROW trigger is ignored.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Tom Lane

It seems like the easiest way to resolve this without weird corner
cases is to say that we fire triggers belonging to the parent table.
The individual partition child tables either shouldn't have triggers
at all, or we should restrict the cases in which those are considered
applicable.

As an example, what are you going to do with statement-level triggers?
Fire them for *every* child whether it receives a row or not?  Doesn't
seem like the right thing.

Again, this solution presupposes an explicit concept of partitioned
tables within the system...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Hannu Krosing

On Wed, 2009-11-25 at 08:39 -0500, Emmanuel Cecchet wrote:
 Hannu Krosing wrote:
  On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:

  Itagaki Takahiro wrote:
  
  I just edited a wiki page for this discussion.
  I hope it can be a help.
  http://wiki.postgresql.org/wiki/Table_partitioning


  I guess the problem of handling user triggers is still open.
  If we allow triggers on partitions, badly written logic could lead to 
  infinite loops in routing. In the case of COPY, an after statement 
  trigger could change all the routing decisions taken for each row.
  
 
  A simple update to the row can cause it to move between partitions, no ?

 Yes.
  I am not sure what the semantic should be if you have triggers defined on 
  the 
  parent and child tables. Which triggers do you fire if the insert is on 
  the parent table but the tuple ends up in a child table?
  
 
  I'd propose that triggers on both parent table and selected child are
  executed.
 
  1. first you execute before triggers on parent table, which may 
 change which partition the row belongs to
 
  2.  then you execute before triggers on selected child table
 
 2.1 if this changes the child table selection repeat from 2.
 
  3. save the tuple in child table
 
  4. execute after triggers of the final selected child table

 What if that trigger changes again the child table selection?
  5. execute after triggers of parent table

 Same here, what if the trigger changes the child table selection. Do we 
 re-execute triggers on the new child table?

After triggers can't change tuple, thus cant change routing.

 Also it is debatable whether we should execute an after trigger on a 
 table where nothing was really inserted.
  order of 4. and 5. is selected arbitrarily, others are determined by
  flow.

 Also the description omits the execution of before and after statement 
 triggers. While those can apply to the parent table (but the same 
 question about what happens if the after statement modifies routing 
 decision still applies), what does it mean in the case of COPY to have 
 statement triggers on the child tables? 

What statement triggers do you mean ?

I don't think we have ON COPY triggers ? 

 You cannot know in advance where 
 the tuples are going to go and fire the before statement triggers. If 
 you had to fire after statement triggers, in which order would you fire 
 them?
  If the new implementation hides the child tables, 
  
 
  If you hide child tables, you suddenly need a lot of new syntax to
  unhide them, so that partitions can be manipulated. Currently it is
  easy to do it with INHERIT / NO INHERIT.

 Agreed, but I think that we will discover some restrictions that will 
 apply to child tables.

I think we should keep the possibility to populate partitions offline
and then plug then into table as partitions (current INHERIT) and also
to extract partition into separate table (NO INHERIT).



-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Hannu Krosing

On Wed, 2009-11-25 at 11:30 -0500, Tom Lane wrote:
 It seems like the easiest way to resolve this without weird corner
 cases is to say that we fire triggers belonging to the parent table.
 The individual partition child tables either shouldn't have triggers
 at all, or we should restrict the cases in which those are considered
 applicable.

Agreed. maybe allow only ROW-level AFTER triggers (for logging late
arrivals and updates on tables partitioned on time for example )

 As an example, what are you going to do with statement-level triggers?
 Fire them for *every* child whether it receives a row or not?  Doesn't
 seem like the right thing.
 
 Again, this solution presupposes an explicit concept of partitioned
 tables within the system...

For explicit partitioned tables with hidden partitions it is of course
best to not add extra effort for allowing triggers to be defined on
those (hidden) partitions.

If the partition tables are visible, some trigger support would be good.

   regards, tom lane


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-25 Thread Robert Haas

On Wed, Nov 25, 2009 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It seems like the easiest way to resolve this without weird corner
 cases is to say that we fire triggers belonging to the parent table.
 The individual partition child tables either shouldn't have triggers
 at all, or we should restrict the cases in which those are considered
 applicable.

 As an example, what are you going to do with statement-level triggers?
 Fire them for *every* child whether it receives a row or not?  Doesn't
 seem like the right thing.

Just the tables that get a row?  I don't know, your way may be best,
but it seems like tables on individual partitions might be useful in
some situations.

 Again, this solution presupposes an explicit concept of partitioned
 tables within the system...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-24 Thread Hannu Krosing

On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote:

 But it looks like it is a waste of everybody's time to continue this 
 discussion further. Just move the patch to the rejected patches and 
 let's wait for Itagaki's implementation.

Emmanuel, please try to work together with Itagaki san on getting the
bigger vision implemented, as this is a thing that can benefit a lot
from more people who have taken the time to learn about the parts of
code involved.

Even though this patch will not get in, most of the effort in developing
it is not actual coding, but familiarizing yourself with the other code
involved. 

Coding actual patches should be easy once you know the code _and_ the
desired result. 

You probably already know a lot of what is required to help us to common
goal of a clean implementation of partitioning.

 -- 
 Emmanuel Cecchet
 Aster Data
 Web: http://www.asterdata.com
 
 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-24 Thread Itagaki Takahiro


Hannu Krosing ha...@2ndquadrant.com wrote:

 Even though this patch will not get in, most of the effort in developing
 it is not actual coding, but familiarizing yourself with the other code
 involved. 

I just edited a wiki page for this discussion.
I hope it can be a help.
http://wiki.postgresql.org/wiki/Table_partitioning

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-24 Thread Simon Riggs

On Tue, 2009-11-24 at 17:30 +0900, Itagaki Takahiro wrote:
 Hannu Krosing ha...@2ndquadrant.com wrote:
 
  Even though this patch will not get in, most of the effort in developing
  it is not actual coding, but familiarizing yourself with the other code
  involved. 
 
 I just edited a wiki page for this discussion.
 I hope it can be a help.
 http://wiki.postgresql.org/wiki/Table_partitioning
 

Good job. Looks like a clear path forwards to me.

I've made a couple of minor clarifications.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-24 Thread Emmanuel Cecchet


Itagaki Takahiro wrote:

I just edited a wiki page for this discussion.
I hope it can be a help.
http://wiki.postgresql.org/wiki/Table_partitioning
  

I guess the problem of handling user triggers is still open.
If we allow triggers on partitions, badly written logic could lead to 
infinite loops in routing. In the case of COPY, an after statement 
trigger could change all the routing decisions taken for each row. I am 
not sure what the semantic should be if you have triggers defined on the 
parent and child tables. Which triggers do you fire if the insert is on 
the parent table but the tuple ends up in a child table?
If the new implementation hides the child tables, it might be safer to 
not allow triggers on child tables altogether and use the parent table 
as the single point of entry to access the partition (and define 
triggers). With the current proposed implementation, would it be 
possible to define a view using child tables?


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-24 Thread Itagaki Takahiro


Emmanuel Cecchet m...@asterdata.com wrote:

 I guess the problem of handling user triggers is still open.
 If we allow triggers on partitions, badly written logic could lead to 
 infinite loops in routing.

Infinite loops are not a partition-related problem, no?
We can also find infinite loops in user defined functions,
recursive queries, etc. I think the only thing we can do for it
is to *stop* loops instead of prevention, like max_stack_depth.

 With the current proposed implementation, would it be 
 possible to define a view using child tables?

No, if you mean using a partition-view. I'm thinking we are moving
our implementation of partitioning from view-based to built-in feature.
Do you have any use-cases that requires view-based partitioning?
Was the inheritance-based partitioning not enough for it?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-24 Thread Emmanuel Cecchet


Itagaki Takahiro wrote:

Emmanuel Cecchet m...@asterdata.com wrote:

  

I guess the problem of handling user triggers is still open.
If we allow triggers on partitions, badly written logic could lead to 
infinite loops in routing.


Infinite loops are not a partition-related problem, no?
We can also find infinite loops in user defined functions,
recursive queries, etc. I think the only thing we can do for it
is to *stop* loops instead of prevention, like max_stack_depth.
  
I was thinking a trigger on child1 updating the partition key forcing 
the tuple to move to child2. And then a trigger on child2 updating the 
key again to move the tuple back to child1. You end up with an infinite 
loop.
With the current proposed implementation, would it be 
possible to define a view using child tables?



No, if you mean using a partition-view. I'm thinking we are moving
our implementation of partitioning from view-based to built-in feature.
Do you have any use-cases that requires view-based partitioning?
Was the inheritance-based partitioning not enough for it?
  
Nevermind, I was thinking about the implications of materialized views 
but Postgres does not have materialized views!


I have other questions related to create table but I will post them in 
the 'syntax for partitioning' thread.


Thanks
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs

On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
 Hi,
  I have extracted the partitioning option for COPY (removed the error 
  logging part) from the previous patch.
  
 
  We can use an INSERT trigger to route tuples into partitions even now.
  Why do you need an additional router for COPY? 
 Tom has already explained on the list why using a trigger was a bad idea 
 (and I know we can use a trigger since I am the one who wrote it).
 If you look at the code you will see that you can do optimizations in 
 the COPY code that you cannot do in the trigger.
 
   Also, it would be nicer
  that the router can works not only in COPY but also in INSERT.

 As 8.5 will at best provide a syntactic hack on top of the existing 
 constraint implementation, I think that it will not hurt to have routing 
 in COPY since we will not have it anywhere otherwise.
  BTW, I'm working on meta data of partitioning now. Your partitioning
  option in COPY could be replaced with the catalog.

 This implementation is only for the current 8.5 and it will not be 
 needed anymore once we get a fully functional partitioning in Postgres 
 which seems to be for a future version.

Yes, the trigger way of doing this is a bad way. 

I regret to say that the way proposed here isn't much better, AFAICS.
Let me explain why I think that, but -1 to anyone applying this patch.

This patch proposes keeping a cache of last visited partitions to reduce
the overhead of data routing.

What I've requested is that partitioning work by using a data structure
held in relcache for inheritance parents. This differs in 3 ways from
this patch
a) it has a clearly defined location for the cached metadata, with
clearly identified and well worked out mechanisms for cache invalidation
b) the cache can be built once when it is first needed, not slowly grown
as parts of the metadata are used
c) it would be available for all parts of the server, not just COPY.

The easiest way to build that metadata is when structured partitioning
info is available. i.e. the best next action is to complete and commit
Itagaki's partitioning syntax patch. Then we can easily build the
metadata for partitioning, which can then be used in COPY for data
routing.

Anyway, I want data routing, as is the intention of this patch. I just
don't think this patch is a useful way to do it. It is too narrow in its
scope and potentially buggy in its approach to developing a cache and
using trigger-like stuff. 

ISTM that with the right metadata in the right place, a cleaner and
easier solution is still possible for 8.5. The code within COPY should
really just reduce to a small piece of code to derive the correct
relation for the desired row and then use that during heap_insert().

I have just discussed partitioning with Itagaki-san at JPUG, so I know
his plans. Itagaki-san and Manu, please can you work together to make
this work for 8.5? 

---

A more detailed explanation of Partitioning Metadata:

Partitioning Metadata is information held on the relcache for a table
that has child partitions. Currently, a table does not cache info about
its children, which prevents various optimisations.

We would have an extra pointer on the Relation struct that points to a
PartitioningMetadata struct. We can fill in this information when we
construct the relcache for a relation, or we can populate it on demand
the first time we attempt to use that information (if it exists).

We want to hold an array of partition boundary values. This will then
allow us to use bsearch to find the partition that a specific value
applies to. Thus it can be used for routing data from INSERTs or COPY,
can be used for identifying which partitions need to be
included/excluded from an APPEND node. Using this will be O(logN) rather
than O(N), so allowing us to have much larger number of partitions when
required. Note that it can also be used within the executor to perform
dynamic partition elimination, thus allowing us to easily implement
partition aware joins etc.

To construct the array we must sort the partition boundary values and
prove that the partition definitions do not overlap. That is much easier
to do when the partitions are explicitly defined. (Plus, there is no
requirement to have, or mechanism to specify, unique partitions
currently, although most users assume this in their usage).

I imagine we would have an API called something like
RelationIdentifyPartition() where we provide value(s) for the
PartitioningKey column(s) and we then return the Oid of the partition
that holds that value. That function would build the metadata, if not
already cached, then bsearch it to provide the Oid.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Emmanuel Cecchet


Simon,

I think you should read the thread and the patch before making any false 
statements like you did in your email.


1. The patch does not use any trigger for routing.
2. This is just an option for COPY that is useful for loading operations 
in the datawarehouse world. It is not meant to implement full 
partitioning as explained many times already in this thread.
3. This patch elaborates on existing mechanisms and cannot rely on a 
meta-data representation of partitions which does not exist yet and will 
probably not exist in 8.5


You should justify your statements when you say 'potentially buggy in 
its approach to developing a cache and using trigger-like stuff'. I 
understand that you don't like it because this is not what you want but 
this is not my fault. This is not an implementation of partitioning like 
COPY does not do update/delete/alter/...
And yes the use case is 'narrow' like any option in COPY. It is like 
complaining that the CSV option is not useful because you want to load 
binary dumps.


If Itagaki gets the support of the community to get his implementation 
accepted, I will gladly use it. Contributing? If Aster is willing to 
contribute a code monkey to implement your specs, why not but you will 
have to convince them.


You should really think twice about the style of your emails that cast a 
detestable tone to discussions on pg-hackers.


Emmanuel



On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
  

Hi,

I have extracted the partitioning option for COPY (removed the error 
logging part) from the previous patch.



We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY? 
  
Tom has already explained on the list why using a trigger was a bad idea 
(and I know we can use a trigger since I am the one who wrote it).
If you look at the code you will see that you can do optimizations in 
the COPY code that you cannot do in the trigger.




 Also, it would be nicer
that the router can works not only in COPY but also in INSERT.
  
  
As 8.5 will at best provide a syntactic hack on top of the existing 
constraint implementation, I think that it will not hurt to have routing 
in COPY since we will not have it anywhere otherwise.


BTW, I'm working on meta data of partitioning now. Your partitioning
option in COPY could be replaced with the catalog.
  
  
This implementation is only for the current 8.5 and it will not be 
needed anymore once we get a fully functional partitioning in Postgres 
which seems to be for a future version.



Yes, the trigger way of doing this is a bad way. 


I regret to say that the way proposed here isn't much better, AFAICS.
Let me explain why I think that, but -1 to anyone applying this patch.

This patch proposes keeping a cache of last visited partitions to reduce
the overhead of data routing.

What I've requested is that partitioning work by using a data structure
held in relcache for inheritance parents. This differs in 3 ways from
this patch
a) it has a clearly defined location for the cached metadata, with
clearly identified and well worked out mechanisms for cache invalidation
b) the cache can be built once when it is first needed, not slowly grown
as parts of the metadata are used
c) it would be available for all parts of the server, not just COPY.

The easiest way to build that metadata is when structured partitioning
info is available. i.e. the best next action is to complete and commit
Itagaki's partitioning syntax patch. Then we can easily build the
metadata for partitioning, which can then be used in COPY for data
routing.

Anyway, I want data routing, as is the intention of this patch. I just
don't think this patch is a useful way to do it. It is too narrow in its
scope and potentially buggy in its approach to developing a cache and
using trigger-like stuff. 


ISTM that with the right metadata in the right place, a cleaner and
easier solution is still possible for 8.5. The code within COPY should
really just reduce to a small piece of code to derive the correct
relation for the desired row and then use that during heap_insert().

I have just discussed partitioning with Itagaki-san at JPUG, so I know
his plans. Itagaki-san and Manu, please can you work together to make
this work for 8.5? 


---

A more detailed explanation of Partitioning Metadata:

Partitioning Metadata is information held on the relcache for a table
that has child partitions. Currently, a table does not cache info about
its children, which prevents various optimisations.

We would have an extra pointer on the Relation struct that points to a
PartitioningMetadata struct. We can fill in this information when we
construct the relcache for a relation, or we can populate it on demand
the first time we attempt to use that information (if it exists).

We want to hold an array of partition boundary values. This will then
allow us to use bsearch to find the partition

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Robert Haas

On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet m...@asterdata.com wrote:
 I think you should read the thread and the patch before making any false
 statements like you did in your email.

 1. The patch does not use any trigger for routing.

Whoa, whoa!  I don't think that Simon said that it did.  But even if I
am wrong and he did...

 You should really think twice about the style of your emails that cast a
 detestable tone to discussions on pg-hackers.

...I certainly don't think this comment is justified.  This is a
technical discussion about the best way of solving a certain problem,
and I don't believe that any of the discussion up to this point has
been anything other than civil.  I can tell that you are frustrated
that your patch is not getting the support you would like to see it
get, but launching ad hominem attacks on Simon or anyone else is not
going to help.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Emmanuel Cecchet


Robert Haas wrote:

On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet m...@asterdata.com wrote:
  

I think you should read the thread and the patch before making any false
statements like you did in your email.

1. The patch does not use any trigger for routing.



Whoa, whoa!  I don't think that Simon said that it did.  But even if I
am wrong and he did...
  
Quote from Simon's email: It is too narrow in its scope and potentially 
buggy in its approach to developing a cache and using trigger-like stuff.

You should really think twice about the style of your emails that cast a
detestable tone to discussions on pg-hackers.


...I certainly don't think this comment is justified.  This is a
technical discussion about the best way of solving a certain problem,
and I don't believe that any of the discussion up to this point has
been anything other than civil.  I can tell that you are frustrated
that your patch is not getting the support you would like to see it
get, but launching ad hominem attacks on Simon or anyone else is not
going to help

We certainly don't live in the same civilization then.

I am not frustrated if my patch does not get in because of technical 
considerations and I am happy so far with Jan's feedback that helped a 
lot. I think there is a misunderstanding between what Simon wants 
('Anyway, I want data routing, as is the intention of this patch.') and 
what this patch is about. This patch is just supposed to load tuples in 
a hierarchy of tables as this is a recurrent use case in datawarehouse 
scenarios. It is not supposed to solve data routing in general 
(otherwise that would be integrated standard in COPY and not as an option).


But it looks like it is a waste of everybody's time to continue this 
discussion further. Just move the patch to the rejected patches and 
let's wait for Itagaki's implementation.


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs

On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote:

 I think you should read the thread and the patch 

I did read the thread and patch in full before posting. My opinions are
given to help you and the community towards a desirable common goal.

I was unaware you were developing these ideas and so was unable to
provide comments until now. My review of Kedar's patch in July did lay
out in general terms a specific implementation route for future work on
partitioning. I had thought I might not have made those comments clearly
enough, so gave a more specific description of what I consider to be a
more workable and general solution for cacheing and using partitioning
metadata.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs

On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote:

 I think you should read the thread and the patch 

I did read the thread and patch in full before posting. My opinions are
given to help you and the community towards a desirable common goal.

I was unaware you were developing these ideas and so was unable to
provide comments until now. My review of Kedar's patch in July did lay
out in general terms a specific implementation route for future work on
partitioning. I had thought I might not have made those comments clearly
enough, so gave a more specific description of what I consider to be a
more workable and general solution for cacheing and using partitioning
metadata.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Emmanuel Cecchet


Simon Riggs wrote:

I was unaware you were developing these ideas and so was unable to
provide comments until now. 
The first patch was published to this list on September 10 (almost 2.5 
months ago) along with the wiki page describing the problem and the 
solution.

What should I have done to raise awareness further?

/E

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs

On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote:

 I think there is a misunderstanding between what Simon wants 
 ('Anyway, I want data routing, as is the intention of this patch.') and 
 what this patch is about. This patch is just supposed to load tuples in 
 a hierarchy of tables as this is a recurrent use case in datawarehouse 
 scenarios. It is not supposed to solve data routing in general 
 (otherwise that would be integrated standard in COPY and not as an option).

I have not misunderstood. You wish to solve a very specific problem,
with very specific code. I've done that myself on occasion. My opinion
is that we should solve many of the partitioning problems with one set
of central, common code. If we do not do this we will need 3-4 times as
much code, most of which will be similar and yet must be exactly the
same. That alone is enough to block the patch's proposed method (IMHO).

 But it looks like it is a waste of everybody's time to continue this 
 discussion further. Just move the patch to the rejected patches and 
 let's wait for Itagaki's implementation.

The lack of discussion and design in this area has held back the last
few patches, by various authors; we should learn from that. Also,
working in isolation on narrow problems will not move us forwards as
fast as if we all work together on pieces of the whole vision for
partitioning. My piece was to think through how to link each of the
different aspects of partitioning and to propose a solution. Please join
with Itagaki to move this forwards - your further contributions will be
valuable.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs

On Mon, 2009-11-23 at 10:43 -0500, Emmanuel Cecchet wrote:
 Simon Riggs wrote:
  I was unaware you were developing these ideas and so was unable to
  provide comments until now. 

 The first patch was published to this list on September 10 (almost 2.5 
 months ago) along with the wiki page describing the problem and the 
 solution.

 What should I have done to raise awareness further?

...Read my detailed comments in response to Kedar's patch and post
comments on that thread to say you didn't agree with that proposal and
that you were thinking of another way entirely. ~14 July. 4 months ago.

...Contact me personally when you saw that I hadn't responded to your
later posts, knowing that I have recent track record as a reviewer of
partitioning patches.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Tom Lane

Simon Riggs si...@2ndquadrant.com writes:
 Anyway, I want data routing, as is the intention of this patch. I just
 don't think this patch is a useful way to do it. It is too narrow in its
 scope and potentially buggy in its approach to developing a cache and
 using trigger-like stuff. 

FWIW, I agree --- there are two really fundamental problems with this
patch:

* It only applies to COPY.  You'd certainly want routing for INSERT as
  well.  And it shouldn't be necessary to specify an option.

* Building this type of infrastructure on top of independent, not
  guaranteed consistent table constraints is just throwing more work
  into a dead end.  The patch is already full of special-case errors
  for possible inconsistency of the constraints, and I don't think it's
  bulletproof even so (what if someone is altering the constraints
  concurrently? What if there's more than one legal destination?)
  And the performance necessarily sucks.

What we need first is an explicit representation of partitioning, and
then to build routing code on top of that.  I haven't looked at
Itagaki-san's syntax patch at all, but I think it's at least starting
in a sensible place.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Greg Smith


Simon Riggs wrote:

...Read my detailed comments in response to Kedar's patch and post
comments on that thread to say you didn't agree with that proposal and
that you were thinking of another way entirely.

Useful background here is:

http://wiki.postgresql.org/wiki/Table_partitioning
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php
http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com
http://archives.postgresql.org/message-id/1247564358.11347.1308.ca...@ebony.2ndquadrant

The basic problem here is that Emmanuel and Aster developed a useful 
answer to one of the more pressing implementation details needed here, 
but did so without being involved in the much larger discussion of how 
to implement general, more automated partitioning in PostgreSQL that (as 
you can see from the date of the first links there) has been going on 
for years already.  What we did wrong as a community is not more 
explicitly tell Emmanuel the above when he first submitted code a few 
months ago, before he'd invested more time on a subset implementation 
that was unlikely to be committed.  As I already commented upthread, I 
was just happy to see coding progress being made on part of the design 
that nobody had hacked on before to my knowledge; I didn't consider then 
how Emmanuel was going to be disappointed by the slow rate that code 
would be assimilated into the design going on in this area.


What would probably be helpful here is to take the mess of raw data 
above and turn it into a simpler partitioning roadmap.  There's a stack 
of useful patches here, multiple contributors who have gotten familiar 
with the implementation details required, and enough time left that it's 
possible to pull something together in time for 8.5--but only if 
everyone is clear on exactly what direction to push toward.  I'm going 
to reread the history here myself and see if I can write something 
helpful here.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs

On Mon, 2009-11-23 at 12:23 -0500, Greg Smith wrote:

 What would probably be helpful here is to take the mess of raw data 
 above and turn it into a simpler partitioning roadmap.  

Thanks for summarising.

I briefly tried to do that on the thread for Itagaki-san's patch. That's
a first stab at things, at least. 

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Itagaki Takahiro


Tom Lane t...@sss.pgh.pa.us wrote:

 What we need first is an explicit representation of partitioning, and
 then to build routing code on top of that.  I haven't looked at
 Itagaki-san's syntax patch at all, but I think it's at least starting
 in a sensible place.

I have the following development plan for partitioning.
I'll continue to use inherits-based partitioning... at least in 8.5.

8.5 Alpha 3:
Syntax and catalog changes (on-disk structure).
I think pg_dump is the biggest stopper in the phase.

8.5 Alpha 4:
Internal representation (on-memory structure), that will replace
insert-triggers first, and also replace CHECK constraints if
possible (but probably non-INSERT optimizations will slide to 8.6).

The internal representation of RANGE partitions will be an array of
pairs of { upper-value, parition-relid } for each parent table.
An insert target partition are determined using binary search on insert.
It will be faster than sequential checks of CHECK constraint
especially in large number of child tables. The array will be kept
in CacheMemoryContext or query context to reduce access to the system
catalog. RelationData or TupleDesc will have an additional field for it.


 * It only applies to COPY.  You'd certainly want routing for INSERT as
   well.  And it shouldn't be necessary to specify an option.

Sure. We need the routingin both INSERT and COPY. Even if Emmanuel-san's
patch will be committed in Alpha 3, the code would be discarded in Alpha 4.

 * Building this type of infrastructure on top of independent, not
   guaranteed consistent table constraints is just throwing more work
   into a dead end.

I think the current approach is not necessarily wrong for CHECK-based
partitioning, but I'd like to have more specialized or generalized
functionality for the replacement of triggers.

If we will take specialized approach, triggers will be replaced with
built-in feature. We can only use RANGE and LIST partitions.

On the other hand, it might be interesting to take some generalized
approach; For example, spliting BEFORE INSERT triggers into 3 phases:
  1. Can cancel the insert and modify the new tuple.
  2. Can cancel the insert, but cannot modify tuple.
  3. Neigher can cancel nor modify.
We call triggers in the number order. INSERT TRIGGERs are implemented
in 2nd phase, so we're not afraid of modifing partition keys.
(3rd phase will be used for replication trigger.)

However, I think generalized one is overkill.
A specialized approach would be enough.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Nikhil Sontakke

Hi,

 What would probably be helpful here is to take the mess of raw data
 above and turn it into a simpler partitioning roadmap.

 Thanks for summarising.


Yeah, excellent summary Greg. As you rightly pointed out, partitioning
needs a broad roadmap so that the community can contribute in unison.
That ways we can in future avoid decent efforts like Manu's which
might not bear any fruit because of the prevailing confusion today..

 I briefly tried to do that on the thread for Itagaki-san's patch. That's
 a first stab at things, at least.

+1. Itagaki-san's patch seems like a firm foot forward.

Regards,
Nikhils
-- 
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-22 Thread Jan Urbański

Emmanuel Cecchet wrote:
 Jan,
 
 A couple of nitpicks first:

  o) the route_tuple_to_child recurses to child tables of child tables,
 which is undocumented and requires a check_stack_depth() call if it's
 really desirable
   
 The recursive call is as deep as the inheritance hierarchy. I am not
 sure what we are supposed to do if check_stack_depth() fails.

I think that check_stack_depth() just throws an elog(ERROR) when the
stack depth is exceeded, so you can just add a check_stack_depth() call
somewhere at the beginning of route_tuple_to_child and that's it.

  o) the error messages given when a trigger modifies the tuple should be
 one sentence, I suggest dropping the Aborting insert part
   
 Where are those rules about error messages specified?

http://www.postgresql.org/docs/current/static/error-style-guide.html

Dropping Aborting insert is just a suggestion, it's possible the error
message will sound OK to a native English speaker.

  o) there are two places with Close the relation but keep the lock
 comments. Why is in necessary to keep the locks? I confess I don't know
 why *wouldn't* it be necessary, but maybe the comment could explain
 that? Or is it just my lack of understanding and it should be obvious
 that the lock needs to be kept?
   
 As we did write to the table, we must maintain the lock on it until the
 operation or transaction is complete.

OK, understood.

  o) the result of relation_open is explicitly cast to Relation, the
 result of try_relation_open is not (a minor gripe)
   
 The first cast was unnecessary, I removed it.

OK.

  o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted
 from just above, I guess there was a reason why you needed that code,
 but I also suspect that's a string indication that something's wrong
 with the abstractions in your patch.

 As I explained to Tom, if the after row trigger is called asynchronously
 I get a relcache leak on the child table at the end of the copy
 operation. If the trigger is called synchronously (like a before row
 trigger) it works fine. Also calling the after row trigger synchronously
 allows me to detect any potential problem between the actions of the
 trigger and the routing decision. I am open to any suggestion for a more
 elegant solution.

OK, my competence ends here :-) Someone with a better knowledge of the
code should comment on that, I certainly don't have a better proposal.

 Oh, actually, the cache is outright *wrong*, as the attached test6.sql
 shows. Ugh, let's just forget about that LRU cache for now.
   
 Point taken, I have removed the cache from the GUC variables and it is
 now only used for the duration of the COPY operation.

OK, that looks better.

  o) the patch could use some more docs, especially about descending into
 child tables.
   
 Do you mean an overall comment explaining the design? Otherwise there is
 a comment for every single 'if' and block of code in the patch. Be more
 specific if you have a special location where you think comments are
 missing or too vague.

I was thinking more about SGML docs. They could mention that BEFORE
triggers are fired both for the parent table and for the child table,
while AFTER triggers will only be called on the target table. I'd add a
sentence or two explaining what happens if you have a three-level
inheritance hierarchy (that the tuple will be inserted in the bottommost
table of the hierarchy).

  o) my main concern is still valid: the design was never agreed upon.

 I already commented on that part in another message and this is not
 related to that patch but to the politics of implementing partitioning
 in Postgres. Now if the rejection of the patch is based on political
 stances rather than technical once, I can understand that too.

Sure, sorry if it sounded harsh. As I said I have virtually no field
experience with PG, so I might have a wrong perspective. I also don't
feel particulary eligible to judge your approach of handling automatic
partitioning designwise.

Except for the really minor things like checking stack depth and adding
a few sentences to the SGML docs, I think it's time someone more
qualified looked at the patch. If you'd like to send a new version, I'll
wait for it and mark it as ready for committer review. Thanks for
persisting with the patch and sorry for nitpicking so much :-)

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-22 Thread Stephan Szabo


On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:

 As I explained to Tom, if the after row trigger is called asynchronously
 I get a relcache leak on the child table at the end of the copy
 operation. If the trigger is called synchronously (like a before row
 trigger) it works fine. Also calling the after row trigger synchronously
 allows me to detect any potential problem between the actions of the
 trigger and the routing decision. I am open to any suggestion for a more
 elegant solution.

Well, I think there are still some issues there that at least need to be
better documented.

For example,
 create or replace function fi() returns trigger as '
  begin
   if (NEW.p is not null) then
if (select count(*) from i where i.i = NEW.p) = 0 then
 raise exception ''No parent'';
end if;
   end if;
   return NEW;
  end;
 ' language 'plpgsql';

 create or replace function fc() returns trigger as '
  begin
   if (NEW.p is not null) then
if (select count(*) from c where c.i = NEW.p) = 0 then
 raise exception ''No parent'';
end if;
   end if;
   return NEW;
  end;
 ' language 'plpgsql';

 create or replace function fp() returns trigger as '
  begin
   if (NEW.p is not null) then
if (select count(*) from p where p.i = NEW.p) = 0 then
 raise exception ''No parent'';
end if;
   end if;
   return NEW;
  end;
 ' language 'plpgsql';

 drop table i;
 drop table c;
 drop table p cascade;

 create table i(i int, p int);
 create trigger tri after insert on i for each row execute procedure fi();

 create table c(i int, p int);
 create trigger trc after insert on c for each row execute procedure fc();

 create table p(i int, p int);
 create table p1 (check (i  0 and i = 10)) inherits (p);
 create table p2 (check (i  10 and i = 20)) inherits (p);
 create table p3 (check (i  20 and i = 30)) inherits (p);
 create trigger trp1 after insert on p1 for each row execute procedure fp();
 create trigger trp2 after insert on p2 for each row execute procedure fp();
 create trigger trp3 after insert on p3 for each row execute procedure fp();

insert into i values (1,3),(2,1),(3,NULL);
copy c from stdin;
1   3
2   1
3   \N
\.
copy p from stdin with (partitioning);
1   3
2   1
3   \N
\.

gives me a successful load into i and c, but not into p with the current
patch AFAICS while a load where the 3 row is first does load.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-22 Thread Emmanuel Cecchet


Stephan Szabo wrote:

On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:

  

As I explained to Tom, if the after row trigger is called asynchronously
I get a relcache leak on the child table at the end of the copy
operation. If the trigger is called synchronously (like a before row
trigger) it works fine. Also calling the after row trigger synchronously
allows me to detect any potential problem between the actions of the
trigger and the routing decision. I am open to any suggestion for a more
elegant solution.



Well, I think there are still some issues there that at least need to be
better documented.

For example,
 create or replace function fi() returns trigger as '
  begin
   if (NEW.p is not null) then
if (select count(*) from i where i.i = NEW.p) = 0 then
 raise exception ''No parent'';
end if;
   end if;
   return NEW;
  end;
 ' language 'plpgsql';

 create or replace function fc() returns trigger as '
  begin
   if (NEW.p is not null) then
if (select count(*) from c where c.i = NEW.p) = 0 then
 raise exception ''No parent'';
end if;
   end if;
   return NEW;
  end;
 ' language 'plpgsql';

 create or replace function fp() returns trigger as '
  begin
   if (NEW.p is not null) then
if (select count(*) from p where p.i = NEW.p) = 0 then
 raise exception ''No parent'';
end if;
   end if;
   return NEW;
  end;
 ' language 'plpgsql';

 drop table i;
 drop table c;
 drop table p cascade;

 create table i(i int, p int);
 create trigger tri after insert on i for each row execute procedure fi();

 create table c(i int, p int);
 create trigger trc after insert on c for each row execute procedure fc();

 create table p(i int, p int);
 create table p1 (check (i  0 and i = 10)) inherits (p);
 create table p2 (check (i  10 and i = 20)) inherits (p);
 create table p3 (check (i  20 and i = 30)) inherits (p);
 create trigger trp1 after insert on p1 for each row execute procedure fp();
 create trigger trp2 after insert on p2 for each row execute procedure fp();
 create trigger trp3 after insert on p3 for each row execute procedure fp();

insert into i values (1,3),(2,1),(3,NULL);
copy c from stdin;
1   3
2   1
3   \N
\.
copy p from stdin with (partitioning);
1   3
2   1
3   \N
\.

gives me a successful load into i and c, but not into p with the current
patch AFAICS while a load where the 3 row is first does load.
  
Well, if you don't insert anything in p (the table, try to avoid using 
the same name for the table and the column in an example), copy will 
insert (1,3) in p1 and then the trigger will evaluate


select count(*) from p where p.i = NEW.p = NEW.p is 3 and the only p.i 
available is 1.

This should return 0 rows and raise the exception. This seems normal to me.
The only reason it works for i is because you inserted the values before 
the copy.


Am I missing something?
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-22 Thread Stephan Szabo


On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:

 Stephan Szabo wrote:
  On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:
 
 
  As I explained to Tom, if the after row trigger is called asynchronously
  I get a relcache leak on the child table at the end of the copy
  operation. If the trigger is called synchronously (like a before row
  trigger) it works fine. Also calling the after row trigger synchronously
  allows me to detect any potential problem between the actions of the
  trigger and the routing decision. I am open to any suggestion for a more
  elegant solution.
 
 
  Well, I think there are still some issues there that at least need to be
  better documented.
 
  For example,
   create or replace function fi() returns trigger as '
begin
 if (NEW.p is not null) then
  if (select count(*) from i where i.i = NEW.p) = 0 then
   raise exception ''No parent'';
  end if;
 end if;
 return NEW;
end;
   ' language 'plpgsql';
 
   create or replace function fc() returns trigger as '
begin
 if (NEW.p is not null) then
  if (select count(*) from c where c.i = NEW.p) = 0 then
   raise exception ''No parent'';
  end if;
 end if;
 return NEW;
end;
   ' language 'plpgsql';
 
   create or replace function fp() returns trigger as '
begin
 if (NEW.p is not null) then
  if (select count(*) from p where p.i = NEW.p) = 0 then
   raise exception ''No parent'';
  end if;
 end if;
 return NEW;
end;
   ' language 'plpgsql';
 
   drop table i;
   drop table c;
   drop table p cascade;
 
   create table i(i int, p int);
   create trigger tri after insert on i for each row execute procedure fi();
 
   create table c(i int, p int);
   create trigger trc after insert on c for each row execute procedure fc();
 
   create table p(i int, p int);
   create table p1 (check (i  0 and i = 10)) inherits (p);
   create table p2 (check (i  10 and i = 20)) inherits (p);
   create table p3 (check (i  20 and i = 30)) inherits (p);
   create trigger trp1 after insert on p1 for each row execute procedure fp();
   create trigger trp2 after insert on p2 for each row execute procedure fp();
   create trigger trp3 after insert on p3 for each row execute procedure fp();
 
  insert into i values (1,3),(2,1),(3,NULL);
  copy c from stdin;
  1   3
  2   1
  3   \N
  \.
  copy p from stdin with (partitioning);
  1   3
  2   1
  3   \N
  \.
 
  gives me a successful load into i and c, but not into p with the current
  patch AFAICS while a load where the 3 row is first does load.
 
 Well, if you don't insert anything in p (the table, try to avoid using
 the same name for the table and the column in an example), copy will
 insert (1,3) in p1 and then the trigger will evaluate

 select count(*) from p where p.i = NEW.p = NEW.p is 3 and the only p.i 
 available is 1.

 This should return 0 rows and raise the exception. This seems normal to me.

 The only reason it works for i is because you inserted the values before
 the copy.

 Am I missing something?

I believe so unless I am.

There are three separate cases being run for comparison purposes.
Multi-row insert on i where an after trigger on i checks the parents
within i, a copy on c where an after trigger on c checks the parents
within c, a copy on p (with inheritance) where an after trigger on p*
checks the parents within the p hierarchy.

So, in the case of the multi-row insert, it's inserting (1,3), but it
doesn't immediately check, it inserts (2,1) and (3,NULL) before running
the checks. The same seems to happen for the base copy. Copy with
inheritance seems to be working differently. That may or may not be okay,
but if it's different it needs to be prominently mentioned in
documentation.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-22 Thread Robert Haas

On Sun, Nov 22, 2009 at 12:35 PM, Jan Urbański wulc...@wulczer.org wrote:
 I was thinking more about SGML docs. They could mention that BEFORE
 triggers are fired both for the parent table and for the child table,
 while AFTER triggers will only be called on the target table. I'd add a
 sentence or two explaining what happens if you have a three-level
 inheritance hierarchy (that the tuple will be inserted in the bottommost
 table of the hierarchy).

I have a hard time believing this is OK, even with documentation.
While it might be OK in some (many?) particular use cases to fire
triggers in this way, making COPY with the partitioning option fire
different triggers at different times than COPY without the
partitioning option - and in fact every other method of getting data
into a table, all of which are consistent with each other and with
COPY without the partitioning option - seems like a bad idea to me.  I
don't think the behavior described above is OK, and I also don't think
that the changes in the timing of AFTER-trigger firing are OK.  I
understand that without that change there was a relcache leak, but I
think that just means that bug needs to be found and fixed.

I would also like to see some more discussion of the basic mechanism
of this patch.  Essentially, what it's trying to do is traverse the
inheritance hierarchy looking for a table whose constraints match the
current tuple, and then inserting the tuple there.  First - is this a
good idea?  It's embeds some assumptions about how inheritance
hierarchies are set up which don't seem totally unreasonable, but even
so I'm not sure we want to go that route.  Second - in lieu of
accepting this approach, do we want to wait for Itagaki Takahiro's
partitioning syntax patch to go in (as I am hoping that it will) and
then do something more structured based on the notation introduced
there?

One thing that biases me toward thinking that maybe we should wait is
the fact that this patch relies on an MRU list to determine into which
child table a particular tuple should be inserted.  If the constraints
on the child tables are not mutually exclusive, the tuple routing
won't be deterministic, which seems undesirable to me.  On the other
hand, if we got rid of the MRU cache and made the order deterministic
(say, alphabetical by partition name) then I'm guessing this would be
quite slow for large numbers of partitions when most of the tuples
need to go into the later partitions.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-21 Thread Jan Urbański

Emmanuel Cecchet wrote:
 Hi Jan,
 
 Here is the updated patch.
 Note that the new code in trigger is a copy/paste of the before row
 insert trigger code modified to use the pointers of the after row
 trigger functions.

Hi,

ok, this version applied, compiled and ran the regression tests fine. I
tried a few things and was not able to break it this time.

A couple of nitpicks first:

 o) the route_tuple_to_child recurses to child tables of child tables,
which is undocumented and requires a check_stack_depth() call if it's
really desirable

 o) the error messages given when a trigger modifies the tuple should be
one sentence, I suggest dropping the Aborting insert part

 o) there are two places with Close the relation but keep the lock
comments. Why is in necessary to keep the locks? I confess I don't know
why *wouldn't* it be necessary, but maybe the comment could explain
that? Or is it just my lack of understanding and it should be obvious
that the lock needs to be kept?

 o) the result of relation_open is explicitly cast to Relation, the
result of try_relation_open is not (a minor gripe)

And a couple of more important things:

 o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted
from just above, I guess there was a reason why you needed that code,
but I also suspect that's a string indication that something's wrong
with the abstractions in your patch. Again I don't really know how else
you could achieve what you want. It just looks fishy if you need to
modify trigger.c to add an option to COPY.

 o) publicizing ExecRelCheck might also indicate a problem, but I guess
that can be defended, as the patch is basically based on using that
function for each incoming tuple

 o)  the LRU OID cache is a separate optimisation that could be
separated from the patch. I didn't do any performance tests, and I trust
that a cache like that helps with some workloads, but I think we could
do a better effort that a simplistic cache like that. Also, I'm not 100%
sure it's OK to just stick it into CacheMemoryContext... Maybe it could
go into the COPY statement context? You said you don't want to start
with a cold cache always, but OTOH if you're loading into different
tables in the same backend, the cache will actually hurt...

[thinks of something really bad... types up a quick test...]

Oh, actually, the cache is outright *wrong*, as the attached test6.sql
shows. Ugh, let's just forget about that LRU cache for now.

 o) the patch could use some more docs, especially about descending into
child tables.

 o) my main concern is still valid: the design was never agreed upon.
The approach of using inheritance info for automatic partitioning is, at
least IMHO, too restricted. Regular INSERTs won't get routed to child
tables. Data from writable CTEs won't get routed. People wanting to do
partitioning on something else that constraints are stuffed.

I strongly suspect the patch will get rejected on the grounds of lack of
community agreement on partitioning, but I'd hate to see your work
wasted. It's not too late to open a discussion on how automatic
partitioning could work (or start working out a common proposal with the
people discussing in the Syntax for partitioning thread).

Marking as Waiting on Author, although I'd really like to see a solid
design being agreed upon, and then the code.

Cheers,
Jan
drop table parent cascade;
drop table parent2 cascade;

create table parent(i int);
create table c1 (check (i  0 and i = 1)) inherits (parent);

create table parent2(i int);
create table c12 (check (i  0 and i = 1)) inherits (parent2);

set copy_partitioning_cache_size = 1;

copy parent from stdin with (partitioning);
1
\.

copy parent2 from stdin with (partitioning);
1
\.

-- all tuples went to parent !
select * from parent;
-- is empty
select * from parent2;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-21 Thread Greg Smith


Jan Urbański wrote:

 o) my main concern is still valid: the design was never agreed upon.
The approach of using inheritance info for automatic partitioning is, at
least IMHO, too restricted. Regular INSERTs won't get routed to child
tables. Data from writable CTEs won't get routed. People wanting to do
partitioning on something else that constraints are stuffed.
  
Whether or not the other paths to load data are supported, COPY is the 
one you have to get right before this sort of feature is useful to the 
sort of use-cases that need partitioning the most.  While your concerns 
are valid, I hope Emmanuel doesn't take your feedback the wrong way.  
I'll take a working prototype that needs improvement over a paper design 
with no implementation anytime.  He's coming at this bottom-up starting 
with the little details that needs to be right, the partitioning syntax 
patch is starting at the top and working downward, there seems to be 
clear progress being made toward the eventual goal somewhere in the 
middle of all that here.  Considering how long this area has been bogged 
down in discussion without action, I'm rather glad we're seeing working 
proof of concepts bits rather than just talking about things.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-21 Thread Emmanuel Cecchet


Jan Urbański wrote:

 o) my main concern is still valid: the design was never agreed upon.
The approach of using inheritance info for automatic partitioning is, at
least IMHO, too restricted. Regular INSERTs won't get routed to child
tables. Data from writable CTEs won't get routed. People wanting to do
partitioning on something else that constraints are stuffed.
  
Well, this patch does not claim to implement partitioning for Postgres, 
it just offers partitioning as an option for COPY (and COPY only) based 
on the existing mechanism in Postgres.
I have already participated in lengthy and relatively sterile 
discussions on how to implement a full-blown partitioning but we never 
reached the beginning of an agreement and it was decided that a 
step-by-step approach would be better. I will propose another 
implementation of partitioning in COPY once Postgres has another 
representation than constraints on child tables to implement it.

I strongly suspect the patch will get rejected on the grounds of lack of
community agreement on partitioning, but I'd hate to see your work
wasted. It's not too late to open a discussion on how automatic
partitioning could work (or start working out a common proposal with the
people discussing in the Syntax for partitioning thread).
  
This is not my call. Right now the syntax for partitioning does not 
change anything to Postgres, it just adds syntactic sugar on top of the 
existing implementation. It will not route anything or answer any of the 
needs you mentioned in your previous point.

Marking as Waiting on Author, although I'd really like to see a solid
design being agreed upon, and then the code.
  
You are asking the wrong person if you want me to lead the partitioning 
design discussions. I already tried once and I was unsuccessful. As 
nothing as changed I don't see why I would be more successful this time.


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-20 Thread Jan Urbański

Emmanuel Cecchet wrote:
 Hi Jan,
 
 Here is a new version of the patch with the following modifications:
 - used oid list from pg_list.h
 - properly handles triggers and generate an error if needed (updated doc
 as well)
 - added your test cases + extra bad trigger cases

Hi,

that got broken by the WHEN triggers patch
(c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
TriggerEnabled function signature, the code currently does not compile.

I'll continue reading, in the meantime could you send a updated patch?

Thanks,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-20 Thread Tom Lane

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes:
 that got broken by the WHEN triggers patch
 (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
 TriggerEnabled function signature, the code currently does not compile.

[ squint... ]  What is that patch doing touching the innards of
trigger.c in the first place?  I can't see any reason for trigger.c
to be associated with partitioning.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-20 Thread Emmanuel Cecchet


Tom Lane wrote:

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes:
  

that got broken by the WHEN triggers patch
(c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
TriggerEnabled function signature, the code currently does not compile.



[ squint... ]  What is that patch doing touching the innards of
trigger.c in the first place?  I can't see any reason for trigger.c
to be associated with partitioning.
  
The problem I had is that if I used the standard trigger mechanism for 
after row inserts on a child table where the trigger is called 
asynchronously, I had a relcache leak on the child table. I tried to ask 
for help on that earlier on but it got lost with other discussions on 
the patch. So I tried to call the after trigger synchronously on the 
child table and it worked.
So the patch is just adding a synchronous call to after row insert 
triggers that is called when the tuple is moved to a child table (also 
allows to detect for triggers that are messing with the routing).


I would be happy to follow any recommendation for a more elegant 
solution to the problem.


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-17 Thread Emmanuel Cecchet


Tom Lane wrote:

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes:
  

Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821child_relation_id =
child_oid_cell-oid_value;
(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f



This looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure.  If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.
  
I was using the CacheMemoryContext. Could someone tell me why this is 
wrong and what should have been the appropriate context to use?


Thanks
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-17 Thread Tom Lane

Emmanuel Cecchet m...@asterdata.com writes:
 Tom Lane wrote:
 This looks like the patch is trying to create a data structure in a
 memory context that's not sufficiently long-lived for the use of the
 structure.  If you do this in a non-cassert build, it will seem to
 work, some of the time, if the memory in question happens to not
 get reallocated to something else.
 
 I was using the CacheMemoryContext. Could someone tell me why this is 
 wrong and what should have been the appropriate context to use?

Well, (a) I doubt you really were creating the list in
CacheMemoryContext, else it'd have not gotten clobbered; (b) creating
statement-local data structures in CacheMemoryContext is entirely
unacceptable anyway, because then they represent a permanent memory
leak.

The right context for statement-lifetime data structures is generally
the CurrentMemoryContext the statement code is called with.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-17 Thread Emmanuel Cecchet


Tom Lane wrote:

Emmanuel Cecchet m...@asterdata.com writes:
  

Tom Lane wrote:


This looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure.  If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.

  
I was using the CacheMemoryContext. Could someone tell me why this is 
wrong and what should have been the appropriate context to use?



Well, (a) I doubt you really were creating the list in
CacheMemoryContext, else it'd have not gotten clobbered; (b) creating
statement-local data structures in CacheMemoryContext is entirely
unacceptable anyway, because then they represent a permanent memory
leak.
  

Well I thought that this code would do it:

child_table_lru = (OidLinkedList 
*)MemoryContextAlloc(
+   CacheMemoryContext, 
sizeof(OidLinkedList));
...
+   /* Add the new entry in head of the list */
+   new_head = (OidCell *) MemoryContextAlloc(
+   CacheMemoryContext, sizeof(OidCell));



The right context for statement-lifetime data structures is generally
the CurrentMemoryContext the statement code is called with.
  
Actually the list is supposed to stay around between statement 
executions. You don't want to restart with a cold cache at every 
statement so I really want this structure to stay in memory at a more 
global level.


Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-17 Thread Tom Lane

Emmanuel Cecchet m...@asterdata.com writes:
 Actually the list is supposed to stay around between statement 
 executions. You don't want to restart with a cold cache at every 
 statement so I really want this structure to stay in memory at a more 
 global level.

Cache?  Why do you need a cache for COPY?  Repeated bulk loads into the
same table within a single session doesn't seem to me to be a case that
is common enough to justify a cache.

(BTW, the quoted code seems to be busily reinventing OID Lists.  Don't
do that.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-17 Thread Emmanuel Cecchet


Tom Lane wrote:

Emmanuel Cecchet m...@asterdata.com writes:
  
Actually the list is supposed to stay around between statement 
executions. You don't want to restart with a cold cache at every 
statement so I really want this structure to stay in memory at a more 
global level.



Cache?  Why do you need a cache for COPY?  Repeated bulk loads into the
same table within a single session doesn't seem to me to be a case that
is common enough to justify a cache.
  
Actually the cache is only activated if you use the partitioning option. 
It is just a list of oids of child tables where tuples were inserted.
It is common to have multiple COPY operations in the same session when 
you are doing bulk loading in a warehouse.

(BTW, the quoted code seems to be busily reinventing OID Lists.  Don't
do that.)
  
Yes, I understood that I should use an OidList instead. But I was trying 
to understand what I did wrong here (besides reinventing the oid list ;-)).

Why do I get this segfault if I use memory from CacheMemoryContext?

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-17 Thread Tom Lane

Emmanuel Cecchet m...@asterdata.com writes:
 Tom Lane wrote:
 Cache?  Why do you need a cache for COPY?

 Actually the cache is only activated if you use the partitioning option. 
 It is just a list of oids of child tables where tuples were inserted.

Umm ... why is that useful enough to be cached?

 Why do I get this segfault if I use memory from CacheMemoryContext?

Well, CacheMemoryContext will never be reset, so either you freed the
data structure yourself or there's something wrong with the pointer
you think is pointing at the data structure ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-16 Thread Jan Urbański

Hi,

I'll hopefully look at the next version of the patch tommorrow.

Emmanuel Cecchet wrote:
  o test1.sql always segfaults for me, poking around with gdb suggests
 it's a case of an uninitialised cache list (another reason to use the
 builtin one).
   
 I was never able to reproduce that problem. I don't know where this
 comes from.

 I have integrated your tests in the regression test suite and I was
 never able to reproduce the segfault you mentioned. What platform are
 you using?

In the meantime I tried the test1.sql file again and it still segfaulted
for me.
I'm using 32bit Linux, PG compiled with:

$ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl
--without-python --without-openssl --without-tcl

and then I start postmaster, fire up psql, attach gdb to the backend, do
\i test1.sql and get:

Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821child_relation_id =
child_oid_cell-oid_value;
(gdb) bt
#0  0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
#1  0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480
#2  0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c copy
parent from stdin with (partitioning);) at copy.c:1227

(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f

(gdb) p child_oid_cell-oid_value
Cannot access memory at address 0x7f7f7f7f


That 0x7f7f7f7f looks like clobbered memory, the memory management funcs
do that when cassert is enabled, IIRC.

Cheers,
Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-16 Thread Tom Lane

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes:
 Program received signal SIGSEGV, Segmentation fault.
 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
 tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
 1821child_relation_id =
 child_oid_cell-oid_value;
 (gdb) p child_oid_cell
 $1 = (OidCell *) 0x7f7f7f7f

This looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure.  If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.

A good rule of thumb is to never do code development in a non-cassert
build.  You're just setting yourself up for failure.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-16 Thread Greg Smith


Tom Lane wrote:

A good rule of thumb is to never do code development in a non-cassert
build. 
And the same rule goes for review, too; I'll update the review 
guidelines to spell that out more clearly.  Basically, if you're doing 
any work on new code, you should have cassert turned on, *except* if 
you're doing performance testing.  The asserts slow things down enough 
(particularly with large shared_buffers values) to skew performance 
tests, but in all other coding situations you should have them enabled.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-14 Thread Jan Urbański

Emmanuel Cecchet wrote:
 Hi all,

Hi!,

 partitioning option for COPY

Here's the review:

== Submission ==
The patch is contextual, applies cleanly to current HEAD, compiles fine.
The docs build cleanly.

== Docs ==
They're reasonably clear, although they still mention ERROR_LOGGING,
which was taken out of this patch. They could use some wordsmithing, but
I didn't go into details, as there were more severe issues with the patch.

One thing that made me cautious was the mention that triggers modifying
tuples will make random errors appear. As is demonstrated later,
triggers are a big issue.

== Regression tests ==
They ran fine, there's one additional regression test that exercises the
new option.

== Style/nitpicks ==
Minor gripes include:
 o instead of using an ad-hoc data structure for the LRU cache list, I'd
suggest an OidList from pg_list.h.
 o some mentions of method in comments should be changed to function
 o trailing whitespace in the patch (it's not the end of the world, of
course)

== Issues ==
Attached are 3 files that demonstrate problems the patch has.
 o test1.sql always segfaults for me, poking around with gdb suggests
it's a case of an uninitialised cache list (another reason to use the
builtin one).
 o test2.sql demonstrates, that indices on child tables are not being
updated, probably because after resultRelInfo in
check_tuple_constraints() gets created is never has ri_NumIndices set,
and so the code that was supposed to take care of indices is never
called. Looks like a copy-paste error.
 o test3.sql demonstrates, that some triggers that I would expect to be
fired are in fact not fired. I guess it's the same reason as mentioned:
ri_TrigDesc never gets set, so the code that calls triggers is dead.

I stopped there, because unfortunately, apart from all that there's one
fundamental problem with this patch, namely we probably don't want it.

As it stands it's more of a proof of concept than a really usable
solution, it feels like built from spare (copied from around copy.c)
parts. IMHO it's much too narrow for a general partitioning solution,
even if the design it's based upon would be accepted. It's assuming a
lot of things about the presence of child tables (with proper
constraints), the absence of triggers, and so on.

Granted, it solves a particular problem (bulk loading into a partitioned
table, with not extra features like triggers and with standard
inheritance/exclusive check constraints setup), but that's not good
enough in my opinion, even if all other issues would be addressed.

Now I'm not a real Postgres user, it's been a while since I worked in a
PG shop (or a DB shop for that matter), but from what I understand from
following this community for a while, a patch like that doesn't have a
lot of chances to be committed. That said, my puny experience with real
PG installations and their needs must be taken into account here.

I'll mark this patch as Waiting on Author, but I have little doubt
that even after fixing those probably trivial segfaults etc. the patch
would be promptly rejected by a committer. I suggest withdrawing it from
this commitfest and trying to work out a more complete design first that
would address the needs of a bigger variety of users, or joining some of
the already underway efforts to bring full-featured partitioning into
Postgres.

Best,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-14 Thread Jan Urbański

Jan Urbański wrote:
 Emmanuel Cecchet wrote:
 Hi all,
 
 Hi!,
 
 partitioning option for COPY

 Attached are 3 files that demonstrate problems the patch has.

And the click-before-you-think prize winner is... me.

Test cases attached, see the comments for expected/actual results.

Jan
-- segfaults, probably uninitialised cache oid list

-- disabling cache fixes it
-- set copy_partitioning_cache_size = 0;

drop table parent cascade;

create table parent(i int);
create table c1 (check (i  0 and i = 1)) inherits (parent);

copy parent from stdin with (partitioning);
1
\.

drop table parent cascade;

create table parent(i int);
create table c1 (check (i  0 and i = 1)) inherits (parent);


copy parent from stdin with (partitioning);
1
\.
set copy_partitioning_cache_size = 0;

drop table parent cascade;

create table parent(i int, j int);
create table c1 (check (i  0 and i = 1)) inherits (parent);
create table c2 (check (i  1 and i = 2)) inherits (parent);
create table c3 (check (i  2 and i = 3)) inherits (parent);

create index c1_idx on c1(j);

copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent';

copy parent from '/tmp/parent' with (partitioning);

analyse;

set enable_seqscan to false;
-- no rows, index was not updated
select * from c1 where j = 3;

set enable_seqscan to true;
set enable_indexscan to false;
-- some rows
select * from c1 where j = 3;
set copy_partitioning_cache_size = 0;

drop table parent cascade;
drop table audit cascade;
drop function audit();

create table parent(i int);
create table c1 (check (i  0 and i = 1)) inherits (parent);
create table c2 (check (i  1 and i = 2)) inherits (parent);
create table c3 (check (i  2 and i = 3)) inherits (parent);

create table audit(i int);

create function audit() returns trigger as $$ begin insert into audit(i) values 
(new.i); return new; end; $$ language plpgsql;

create trigger parent_a after insert on parent for each row execute procedure 
audit();
-- the before trigger on the parent would get fired
-- create trigger parent_a2 before insert on parent for each row execute 
procedure audit();
create trigger c1_a before insert on c1 for each row execute procedure audit();
create trigger c1_a2 after insert on c1 for each row execute procedure audit();

copy parent from stdin with (partitioning);
1
2
3
\.

-- no rows
select * from audit;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-11 Thread Itagaki Takahiro


Emmanuel Cecchet m...@asterdata.com wrote:

 I have extracted the partitioning option for COPY (removed the error 
 logging part) from the previous patch.

We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY?  Also, it would be nicer
that the router can works not only in COPY but also in INSERT.

BTW, I'm working on meta data of partitioning now. Your partitioning
option in COPY could be replaced with the catalog.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-11 Thread Emmanuel Cecchet


Hi,
I have extracted the partitioning option for COPY (removed the error 
logging part) from the previous patch.



We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY? 
Tom has already explained on the list why using a trigger was a bad idea 
(and I know we can use a trigger since I am the one who wrote it).
If you look at the code you will see that you can do optimizations in 
the COPY code that you cannot do in the trigger.



 Also, it would be nicer
that the router can works not only in COPY but also in INSERT.
  
As 8.5 will at best provide a syntactic hack on top of the existing 
constraint implementation, I think that it will not hurt to have routing 
in COPY since we will not have it anywhere otherwise.

BTW, I'm working on meta data of partitioning now. Your partitioning
option in COPY could be replaced with the catalog.
  
This implementation is only for the current 8.5 and it will not be 
needed anymore once we get a fully functional partitioning in Postgres 
which seems to be for a future version.


Best regards,
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-11 Thread Itagaki Takahiro


Emmanuel Cecchet m...@asterdata.com wrote:

 If you look at the code you will see that you can do optimizations in 
 the COPY code that you cannot do in the trigger.

Since the optimizations is nice, I hope it will work not only in COPY
but also in INSERT. An idea is moving the partitioning cache into
Relation cache, and also moving the routing routines into heap_insert().
My concern is just in the modified position; I think you don't have to
change your logic itself.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Partitioning option for COPY

2009-11-11 Thread Josh Berkus


 We can use an INSERT trigger to route tuples into partitions even now.
 Why do you need an additional router for COPY?  Also, it would be nicer
 that the router can works not only in COPY but also in INSERT.

Yeah, but performance on an insert trigger is impractical for large
volumes of data.

--Josh Berkus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

56 matches

Mail list logo