Re: [HACKERS] On partitioning

2014-12-03 Thread Jim Nasby

On 12/2/14, 9:43 PM, Amit Langote wrote:

What is an overflow partition and why do we want that?


That would be a default partition. That is, where the tuples that don't belong 
elsewhere (other defined partitions) go. VALUES clause of the definition for 
such a partition would look like:

(a range partition) ... VALUES LESS THAN MAXVALUE
(a list partition) ... VALUES DEFAULT

There has been discussion about whether there should be such a place for 
tuples to go at all; that is, whether it should instead generate an error if a 
tuple can't go anywhere (or support auto-creating a new partition, as in 
interval partitioning?)


If we are going to do this, should the data just go into the parent? That's 
what would happen today.

FWIW, I think an overflow would be useful, but there should be a way to 
(dis|en)able it.
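
For concreteness, a purely illustrative sketch of the two catch-all forms
quoted above (no actual CREATE syntax has been settled; the enclosing DDL here
is invented, and only the VALUES clauses come from the proposal):

  -- hypothetical range catch-all: rows above the last defined bound
  CREATE TABLE measurements_overflow PARTITION OF measurements
      VALUES LESS THAN MAXVALUE;
  -- hypothetical list catch-all: rows matching no other partition's list
  CREATE TABLE events_default PARTITION OF events
      VALUES DEFAULT;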


What are you going to do if the partitioning key has two columns of
different data types?


Sorry, this totally eluded me. Perhaps the 'values' need some more thought; 
they are one of the most crucial elements of the scheme.

I wonder if your suggestion of pg_node_tree plays well here. This then could be 
a list of CONSTs or some such... And I am thinking it's a concern only for 
range partitions, no? (that is, a multicolumn partition key)

I think partkind switches the interpretation of the field as appropriate. Am I 
missing something? By the way, I had mentioned we could have two values fields, 
one each for the range and list partition kinds.


The more SQL way would be records (composite types). That would make catalog 
inspection a LOT easier and presumably make it easier to change the 
partitioning key (I'm assuming ALTER TYPE cascades to stored data). Records are 
stored internally as tuples; not sure if that would be faster than a List of 
Consts or a pg_node_tree. Nodes would theoretically allow using things other 
than Consts, but I suspect that would be a bad idea.

Something else to consider... our user-space support for ranges is now 
rangetypes, so perhaps that's what we should use for range partitioning. The 
up-side (which would be a double-edged sword) is that you could leave holes in 
your partitioning map. Note that in the multi-key case we could still have a 
record of rangetypes.
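
To illustrate the shape of such bounds in user space (hypothetical values, just 
to show what a rangetype and a record of rangetypes look like):

  -- single-column range partition bound
  SELECT int4range(1, 1000) AS single_key_bounds;
  -- multi-key case: a record of rangetypes
  SELECT ROW(int4range(1, 1000),
             daterange('2014-01-01', '2015-01-01')) AS multi_key_bounds;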
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread David Rowley
On 3 December 2014 at 08:13, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Bottom line, given all the restrictions on whether the optimization can
  happen, I have very little enthusiasm for the whole idea.  I do not think
  the benefit will be big enough to justify the amount of mess this will
  introduce.

 This optimization applies to a tremendous number of real-world cases,
 and we really need to have it.  This was a huge problem for me in my
 previous life as a web developer.  The previous work that we did to
 remove LEFT JOINs was an enormous help, but it's not enough; we need a
 way to remove INNER JOINs as well.

 I thought that David's original approach of doing this in the planner
 was a good one.  That fell down because of the possibility that
 apparently-valid referential integrity constraints might not be valid
 at execution time if the triggers were deferred.  But frankly, that
 seems like an awfully nitpicky thing for this to fall down on.  Lots
 of web applications are going to issue only SELECT statements that run
 as single-statement transactions, and so that issue, so troubling
 in theory, will never occur in practice.  That doesn't mean that we
 don't need to account for it somehow to make the code safe, but any
 argument that it abridges the use case significantly is, in my
 opinion, not credible.

 Anyway, David was undeterred by the rejection of that initial approach
 and rearranged everything, based on suggestions from Andres and later
 Simon, into the form it's reached now.  Kudos to him for his
 persistence.  But your point that we might have chosen a whole
 different plan if it had known that this join was cheaper is a good
 one.  However, that takes us right back to square one, which is to do
 this at plan time.  I happen to think that's probably better anyway,
 but I fear we're just going around in circles here.  We can either do
 it at plan time and find some way of handling the fact that there
 might be deferred triggers that haven't fired yet; or we can do it at
 execution time and live with the fact that we might have chosen a plan
 that is not optimal, though still better than executing a
 completely-unnecessary join.


 Just so that I don't end up going around in circles again, let me
summarise my understanding of the pros and cons of each of the states that
this patch has been in.

*** Method 1: Removing Inner Joins at planning time:

Pros:

1. The plan generated should be optimal, i.e. it should be the same plan the
query would get if the removed relations had never been included in the
query's text.
2. On successful join removal, planning will likely be faster, as there are
fewer paths to consider with fewer relations and join combinations.

Cons:
1. Assumptions must be made during planning about the trigger queue being
empty or not. During execution, if there are pending fk triggers which need
to be executed then we could produce wrong results.

*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.

Cons:
1. The planner may not generate the optimal plan, e.g. Sort nodes may turn out
to be useless once a Merge Join is skipped.
2. Code needed to be added to all join methods to allow skipping; nested
loop joins suffered a small overhead.
3. Small overhead from visiting extra nodes in the plan which would not be
present if those nodes had been removed.
4. Problems writing regression tests due to having to use EXPLAIN ANALYZE
to try to work out what's going on, and the output containing variable
runtime values.

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
2. Does not require extra code in all join types  (see cons #2 above)
3. Does not suffer from extra node visiting overhead (see cons #3 above)

Cons:
1. Executor must modify the plan.
2. The planner may have generated a plan which is not optimal for modification
by the executor (e.g. Sort nodes feeding a merge join remain, and index scans
chosen for pre-sorted input won't become seqscans, which may be more efficient
since the ordering may no longer be required once the merge join is removed)

Someone has had a problem with each of the methods listed above, and from the
feedback given I've made changes and ended up with the next revision of the
patch.

Tom has now pointed out that he does not like the executor modifying the
plan, which I agree with to an extent, as I really do hate the extra
useless nodes that I'm unable to remove from the plan.

I'd like to propose Method 4 which I believe solves quite a few of the
problems seen in the other method.

Method 4: (Which is, I think, what Mart had in mind; I've only expanded on it
a bit with thoughts about possible implementation methods)

1. 

Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread José Luis Tallón

On 12/02/2014 08:21 PM, Andres Freund wrote:

[snip]

2. Instead of the single amdata field, make it possible for the
implementation to define any number of fields with any datatype in the
tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence.


Hmm... having done my fair bit of testing, I can say that this isn't 
actually that bad (though it depends heavily on the underlying filesystem 
and workload, of course).
With this approach, I fear extreme I/O contention with an update-heavy 
workload... unless all sequence activity is finally WAL-logged and hence 
writes to the actual files become mostly sequential and asynchronous.


May I possibly suggest a file-per-schema model instead? This approach 
would certainly solve the excessive i-node consumption problem that --I 
guess-- Andres is trying to address here.
Moreover, the one-file-per-schema-for-sequences solution would fit a 
quite common model of grouping tables (in schemas) for physical 
[tablespace] location purposes.

Currently, with
or without seqam, it'd not be all that hard to force it into a catalog,
taking care to force each tuple onto a separate page...


IMHO, this is just as wasteful as the current approach (one-page file per 
sequence) in terms of disk usage, and complicates the code a bit... but 
I really don't see how we can have more than one sequence state per page 
without severe (page) locking problems.
However, someone with deeper knowledge of page pinning and buffer 
manager internals could certainly devise a better solution...


Just my 2c

Thanks,

/ J.L.





Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread Andres Freund
On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
 On 12/02/2014 08:21 PM, Andres Freund wrote:
 [snip]
 2. Instead of the single amdata field, make it possible for the
 implementation to define any number of fields with any datatype in the
 tuple. That would make debugging, monitoring etc. easier.
 My main problem with that approach is that it pretty much nails the door
 shut for moving sequences into a catalog table instead of the current,
 pretty insane, approach of a physical file per sequence.
 
 Hmm...  having done my fair bit of testing, I can say that this isn't
 actually that bad (though depends heavily on the underlying filesystem and
 workload, of course)
 With this approach, I fear extreme I/O contention with an update-heavy
 workload... unless all sequence activity is finally WAL-logged and hence
 writes to the actual files become mostly sequential and asynchronous.

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
page-level lock (lwlock) for every single sequence used.

 May I possibly suggest a file-per-schema model instead? This approach would
 certainly solve the excessive i-node consumption problem that --I guess--
 Andres is trying to address here.

I don't think that really has any advantages.

 Currently, with
 or without seqam, it'd not be all that hard to force it into a catalog,
 taking care to force each tuple onto a separate page...
 
 IMHO, this is just as wasteful as the current approach (one-page file per
 sequence) in terms of disk usage and complicates the code a bit... but I
 really don't see how we can have more than one sequence state per page
 without severe (page) locking problems.

The overhead of a file is much more than wasting the remainder of a
page. The requirement of separate fsyncs and everything is, by itself, pretty
bothersome. The generated IO patterns are also much worse...

 However, someone with deeper knowledge of page pinning and buffer manager
 internals could certainly devise a better solution...

I think there's pretty much no chance of accepting more than one page
per sequence.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] inherit support for foreign tables

2014-12-03 Thread Ashutosh Bapat
On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
wrote:

 (2014/11/28 18:14), Ashutosh Bapat wrote:

 On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp wrote:
 Apart from the above, I noticed that the patch doesn't consider to
 call ExplainForeignModify during EXPLAIN for an inherited
 UPDATE/DELETE, as shown below (note that there are no UPDATE remote
 queries displayed):


  So, I'd like to modify explain.c to show those queries like this:


 postgres=# explain verbose update parent set a = a * 2 where a = 5;
                                  QUERY PLAN
 -----------------------------------------------------------------------------
  Update on public.parent  (cost=0.00..280.77 rows=25 width=10)
    ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
          Output: (parent.a * 2), parent.ctid
          Filter: (parent.a = 5)
    Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
    ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
          Output: (ft1.a * 2), ft1.ctid
          Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
    Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1
    ->  Foreign Scan on public.ft2  (cost=100.00..140.38 rows=12 width=10)
          Output: (ft2.a * 2), ft2.ctid
          Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE
 (12 rows)


  Two Remote SQL entries under a single node would be confusing. Also, the node
 is labelled as a Foreign Scan; it would be confusing to show an UPDATE
 command under this scan node.


 I thought of this as an extension of the existing (ie, non-inherited) case
 (see the example below) to the inherited case.

 postgres=# explain verbose update ft1 set a = a * 2 where a = 5;
                               QUERY PLAN
 ----------------------------------------------------------------------------
  Update on public.ft1  (cost=100.00..140.38 rows=12 width=10)
    Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
    ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
          Output: (a * 2), ctid
          Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
 (5 rows)

 I think we should show update commands somewhere for the inherited case as
 for the non-inherited case.  Alternatives to this are welcome.

This is not exactly an extension of the non-inheritance case; the
non-inheritance case doesn't show two Remote SQLs under the same plan node.
Maybe you can rename the label Remote SQL to Remote UPDATE/INSERT/DELETE (or
something to that effect) for the DML command, and the Foreign Scan plan node
should be renamed to Foreign Access node or something to indicate that it does
both the scan and the DML. I am not keen on the actual terminology, but I
think a reader of the plan shouldn't get confused.

We can leave this for committer's judgement.


  BTW, I was experimenting with DMLs being executed on multiple FDW servers
 under the same transaction and found that the transactions may not be atomic
 (and may be inconsistent) if one or more of the servers fail to commit
 while the rest of them commit the transaction. The reason for this is that we
 do not roll back the already-committed changes on the other foreign servers
 if one or more of them fail to commit the transaction. With foreign tables
 under an inheritance hierarchy a single DML can span multiple
 servers and the result may not be atomic (and may be inconsistent). So,


 IIUC, even the transactions over the local and the *single* remote server
 are not guaranteed to be executed atomically in the current form.  It is
 possible that the remote transaction succeeds and the local one fails, for
 example, resulting in data inconsistency between the local and the remote.


IIUC, while committing transactions involving a single remote server, the
steps taken are as follows:
1. The local changes are brought to the PRE-COMMIT stage, which means that the
transaction *will* succeed locally after successful completion of this
phase.
2. A COMMIT message is sent to the foreign server.
3. If step 2 succeeds, the local changes are committed and a successful commit
is conveyed to the client.
4. If step 2 fails, the local changes are rolled back and an abort status is
conveyed to the client.
5. If step 1 itself fails, the remote changes are rolled back.
This is as per the one-phase commit protocol, which guarantees ACID for a
single foreign data source. So, the changes involving the local and a single
foreign server seem to be atomic and consistent.
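
To make the failure window concrete, a sketch of the scenario being discussed
(table names are made up; ft_remote_accounts stands for a foreign table):

  BEGIN;
  UPDATE local_accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE ft_remote_accounts SET balance = balance + 100 WHERE id = 1;
  COMMIT;
  -- Local work reaches PRE-COMMIT (step 1), then COMMIT is sent to the
  -- remote (step 2); the local commit completes only if the remote COMMIT
  -- succeeds (step 3), otherwise the local changes are rolled back (step 4).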


  either we have to disable DMLs on an inheritance hierarchy which spans
 multiple servers. OR make sure that such transactions follow 2PC norms.


 -1 for disabling update queries on such an inheritance hierarchy because I
 think we 

Re: [HACKERS] inherit support for foreign tables

2014-12-03 Thread Ashutosh Bapat
On Wed, Dec 3, 2014 at 4:05 PM, Ashutosh Bapat 
ashutosh.ba...@enterprisedb.com wrote:



 On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
  wrote:

 (2014/11/28 18:14), Ashutosh Bapat wrote:

 On Thu, Nov 27, 2014 at 3:52 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp mailto:fujita.ets...@lab.ntt.co.jp
 wrote:
 Apart from the above, I noticed that the patch doesn't consider to
 call ExplainForeignModify during EXPLAIN for an inherited
 UPDATE/DELETE, as shown below (note that there are no UPDATE remote
 queries displayed):


  So, I'd like to modify explain.c to show those queries like this:


  postgres=# explain verbose update parent set a = a * 2 where a = 5;
                                   QUERY PLAN
  -----------------------------------------------------------------------------
   Update on public.parent  (cost=0.00..280.77 rows=25 width=10)
     ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
           Output: (parent.a * 2), parent.ctid
           Filter: (parent.a = 5)
     Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
     ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
           Output: (ft1.a * 2), ft1.ctid
           Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
     Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1
     ->  Foreign Scan on public.ft2  (cost=100.00..140.38 rows=12 width=10)
           Output: (ft2.a * 2), ft2.ctid
           Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE
  (12 rows)


  Two remote SQL under a single node would be confusing. Also the node
 is labelled as Foreign Scan. It would be confusing to show an UPDATE
 command under this scan node.


 I thought this as an extention of the existing (ie, non-inherited) case
 (see the below example) to the inherited case.

  postgres=# explain verbose update ft1 set a = a * 2 where a = 5;
                                QUERY PLAN
  ----------------------------------------------------------------------------
   Update on public.ft1  (cost=100.00..140.38 rows=12 width=10)
     Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
     ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
           Output: (a * 2), ctid
           Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
  (5 rows)

 I think we should show update commands somewhere for the inherited case
 as for the non-inherited case.  Alternatives to this are welcome.

 This is not exactly extension of non-inheritance case. non-inheritance
 case doesn't show two remote SQLs under the same plan node. May be you can
 rename the label Remote SQL as Remote UPDATE/INSERT/DELETE (or something to
 that effect) for the DML command and the Foreign plan node should be
 renamed to Foreign access node or something to indicate that it does both
 the scan as well as DML. I am not keen about the actual terminology, but I
 think a reader of plan shouldn't get confused.

 We can leave this for committer's judgement.


  BTW, I was experimenting with DMLs being executed on multiple FDW server
 under same transaction and found that the transactions may not be atomic
 (and may be inconsistent), if one or more of the server fails to commit
 while rest of them commit the transaction. The reason for this is, we do
 not rollback the already committed changes to the foreign server, if
 one or more of them fail to commit a transaction. With foreign tables
 under inheritance hierarchy a single DML can span across multiple
 servers and the result may not be atomic (and may be inconsistent). So,


 IIUC, even the transactions over the local and the *single* remote server
 are not guaranteed to be executed atomically in the current form.  It is
 possible that the remote transaction succeeds and the local one fails, for
 example, resulting in data inconsistency between the local and the remote.


 IIUC, while committing transactions involving a single remote server, the
 steps taken are as follows
 1. the local changes are brought to PRE-COMMIT stage, which means that the
 transaction *will* succeed locally after successful completion of this
 phase,
 2. COMMIT message is sent to the foreign server
 3. If step two succeeds, local changes are committed and successful commit
 is conveyed to the client
 4. if step two fails, local changes are rolled back and abort status is
 conveyed to the client
 5. If step 1 itself fails, the remote changes are rolled back.
 This is as per one phase commit protocol which guarantees ACID for single
 foreign data source. So, the changes involving local and a single foreign
 server seem to be atomic and consistent.


  either we have to disable DMLs on an inheritance hierarchy which spans
 multiple servers. OR make sure that such 

Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Simon Riggs
On 3 December 2014 at 09:29, David Rowley dgrowle...@gmail.com wrote:
 *** Method 3: Marking scans as possibly skippable during planning and
 removing redundant join nodes at executor startup (Simon's method)

 Pros:
 1. The plan can be executed as normal if there are any foreign key triggers
 pending.
 2. Does not require extra code in all join types  (see cons #2 above)
 3. Does not suffer from extra node visiting overhead (see cons #3 above)

 Cons:
 1. Executor must modify the plan.
 2. Planner may have generated a plan which is not optimal for modification
 by the executor (e.g. Sort nodes for merge join, or index scans for
 pre-sorted input won't become seqscans which may be more efficient as
 ordering may not be required after removing a merge join)

 Someone has had a problem with each of the methods listed above, and from
 the feedback given I've made changes and ended up with the next
 revision of the patch.

 Tom has now pointed out that he does not like the executor modifying the
 plan, which I agree with to an extent, as I really do hate the extra
 useless nodes that I'm unable to remove from the plan.

I guess we need an Option node. Tom and I discussed that about an aeon ago.

The Option node has a plan for each situation. At execution time, we
make the test specified in the plan and then select the appropriate
subplan.

That way we can see what is happening in the plan and the executor
doesn't need to edit anything.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 5:00 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 3 December 2014 at 09:29, David Rowley dgrowle...@gmail.com wrote:
  *** Method 3: Marking scans as possibly skippable during planning and
  removing redundant join nodes at executor startup (Simon's method)
 
  Pros:
  1. The plan can be executed as normal if there are any foreign key
 triggers
  pending.
  2. Does not require extra code in all join types  (see cons #2 above)
  3. Does not suffer from extra node visiting overhead (see cons #3 above)
 
  Cons:
  1. Executor must modify the plan.
  2. Planner may have generated a plan which is not optimal for
 modification
  by the executor (e.g. Sort nodes for merge join, or index scans for
  pre-sorted input won't become seqscans which may be more efficient as
  ordering may not be required after removing a merge join)
 
  With each of the methods listed above, someone has had a problem with,
 and
  from the feedback given I've made changes based and ended up with the
 next
  revision of the patch.
 
  Tom has now pointed out that he does not like the executor modifying the
  plan, which I agree with to an extent as it I really do hate the extra
  useless nodes that I'm unable to remove from the plan.

 I guess we need an Option node. Tom and I discussed that about an aeon ago.

 The Option node has a plan for each situation. At execution time, we
 make the test specified in the plan and then select the appropriate
 subplan.

 That way we can see what is happening in the plan and the executor
 doesn't need to edit anything.



So the planner keeps all plans satisfying each possibility, or it looks at the
possible conditions (like the presence of a foreign key for this case, for
example) and then lets the executor choose between them?

So is the idea essentially making the planner return a set of best plans,
one for each condition? Are we assured of their optimality at the local
level i.e. at each possibility?

IMO this sounds like punting the planner's task to the executor. Not to
mention some overhead for maintaining various plans that might have been
discarded early in the planning and path cost evaluation phase (consider a
path with pathkeys specified, like with ORDINALITY. Can there be edge cases
where we might end up invalidating the entire path if we let the executor
modify it, or maybe just lose the ordinality optimization?)

I agree that the executor should not modify plans, but letting the executor
choose the plan to execute (out of a set from the planner, of course), rather
than the planner giving the executor a single plan and the executor not caring
about the semantics, seems a bit counterintuitive to me. It might be just me
though.

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] About xmllint checking for the validity of postgres.xml in 9.5

2014-12-03 Thread Alvaro Herrera
Michael Paquier wrote:
 Hi all,
 
 Since commit 5d93ce2d, the output of xmllint is checked by passing
 --valid to it. Isn't that a regression with what we were doing for
 pre-9.4 versions? For example, with 9.4 and older versions it is
 possible to compile man pages even if the xml spec is not entirely
 valid when using docbook 4.2.

I don't think this is a regression.  It just means we're stricter than
before.  Is there a reason behind this tinkering?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix whitespace

2014-12-03 Thread Alvaro Herrera
Peter Eisentraut wrote:
 Fix whitespace

Does this mean you're happy with it now?  Does make check-world pass for
you?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] On partitioning

2014-12-03 Thread Alvaro Herrera
Amit Langote wrote:

 From: Robert Haas [mailto:robertmh...@gmail.com]

  What is an overflow partition and why do we want that?
 
 That would be a default partition. That is, where the tuples that
 don't belong elsewhere (other defined partitions) go. VALUES clause of
 the definition for such a partition would look like:
 
 (a range partition) ... VALUES LESS THAN MAXVALUE 
 (a list partition) ... VALUES DEFAULT
 
 There has been discussion about whether there shouldn't be such a
 place for tuples to go. That is, it should generate an error if a
 tuple can't go anywhere (or support auto-creating a new one like in
 interval partitioning?)

In my design I initially had overflow partitions too, because I
inherited the idea from Itagaki Takahiro's patch.  Eventually I realized
that it's a useless concept, because you can always have leftmost and
rightmost partitions, which are just regular partitions (except they
don't have a low key, resp. high key).  If you don't define
unbounded partitions at either side, it's fine, you just raise an error
whenever the user tries to insert a value for which there is no
partition.

Not real clear to me how this applies to list partitioning, but I have
the hunch that it'd be better to deal with that without overflow
partitions as well.

BTW I think auto-creating partitions is a bad idea in general, because
you get into a lock escalation mess and furthermore you have to waste time
checking for existence beforehand, which lowers performance.  Just have
a very easy command that users can run ahead of time (something like
CREATE PARTITION FOR VALUE now() + '30 days', whatever), and
preferably one that doesn't fail if the partition already exists; that
way, users can have (for instance) a daily create-30-partitions-ahead
procedure which most days would only create one partition (the one for
30 days in the future), but whenever the odd case happens that the server
is turned off just at that time someday, it creates two -- one belt, 29
suspenders.
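
As a rough sketch of such a create-ahead procedure using today's
inheritance-based partitioning (the CREATE PARTITION syntax above is only a
proposal; the table and column names here are made up):

  DO $$
  DECLARE
      d date;
  BEGIN
      FOR i IN 0..29 LOOP
          d := current_date + i;
          -- idempotent: does nothing if the partition already exists
          EXECUTE format(
              'CREATE TABLE IF NOT EXISTS measurements_%s '
              '(CHECK (logdate >= %L AND logdate < %L)) INHERITS (measurements)',
              to_char(d, 'YYYYMMDD'), d, d + 1);
      END LOOP;
  END
  $$;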

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Many processes blocked at ProcArrayLock

2014-12-03 Thread Alvaro Herrera
Xiaoyulei wrote:
 I put all the stack in attachment.

Not sure that this is really all that useful.  At least I don't have the
patience to examine all this, and I'm not sure it contains the needed
info in the first place.  If you were to ensure your build is using
-fno-omit-frame-pointer in CFLAGS, and then used "perf record -a -g"
while the test runs and "perf report -g" once it's finished, you'd get a
useful profile that would show who is acquiring the problematic lock and
why.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] About xmllint checking for the validity of postgres.xml in 9.5

2014-12-03 Thread Michael Paquier
On Wed, Dec 3, 2014 at 9:43 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Michael Paquier wrote:
 Hi all,

 Since commit 5d93ce2d, the output of xmllint is checked by passing
 --valid to it. Isn't that a regression with what we were doing for
 pre-9.4 versions? For example, with 9.4 and older versions it is
 possible to compile man pages even if the xml spec is not entirely
 valid when using docbook 4.2.

 I don't think this is a regression.  It just means we're stricter than
 before.  Is there a reason behind this tinkering?
Just got surprised by how strict we have become on master when doing a build
of the docs using docbook 4.2 and some old versions of docbook-dsssl
and docbook-xsl: man pages can still compile even if the spec is not
exactly correct sometimes.
-- 
Michael




Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread Petr Jelinek

On 02/12/14 20:21, Andres Freund wrote:

On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:

To be clear: I don't think this API is very good for its stated purpose, for
implementing global sequences for use in a cluster. For the reasons I've
mentioned before.  I'd like to see two changes to this proposal:
...
2. Instead of the single amdata field, make it possible for the
implementation to define any number of fields with any datatype in the
tuple. That would make debugging, monitoring etc. easier.


My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence. Currently, with
or without seqam, it'd not be all that hard to force it into a catalog,
taking care to force each tuple onto a separate page...



I don't know; I think if we decide to change the storage format we can do 
the serialization/conversion in the seqam layer. It does not seem to matter 
too much whether the serialization into some opaque type is done in the AM 
itself or by the API layer. Or we can have one relation for all sequences in 
a single AM, etc. In general I don't think that the custom-columns-for-the-AM 
approach prohibits future storage changes.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] On partitioning

2014-12-03 Thread k...@rice.edu
On Wed, Dec 03, 2014 at 10:00:26AM -0300, Alvaro Herrera wrote:
 Amit Langote wrote:
 
  From: Robert Haas [mailto:robertmh...@gmail.com]
 
   What is an overflow partition and why do we want that?
  
  That would be a default partition. That is, where the tuples that
  don't belong elsewhere (other defined partitions) go. VALUES clause of
  the definition for such a partition would look like:
  
  (a range partition) ... VALUES LESS THAN MAXVALUE 
  (a list partition) ... VALUES DEFAULT
  
  There has been discussion about whether there shouldn't be such a
  place for tuples to go. That is, it should generate an error if a
  tuple can't go anywhere (or support auto-creating a new one like in
  interval partitioning?)
 
 In my design I initially had overflow partitions too, because I
 inherited the idea from Itagaki Takahiro's patch.  Eventually I realized
 that it's a useless concept, because you can always have leftmost and
 rightmost partitions, which are just regular partitions (except they
 don't have a low key, resp. high key).  If you don't define
 unbounded partitions at either side, it's fine, you just raise an error
 whenever the user tries to insert a value for which there is no
 partition.
 
Hi,

Maybe I am not clear on the concept of an overflow partition, but I
thought that it functioned to catch any record that did not fit the
partitioning scheme. Your end-of-range partitions without a low key or high
key would only catch problems in those areas. If you partitioned on
work days of the week, you should not have anything on Saturday/Sunday.
How would that work? You would want to catch anything that was not a
weekday in the overflow.

Regards,
Ken




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-03 Thread Robert Haas
On Tue, Dec 2, 2014 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan p...@heroku.com writes:
 On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote:
 Right, and what I'm saying is that maybe the applicability flag
 shouldn't be stored in the SortSupport object, but passed down as an
 argument.

 But then how does that information get to any given sortsupport
 routine? That's the place that really needs to know if abbreviation is
 useful. In general, they're only passed a SortSupport object. Are you
 suggesting revising the signature required of SortSupport routines to
 add that extra flag as an additional argument?

 I think that is what he's suggesting, and I too am wondering why it's
 a good idea.

I find it somewhat confusing that we've got one flag which is only
used from the time the SortSupport object is created until the time
that it's fully initialized, and then a different way of indicating
whether we paid attention to that flag.  I'm not totally sure what the
right solution to that problem is, but the current situation feels
like something of a wart.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread José Luis Tallón

On 12/03/2014 11:24 AM, Andres Freund wrote:

On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:

[snip]

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
the page level lock (lwlock) for every single sequence used.


Got it, thank you for the explanation.


May I possibly suggest a file-per-schema model instead? This approach would
certainly solve the excessive i-node consumption problem that --I guess--
Andres is trying to address here.

I don't think that really has any advantages.


Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have 
(1048576/8)=128k sequences per relation segment.
Arguably, not many real use cases will have that many sequences save 
for *massively* multi-tenant databases.


The downside being that all that random I/O --- in general, it can't 
really be sequential unless there are very, very few sequences --- can't 
be spread to other spindles. Create a sequence_default_tablespace GUC 
+ ALTER SEQUENCE ... SET TABLESPACE, to use an SSD for this purpose, maybe?

 (I could take a shot at the patch, if deemed worthwhile)
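
For clarity, roughly what those proposed knobs could look like (purely
hypothetical; neither the GUC nor that ALTER SEQUENCE form exists today):

  -- in postgresql.conf:
  --   sequence_default_tablespace = 'seq_ssd'
  ALTER SEQUENCE orders_id_seq SET TABLESPACE seq_ssd;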


[snip]

The overhead of a file is much more than wasting the remainder of a
page. Alone the requirement of separate fsyncs and everything is pretty
bothersome. The generated IO patterns are also much worse...


Yes, you are right. I stand corrected.


[snip]
I think there's pretty much no chance of accepting more than one page
per sequence


Definitively.


Thanks,

J.L.





Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Alvaro Herrera
Pushed with some extra cosmetic tweaks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Petr Jelinek

On 03/12/14 15:54, Alvaro Herrera wrote:

Pushed with some extra cosmetic tweaks.



Cool, thanks!

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Stephen Frost
* Atri Sharma (atri.j...@gmail.com) wrote:
 So the planner keeps all possibility satisfying plans, or it looks at the
 possible conditions (like presence of foreign key for this case, for eg)
 and then lets executor choose between them?

Right, this was one of the thoughts that I had.

 So is the idea essentially making the planner return a set of best plans,
 one for each condition? Are we assured of their optimality at the local
 level i.e. at each possibility?

We *already* have an idea of there being multiple plans (see
plancache.c).

 IMO this sounds like punting the planner's task to executor. Not to mention
 some overhead for maintaining various plans that might have been discarded
 early in the planning and path cost evaluation phase (consider a path with
 pathkeys specified, like with ORDINALITY. Can there be edge cases where we
 might end up invalidating the entire path if we let executor modify it, or,
 maybe just lose the ordinality optimization?)

The executor isn't modifying the plan; it's just picking one based on
what the current situation is (which is information that only the
executor can have, such as whether there are pending deferred triggers).

 I agree that executor should not modify plans, but letting executor choose
 the plan to execute (out of a set from planner, of course) rather than
 planner giving executor a single plan and executor not caring about the
 semantics, seems a bit counterintuitive to me. It might be just me though.

I don't think it follows that the executor is now required to care about
semantics.  The planner says use plan A if X is true; use plan B if X
is not true, and then the executor does exactly that.  There's nothing
about the plans provided by the planner that is being changed, and
there is no re-planning going on (though, as I point out, we actually
*do* re-plan in cases where we think the new plan is much, much better
than the prior plan...).

Thanks!

Stephen




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 * Atri Sharma (atri.j...@gmail.com) wrote:
  So the planner keeps all possibility satisfying plans, or it looks at the
  possible conditions (like presence of foreign key for this case, for eg)
  and then lets executor choose between them?
 
 Right, this was one of the thoughts that I had.

Erm, I had also.  Don't mean to imply that it was all my idea or
something silly like that.

Thanks,

Stephen




[HACKERS] Re: [HACKERS] On partitioning

2014-12-03 Thread Миша Тюрин

Maybe Vertica's approach will be a useful example:

http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/AdministratorsGuide/Partitions/PartitioningTables.htm

http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/CREATETABLE.htm
... [ PARTITION BY partition-clause ]
--
Mikhail



Re: [HACKERS] Using pg_rewind for differential backup

2014-12-03 Thread Sameer Kumar
On Wed, Dec 3, 2014 at 1:32 AM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Nov 28, 2014 at 2:49 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  It also would be quite straightforward to write a separate tool to do
 just
  that. Would be better than conflating pg_rewind with this. You could use
  pg_rewind as the basis for it - it's under the same license as
 PostgreSQL.

Agree. I would want to work on that.


 If we had such a tool in core, would that completely solve the
 differential backup problem, or would more be needed?


In my opinion, yes.

 This discussion is not really suited to -hackers, as pg_rewind is not
 included in the Postgres core code. Please let's discuss your proposal there.


Got it. But given the above, can we keep this discussion thread here?



Best Regards,

Sameer Kumar | Database Consultant

ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

M: +65 8110 0350  T: +65 6438 3504 | www.ashnik.com



Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Andres Freund
On 2014-12-03 11:30:32 +, Simon Riggs wrote:
 I guess we need an Option node. Tom and I discussed that about an aeon ago.
 
 The Option node has a plan for each situation. At execution time, we
 make the test specified in the plan and then select the appropriate
 subplan.
 
 That way we can see what is happening in the plan and the executor
 doesn't need to edit anything.

Given David's result where he noticed a performance impact due to the
additional branch in the join code - which I still have a bit of a hard
time believing - it seems likely that a whole separate node that has to
pass stuff around will be more expensive.

I think the switch would actually have to be done in ExecInitNode() et
al. David, if you essentially take your previous solution and move the
if into ExecInitNode(), does it work well?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Robert Haas
On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote:
 *** Method 1: Removing Inner Joins at planning time:

 *** Method 2: Marking scans as possibly skippable during planning, and
 skipping joins at execution (Andres' method)

 *** Method 3: Marking scans as possibly skippable during planning and
 removing redundant join nodes at executor startup (Simon's method)
[]
 a. can we invoke the planner during executor init?

I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also.  I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused.  I am less
clear on the specifics of why the former is unsafe, but what I think
it boils down to is that the plan per se needs to be finalized before
we begin execution; any replanning needs to be handled in the
plancache code.  I am not sure whether it's feasible to do something
about this at the plancache layer; we have an is_oneshot flag there,
so perhaps one-shot plans could simply test whether there are pending
triggers, and non-oneshot plans could forego the optimization until we
come up with something better.

If that doesn't work for some reason, then I think we basically have
to give up on the idea of replanning if the situation becomes unsafe
between planning and execution.  That leaves us with two alternatives.
One is to create a plan incorporating the optimization and another not
incorporating the optimization and decide between them at runtime,
which sounds expensive.  The second is to create a plan that
contemplates performing the join and skip the join if it turns out to
be possible, living with the fact that the resulting plan might be
less than optimal - in other words, option #2.  I am not sure that's
all that bad.  Planning is ALWAYS an exercise in predicting the
future: we use statistics gathered at some point in the past, which
are furthermore imprecise, to predict what will happen if we try to
execute a given plan at some point in the future.  Sometimes we are
wrong, but that doesn't prevent us from trying our best to predict
the outcome; so here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 8:32 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  So the planner keeps all possibility satisfying plans, or it looks at the
  possible conditions (like presence of foreign key for this case, for eg)
  and then lets executor choose between them?

 Right, this was one of the thoughts that I had.

  So is the idea essentially making the planner return a set of best
 plans,
  one for each condition? Are we assured of their optimality at the local
  level i.e. at each possibility?

 We *already* have an idea of there being multiple plans (see
 plancache.c).


 Thanks for pointing me there.

What I am concerned about is that in this case the option plans are
competing plans rather than separate plans.

My main concern is that we might not be able to discard plans that we know
are not optimal early in planning. My understanding is that the planner is
aggressive when discarding potential paths. Maintaining, storing, and
returning them might have issues, but that is only my thought.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Andres Freund
On 2014-12-03 10:51:19 -0500, Robert Haas wrote:
 On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote:
  *** Method 1: Removing Inner Joins at planning time:
 
  *** Method 2: Marking scans as possibly skippable during planning, and
  skipping joins at execution (Andres' method)
 
  *** Method 3: Marking scans as possibly skippable during planning and
  removing redundant join nodes at executor startup (Simon's method)
 []
  a. can we invoke the planner during executor init?
 
 I'm pretty sure that we can't safely invoke the planner during
 executor startup, and that doing surgery on the plan tree (option #3)
 is unsafe also.  I'm pretty clear why the latter is unsafe: it might
 be a copy of a data structure that's going to be reused.

We already have a transformation between the plan and execution
tree. I'm right now not seeing why transforming the trees in
ExecInitNode() et al. would be unsafe - it looks fairly simple to
switch between different execution plans there.

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Robert Haas
On Mon, Dec 1, 2014 at 5:34 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 I made two more changes:
 1. introduce newestCommitTs.  Original code was using lastCommitXact to
 check that no future transaction is asked for, but this doesn't really
 work if a long-running transaction is committed, because asking for
 transactions with a higher Xid but which were committed earlier would
 raise an error.

I'm kind of disappointed that, in spite of previous review comments,
this got committed with extensive use of the CommitTs naming.  I think
that's confusing, but it's also something that will be awkward if we
want to add other data, such as the much-discussed commit LSN, to the
facility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Robert Haas
On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-12-03 10:51:19 -0500, Robert Haas wrote:
 On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote:
  *** Method 1: Removing Inner Joins at planning time:
 
  *** Method 2: Marking scans as possibly skippable during planning, and
  skipping joins at execution (Andres' method)
 
  *** Method 3: Marking scans as possibly skippable during planning and
  removing redundant join nodes at executor startup (Simon's method)
 []
  a. can we invoke the planner during executor init?

 I'm pretty sure that we can't safely invoke the planner during
 executor startup, and that doing surgery on the plan tree (option #3)
 is unsafe also.  I'm pretty clear why the latter is unsafe: it might
 be a copy of a data structure that's going to be reused.

 We already have a transformation between the plan and execution
 tree.

We do?

I think what we have is a plan tree, which is potentially stored in a
plan cache someplace and thus must be read-only, and a planstate tree,
which contains the stuff that is for this specific execution.  There's
probably some freedom to do exciting things in the planstate nodes,
but I don't think you can tinker with the plan itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On partitioning

2014-12-03 Thread Stephen Frost
* k...@rice.edu (k...@rice.edu) wrote:
 On Wed, Dec 03, 2014 at 10:00:26AM -0300, Alvaro Herrera wrote:
  In my design I initially had overflow partitions too, because I
  inherited the idea from Itagaki Takahiro's patch.  Eventually I realized
  that it's a useless concept, because you can always have leftmost and
  rightmost partitions, which are just regular partitions (except they
  don't have a low key, resp. high key).  If you don't define
  unbounded partitions at either side, it's fine, you just raise an error
  whenever the user tries to insert a value for which there is no
  partition.
 
 Maybe I am not clear on the concept of an overflow partition, but I
 thought that it functioned to catch any record that did not fit the
 partitioning scheme. You end of range with out a low key or high
 key would only catch problems in those areas. If you partitioned on
 work days of the week, you should not have anything on Saturday/Sunday.
 How would that work? You would want to catch anything that was not a
 weekday in the overflow.

Yeah, I'm not a big fan of just dropping data on the floor either.
That's the purview of CHECK constraints and shouldn't be a factor of the
partitioning system, imv.
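
For example, with today's tools, Ken's weekday scenario up-thread would be
handled by a plain constraint rather than an overflow partition (the table
and column names are made up):

  ALTER TABLE measurements ADD CONSTRAINT weekdays_only
      CHECK (extract(isodow FROM logdate) BETWEEN 1 AND 5);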

There is a flip side to this though, which is that users who have those
CHECK constraints probably don't want to be bothered by having to have
an overflow partition, which leads into the question of, if we have them
as a supported capability, what would the default be?  My gut feeling is
that the default should be 'no overflow', in which case I'm not sure
it's useful as it won't be there for these cases where strange data
shows up unexpectedly and the system wants to put it somewhere.

Supporting overflow partitions would also mean supporting the ability to
move data out of those partitions and into 'real' partitions which the
user creates to deal with the odd/new data.  That doesn't strike me as
being too much fun for us to have to figure out, though if we do, we
might be able to do a better job (with less blocking happening, etc)
than the user could.

Lastly, my inclination is that it's a capability which could be added
later if there is demand for it, so perhaps the best answer is to not
include it now (feature creep and all that).

Thanks,

Stephen




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Andres Freund
On 2014-12-03 11:11:49 -0500, Robert Haas wrote:
 On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-12-03 10:51:19 -0500, Robert Haas wrote:
  On Wed, Dec 3, 2014 at 4:29 AM, David Rowley dgrowle...@gmail.com wrote:
   *** Method 1: Removing Inner Joins at planning time:
  
   *** Method 2: Marking scans as possibly skippable during planning, and
   skipping joins at execution (Andres' method)
  
   *** Method 3: Marking scans as possibly skippable during planning and
   removing redundant join nodes at executor startup (Simon's method)
  []
   a. can we invoke the planner during executor init?
 
  I'm pretty sure that we can't safely invoke the planner during
  executor startup, and that doing surgery on the plan tree (option #3)
  is unsafe also.  I'm pretty clear why the latter is unsafe: it might
  be a copy of a data structure that's going to be reused.
 
  We already have a transformation between the plan and execution
  tree.
 
 We do?
 
 I think what we have is a plan tree, which is potentially stored in a
 plan cache someplace and thus must be read-only, and a planstate tree,
 which contains the stuff that is for this specific execution.  There's
 probably some freedom to do exciting things in the planstate nodes,
 but I don't think you can tinker with the plan itself.

Well, the planstate tree is what determines the execution, right? I
don't see what would stop us from doing something like replacing:
PlanState *
ExecInitNode(Plan *node, EState *estate, int eflags)
{
...
case T_NestLoop:
result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
estate, eflags);
by
case T_NestLoop:
if (JoinCanBeSkipped(node))
result = NonSkippedJoinNode(node);
else
result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
estate, eflags);

Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
from David's early patch where he put the logic entirely into the actual
execution phase.

We'd probably want to move the join nodes into a separate ExecInitJoin()
function and do the JoinCanBeSkipped() and NonSkippedJoin() node in the
generic code.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Stephen Frost
* Atri Sharma (atri.j...@gmail.com) wrote:
 What I am concerned about is that in this case, the option plans are
 competing plans rather than separate plans.

Not sure I follow this thought entirely...  The plans in the plancache
are competing, but separate, plans.

 My main concern is that we might not be able to discard plans that we know
 are not optimal early in planning. My understanding is that the planner is
 aggressive when discarding potential paths. Maintaining them ahead and
 storing and returning them might have issues, but that is only my thought.

The planner is aggressive at discarding potential paths, but this is all
a consideration for how expensive this particular optimization is, not
an issue with the approach itself.  We certainly don't want an
optimization that doubles the time for 100% of queries planned but only
saves time in 5% of the cases, but if we can spend an extra 5% of the
time required for planning in the 1% of cases where the optimization
could possibly happen to save a huge amount of time for those queries,
then it's something to consider.

We would definitely want to spend as little time as possible checking
for this optimization in cases where it isn't possible to use the
optimization.

Thanks,

Stephen




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Robert Haas
On Wed, Dec 3, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 Well, the planstate tree is what determines the execution, right? I
 don't see what would stop us from doing something like replacing:
 PlanState *
 ExecInitNode(Plan *node, EState *estate, int eflags)
 {
 ...
 case T_NestLoop:
 result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
 estate, eflags);
 by
 case T_NestLoop:
 if (JoinCanBeSkipped(node))
 result = NonSkippedJoinNode(node);
 else
 result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
 estate, eflags);

 Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
 from David's early patch where he put the logic entirely into the actual
 execution phase.

Yeah, maybe.  I think there's sort of a coding principle that the plan
and planstate trees should match up one-to-one, but it's possible that
nothing breaks if they don't, or that I've misunderstood the coding
rule in the first instance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Dec 3, 2014 at 11:23 AM, Andres Freund and...@2ndquadrant.com wrote:
 Well, the planstate tree is what determines the execution, right? I
 don't see what would stop us from doing something like replacing:
 PlanState *
 ExecInitNode(Plan *node, EState *estate, int eflags)
 {
 ...
 case T_NestLoop:
 result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
 estate, eflags);
 by
 case T_NestLoop:
 if (JoinCanBeSkipped(node))
 result = NonSkippedJoinNode(node);
 else
 result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
 estate, eflags);
 
 Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
 from David's early patch where he put the logic entirely into the actual
 execution phase.

 Yeah, maybe.  I think there's sort of a coding principle that the plan
 and planstate trees should match up one-to-one, but it's possible that
 nothing breaks if they don't, or that I've misunderstood the coding
 rule in the first instance.

Far better would be what I mentioned upthread: an explicit switch node
in the plan tree, analogous to the existing AlternativeSubPlan structure.

ChooseJoinSubPlan
  - plan tree requiring all tables to be joined
  - plan tree not requiring all tables to be joined

This allows sensible display by EXPLAIN and avoids the need for the core
executor code to be dirtied with implementation of the precise switch
rule: all that logic goes into the ChooseJoinSubPlan plan node code.

I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal.  At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input.  I'm not sure how much cleanup work would
be needed before that re-plan operation could happen as easily as is
suggested above.  But in principle this could be made to work.
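
To sketch the shape of that (none of this is existing code: ChooseJoinSubPlan,
its subplan fields, and JoinRemovalStillValid() are hypothetical names), the
switch node's initialization could be little more than picking one subtree and
handing it to ExecInitNode():

typedef struct ChooseJoinSubPlan
{
	Plan	plan;			/* standard Plan node header */
	Plan   *fullPlan;		/* plan tree requiring all tables to be joined */
	Plan   *optimizedPlan;	/* plan tree with the removable joins dropped */
} ChooseJoinSubPlan;

static PlanState *
ExecInitChooseJoinSubPlan(ChooseJoinSubPlan *node, EState *estate, int eflags)
{
	/*
	 * JoinRemovalStillValid() is a placeholder for whatever check decides
	 * that the FK-based removal is safe at execution time, e.g. that no
	 * deferred RI triggers are pending on the tables involved.
	 */
	Plan   *chosen = JoinRemovalStillValid(node) ? node->optimizedPlan
												 : node->fullPlan;

	return ExecInitNode(chosen, estate, eflags);
}

EXPLAIN could then display both subtrees (or just the chosen one), which is the
display advantage referred to above.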

regards, tom lane




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Robert Haas
On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would envision the planner starting out generating the first subplan
 (without the optimization), but as it goes along, noting whether there
 are any opportunities for join removal.  At the end, if it found that
 there were such opportunities, re-plan assuming that removal is possible.
 Then stick a switch node on top.

 This would give optimal plans for both cases, and it would avoid the need
 for lots of extra planner cycles when the optimization can't be applied
 ... except for one small detail, which is that the planner has a bad habit
 of scribbling on its own input.  I'm not sure how much cleanup work would
 be needed before that re-plan operation could happen as easily as is
 suggested above.  But in principle this could be made to work.

Doesn't this double the planning overhead, in most cases for no
benefit?  The alternative plan used only when there are deferred
triggers is rarely going to get used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would envision the planner starting out generating the first subplan
 (without the optimization), but as it goes along, noting whether there
 are any opportunities for join removal.  At the end, if it found that
 there were such opportunities, re-plan assuming that removal is possible.
 Then stick a switch node on top.
 
 This would give optimal plans for both cases, and it would avoid the need
 for lots of extra planner cycles when the optimization can't be applied
 ... except for one small detail, which is that the planner has a bad habit
 of scribbling on its own input.  I'm not sure how much cleanup work would
 be needed before that re-plan operation could happen as easily as is
 suggested above.  But in principle this could be made to work.

 Doesn't this double the planning overhead, in most cases for no
 benefit?  The alternative plan used only when there are deferred
 triggers is rarely going to get used.

Personally, I remain of the opinion that this optimization will apply in
only a tiny fraction of real-world cases, so I'm mostly concerned about
not blowing out planning time when the optimization doesn't apply.
However, even granting that that is a concern, so what?  You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.

regards, tom lane




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I would envision the planner starting out generating the first subplan
  (without the optimization), but as it goes along, noting whether there
  are any opportunities for join removal.  At the end, if it found that
  there were such opportunities, re-plan assuming that removal is possible.
  Then stick a switch node on top.
  
  This would give optimal plans for both cases, and it would avoid the need
  for lots of extra planner cycles when the optimization can't be applied
  ... except for one small detail, which is that the planner has a bad habit
  of scribbling on its own input.  I'm not sure how much cleanup work would
  be needed before that re-plan operation could happen as easily as is
  suggested above.  But in principle this could be made to work.
 
  Doesn't this double the planning overhead, in most cases for no
  benefit?  The alternative plan used only when there are deferred
  triggers is rarely going to get used.
 
 Personally, I remain of the opinion that this optimization will apply in
 only a tiny fraction of real-world cases, so I'm mostly concerned about
 not blowing out planning time when the optimization doesn't apply.

This was my thought also- most of the time we won't be able to apply the
optimization and we'll know that pretty early on and can skip the double
planning.  What makes this worthwhile is that there are cases where
it'll be applied regularly due to certain tools/technologies being used
and the extra planning will be more than made up for by the reduction in
execution time.

 However, even granting that that is a concern, so what?  You *have* to
 do the planning twice, or you're going to be generating a crap plan for
 one case or the other.

Yeah, I don't see a way around that..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 However, even granting that that is a concern, so what?  You *have* to
 do the planning twice, or you're going to be generating a crap plan for
 one case or the other.

 Yeah, I don't see a way around that..

Also, it occurs to me that it's only necessary to repeat the join search
part of the process, which means that in principle the mechanisms already
exist for that; see GEQO.  This means that for small join problems, the
total planning time would be much less than double anyway.  For large
problems, where the join search is the bulk of the time, we could hope
that removal of unnecessary joins would reduce the join search runtime
enough that the second search would be pretty negligible next to the
first (which is not optional).  So I think "it'll double the runtime"
is an unfounded objection, or at least there's good reason to hope it's
unfounded.

regards, tom lane




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stephen Frost sfr...@snowman.net writes:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  However, even granting that that is a concern, so what?  You *have* to
  do the planning twice, or you're going to be generating a crap plan for
  one case or the other.

  Yeah, I don't see a way around that..

 Also, it occurs to me that it's only necessary to repeat the join search
 part of the process, which means that in principle the mechanisms already
 exist for that; see GEQO.  This means that for small join problems, the
 total planning time would be much less than double anyway.  For large
 problems, where the join search is the bulk of the time, we could hope
 that removal of unnecessary joins would reduce the join search runtime
 enough that the second search would be pretty negligible next to the
 first (which is not optional).  So I think it'll double the runtime
 is an unfounded objection, or at least there's good reason to hope it's
 unfounded.


Is it possible to only replan part of the plan in case of this
optimization? I think that we might need to only replan parts of the
original plan (as you mentioned, join search and above). So we could reuse
the original plan in part and not do a lot of replanning (an obvious case
is scan strategy, which we can assume will not change for the two plans).

I wonder if we could have a rule-based system for replacement of some plan
nodes with other type of nodes. As we discover more cases like this, we
could add more rules. Wild thought though.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Robert Haas
On Wed, Dec 3, 2014 at 12:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 However, even granting that that is a concern, so what?  You *have* to
 do the planning twice, or you're going to be generating a crap plan for
 one case or the other.

 Yeah, I don't see a way around that..

 Also, it occurs to me that it's only necessary to repeat the join search
 part of the process, which means that in principle the mechanisms already
 exist for that; see GEQO.  This means that for small join problems, the
 total planning time would be much less than double anyway.  For large
 problems, where the join search is the bulk of the time, we could hope
 that removal of unnecessary joins would reduce the join search runtime
 enough that the second search would be pretty negligible next to the
 first (which is not optional).  So I think it'll double the runtime
 is an unfounded objection, or at least there's good reason to hope it's
 unfounded.

OK.  One other point of hope is that, in my experience, the queries
where you need join removal are the ones where there are lots of
tables being joined and there are often quite a few of those joins
that can be removed, not just one.  So the extra planner overhead
might pay off anyway.

(It still seems a shame to have to plan for the not-removing-the-joins
case since it will so rarely happen.  But maybe I should take what I
can get.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] changing primary key col(s) with minimal impact

2014-12-03 Thread Joe Conway

I have a need to provide a way to change a table's primary key
columns, in the possible presence of foreign keys, and with concurrent
use (as much as possible).

The best way I have come up with is roughly:
a. create the to-be-new-pk index concurrently
b. demote the old pk to be a unique constraint
c. alter table add constraint new pk using existing index

Step b. involves (in one transaction):
--
* update pg_class row for the table
  relhaspkey false
* update pg_constraint row for the original pk
  contype = 'u'
  conname = 'some_new_name'
* update pg_index row for the original pk index
  indisprimary = false
* alter index original pk rename to some_new_name
--
I have tested this (minimally) and as far as I can tell it works.
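
Spelled out as SQL, step b is roughly the following (assuming, purely for the
example, a table t whose primary key constraint/index is t_pkey and is being
demoted to a unique constraint named t_key; this is unsupported catalog
surgery and needs superuser):

BEGIN;
UPDATE pg_class
   SET relhaspkey = false
 WHERE oid = 't'::regclass;

UPDATE pg_constraint
   SET contype = 'u', conname = 't_key'
 WHERE conrelid = 't'::regclass AND contype = 'p';

UPDATE pg_index
   SET indisprimary = false
 WHERE indexrelid = 't_pkey'::regclass;

ALTER INDEX t_pkey RENAME TO t_key;
COMMIT;

Question 3 below is essentially asking for a supported command that does the
above without touching the catalogs by hand.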

Questions:
1) any major holes in this approach?
2) any better ideas?
3) should we consider an ALTER TABLE ALTER CONSTRAINT command
   (or some such) to allow demotion of a PRIMARY KEY to a UNIQUE
   constraint?

Thanks for any thoughts/comments.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Is it possible to only replan part of the plan in case of this
 optimization? I think that we might need to only replan parts of the
 original plan (as you mentioned, join search and above). So we could reuse
 the original plan in part and not do a lot of replanning (an obvious case
 is scan strategy, which we can assume will not change for the two plans).

I think you assume wrong; or at least, I certainly would not wish to
hard-wire any such assumption.  Skipping some joins could change the
shape of the join tree *completely*, because the cost estimates will
change so much.  And that could in turn lead to making different choices
of scan methods, eg, we might or might not care about sort order of
a scan result if we change join methods.

regards, tom lane




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  Is it possible to only replan part of the plan in case of this
  optimization? I think that we might need to only replan parts of the
  original plan (as you mentioned, join search and above). So we could
 reuse
  the original plan in part and not do a lot of replanning (an obvious case
  is scan strategy, which we can assume will not change for the two plans).

 I think you assume wrong; or at least, I certainly would not wish to
 hard-wire any such assumption.  Skipping some joins could change the
 shape of the join tree *completely*, because the cost estimates will
 change so much.  And that could in turn lead to making different choices
 of scan methods, eg, we might or might not care about sort order of
 a scan result if we change join methods.

 regards, tom lane


Agreed, but in some cases, we could possibly make some assumptions (if
there is no index, if a large fraction of table will be returned in scan,
FunctionScan).


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Stephen Frost
* Atri Sharma (atri.j...@gmail.com) wrote:
 Agreed, but in some cases, we could possibly make some assumptions (if
 there is no index, if a large fraction of table will be returned in scan,
 FunctionScan).

All neat ideas but how about we get something which works in the way
being asked for before we start trying to optimize it..?  Maybe I'm
missing something, but getting all of this infrastructure into place and
making sure things aren't done to the plan tree which shouldn't be (or
done to all of them if necessary..) is enough that we should get that
bit done first and then worry if there are ways we can further improve
things..

Thanks,

Stephen




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:27 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  Agreed, but in some cases, we could possibly make some assumptions (if
  there is no index, if a large fraction of table will be returned in scan,
  FunctionScan).

 All neat ideas but how about we get something which works in the way
 being asked for before we start trying to optimize it..?  Maybe I'm
 missing something, but getting all of this infrastructure into place and
 making sure things aren't done to the plan tree which shouldn't be (or
 done to all of them if necessary..) is enough that we should get that
 bit done first and then worry if there are ways we can further improve
 things..



Right, sorry for digressing.

I think we are in agreement as to what needs to be done (start with a plan,
note ideas and replan if necessary). The idea of the executor modifying the
plan (or personally, even choosing the plan) seems counterintuitive.

Does it also make sense to recalculate the costs from scratch for the
replan? It might be, I am just asking.

Regards,

Atri


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Atri Sharma (atri.j...@gmail.com) wrote:
 Agreed, but in some cases, we could possibly make some assumptions (if
 there is no index, if a large fraction of table will be returned in scan,
 FunctionScan).

 All neat ideas but how about we get something which works in the way
 being asked for before we start trying to optimize it..?  Maybe I'm
 missing something, but getting all of this infrastructure into place and
 making sure things aren't done to the plan tree which shouldn't be (or
 done to all of them if necessary..) is enough that we should get that
 bit done first and then worry if there are ways we can further improve
 things..

Yeah; moreover, there's no evidence that hard-wiring such assumptions
would save anything.  In the example of a FunctionScan, guess what:
there's only one Path for that relation anyway.

I think the right approach for now is to emulate the GEQO precedent as
closely as possible.  Build all the single-relation Paths the same as
now, then do a join search over all the relations, then (if we've noticed
that some joins are potentially removable) do another join search over
just the nonremovable relations.
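
A very rough sketch of that flow, ignoring for the moment the cleanup needed
because the planner scribbles on its own state (removable_rels,
remember_alternative_rel() and join_search_with_removal() are hypothetical;
standard_join_search() is the existing non-GEQO search):

/* hypothetical; would live near make_one_rel() in allpaths.c */
static RelOptInfo *
join_search_with_removal(PlannerInfo *root, int levels_needed,
						 List *initial_rels, List *removable_rels)
{
	/* first pass: the normal join search over all relations */
	RelOptInfo *full_rel = standard_join_search(root, levels_needed, initial_rels);

	if (removable_rels == NIL)
		return full_rel;		/* the common case: nothing removable */

	/* second pass: search again over only the nonremovable relations */
	{
		List	   *trimmed = list_difference_ptr(initial_rels, removable_rels);
		RelOptInfo *trimmed_rel = standard_join_search(root,
													   list_length(trimmed),
													   trimmed);

		/*
		 * Later, create_plan() would be run on both results and the two plan
		 * trees wrapped in the switch node; how the second RelOptInfo is
		 * carried forward is hand-waved here.
		 */
		remember_alternative_rel(root, trimmed_rel);
	}

	return full_rel;
}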

regards, tom lane




Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Alvaro Herrera
Robert Haas wrote:
 On Mon, Dec 1, 2014 at 5:34 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  I made two more changes:
  1. introduce newestCommitTs.  Original code was using lastCommitXact to
  check that no future transaction is asked for, but this doesn't really
  work if a long-running transaction is committed, because asking for
  transactions with a higher Xid but which were committed earlier would
  raise an error.
 
 I'm kind of disappointed that, in spite of previous review comments,
 this got committed with extensive use of the CommitTs naming.  I think
 that's confusing, but it's also something that will be awkward if we
 want to add other data, such as the much-discussed commit LSN, to the
 facility.

I never saw a comment that CommitTs was an unwanted name.  There were
some that said that committs wasn't liked because it looked like a
misspelling, so we added an underscore -- stuff in lower case is
commit_ts everywhere.  Stuff in camel case didn't get the underscore
because it didn't seem necessary.  But other than that issue, the name
wasn't questioned, as far as I'm aware.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Does it also make sense to recalculate the costs from scratch for the
 replan? It might be, I am just asking.

The join costs would be recalculated from scratch, yes.  The
single-relation Paths would already exist and their costs would not
change.  Again, if you've not studied how GEQO works, you probably
should go do that before thinking more about how this would work.

regards, tom lane




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stephen Frost sfr...@snowman.net writes:
  * Atri Sharma (atri.j...@gmail.com) wrote:
  Agreed, but in some cases, we could possibly make some assumptions (if
  there is no index, if a large fraction of table will be returned in
 scan,
  FunctionScan).

  All neat ideas but how about we get something which works in the way
  being asked for before we start trying to optimize it..?  Maybe I'm
  missing something, but getting all of this infrastructure into place and
  making sure things aren't done to the plan tree which shouldn't be (or
  done to all of them if necessary..) is enough that we should get that
  bit done first and then worry if there are ways we can further improve
  things..

 Yeah; moreover, there's no evidence that hard-wiring such assumptions
 would save anything.  In the example of a FunctionScan, guess what:
 there's only one Path for that relation anyway.

 That is precisely what I meant :) I guess I was being too overcautious
and even trying to save the time spent in evaluating whatever paths we have
and building new FunctionScan paths...


 I think the right approach for now is to emulate the GEQO precedent as
 closely as possible.  Build all the single-relation Paths the same as
 now, then do a join search over all the relations, then (if we've noticed
 that some joins are potentially removable) do another join search over
 just the nonremovable relations.


How about using GEQO more liberally when replanning (decrease the number of
relations in the join before GEQO is hit?)



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think the right approach for now is to emulate the GEQO precedent as
 closely as possible.  Build all the single-relation Paths the same as
 now, then do a join search over all the relations, then (if we've noticed
 that some joins are potentially removable) do another join search over
 just the nonremovable relations.

 How about using geqo more liberally when replanning (decrease the number of
 relations in join before geqo is hit?)

This is going to be quite difficult enough without overcomplicating it.
Or as a wise man once said, premature optimization is the root of all
evil.  Get it working in the basic way and then see if improvement is
necessary at all.

regards, tom lane




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think the right approach for now is to emulate the GEQO precedent as
  closely as possible.  Build all the single-relation Paths the same as
  now, then do a join search over all the relations, then (if we've
 noticed
  that some joins are potentially removable) do another join search over
  just the nonremovable relations.

  How about using geqo more liberally when replanning (decrease the number
 of
  relations in join before geqo is hit?)

 This is going to be quite difficult enough without overcomplicating it.
 Or as a wise man once said, premature optimization is the root of all
 evil.  Get it working in the basic way and then see if improvement is
 necessary at all.


Sure, I can take a crack at it since I am working on a patch that does
require this alternative path approach. Let me try something and report my
experimental results.


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Heikki Linnakangas

On 12/03/2014 07:41 PM, Robert Haas wrote:

On Wed, Dec 3, 2014 at 12:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Stephen Frost sfr...@snowman.net writes:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

However, even granting that that is a concern, so what?  You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.



Yeah, I don't see a way around that..


Also, it occurs to me that it's only necessary to repeat the join search
part of the process, which means that in principle the mechanisms already
exist for that; see GEQO.  This means that for small join problems, the
total planning time would be much less than double anyway.  For large
problems, where the join search is the bulk of the time, we could hope
that removal of unnecessary joins would reduce the join search runtime
enough that the second search would be pretty negligible next to the
first (which is not optional).  So I think it'll double the runtime
is an unfounded objection, or at least there's good reason to hope it's
unfounded.


OK.  One other point of hope is that, in my experience, the queries
where you need join removal are the ones where there are lots of
tables being joined and there are often quite a few of those joins
that can be removed, not just one.  So the extra planner overhead
might pay off anyway.


Do you need to plan for every combination, where some joins are removed 
and some are not?


I hope the same mechanism could be used to prepare a plan for a query 
with parameters, where the parameters might or might not allow a partial 
index to be used. We have some smarts nowadays to use custom plans, but 
this could be better.
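
A concrete example of that parameter-dependent case (names invented for
illustration): whether the partial index below is usable cannot be decided
until the parameter value is known, which is exactly the custom-versus-generic
plan territory.

CREATE TABLE orders (id int, active boolean, payload text);
CREATE INDEX orders_active_idx ON orders (id) WHERE active;

PREPARE q(int, boolean) AS
    SELECT * FROM orders WHERE id = $1 AND active = $2;

EXECUTE q(42, true);    -- a custom plan can prove the index predicate and use it
EXECUTE q(42, false);   -- it cannot; a different plan shape is needed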


- Heikki





Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Do you need to plan for every combination, where some joins are removed 
 and some are not?

I would vote for just having two plans and one switch node.  To exploit
any finer grain, we'd have to have infrastructure that would let us figure
out *which* constraints pending triggers might indicate transient
invalidity of, and that doesn't seem likely to be worth the trouble.

 I hope the same mechanism could be used to prepare a plan for a query 
 with parameters, where the parameters might or might not allow a partial 
 index to be used. We have some smarts nowadays to use custom plans, but 
 this could be better.

Interesting thought, but that would be a totally different switch
condition ...

regards, tom lane




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread k...@rice.edu
On Wed, Dec 03, 2014 at 02:08:27PM -0500, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  Do you need to plan for every combination, where some joins are removed 
  and some are not?
 
 I would vote for just having two plans and one switch node.  To exploit
 any finer grain, we'd have to have infrastructure that would let us figure
 out *which* constraints pending triggers might indicate transient
 invalidity of, and that doesn't seem likely to be worth the trouble.
 
  I hope the same mechanism could be used to prepare a plan for a query 
  with parameters, where the parameters might or might not allow a partial 
  index to be used. We have some smarts nowadays to use custom plans, but 
  this could be better.
 
 Interesting thought, but that would be a totally different switch
 condition ...
 
   regards, tom lane
 

Or between a node with a low rows count and a high rows count for those
pesky mis-estimation queries.

Regards,
Ken




Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Claudio Freire
On Wed, Dec 3, 2014 at 2:09 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would envision the planner starting out generating the first subplan
 (without the optimization), but as it goes along, noting whether there
 are any opportunities for join removal.  At the end, if it found that
 there were such opportunities, re-plan assuming that removal is possible.
 Then stick a switch node on top.

 This would give optimal plans for both cases, and it would avoid the need
 for lots of extra planner cycles when the optimization can't be applied
 ... except for one small detail, which is that the planner has a bad habit
 of scribbling on its own input.  I'm not sure how much cleanup work would
 be needed before that re-plan operation could happen as easily as is
 suggested above.  But in principle this could be made to work.

 Doesn't this double the planning overhead, in most cases for no
 benefit?  The alternative plan used only when there are deferred
 triggers is rarely going to get used.

It shouldn't. It will only double (at worst) planning overhead for the
queries that do have removable joins, which would be the ones
benefiting from the extra work.

Whether that extra work pays off is the question to ask here. Perhaps
whether or not to remove the joins could be a decision made accounting
for overall plan cost and fraction of joins removed, so as to avoid the
extra planning if execution will be fast anyway.




Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Robert Haas wrote:

  I'm kind of disappointed that, in spite of previous review comments,
  this got committed with extensive use of the CommitTs naming.  I think
  that's confusing, but it's also something that will be awkward if we
  want to add other data, such as the much-discussed commit LSN, to the
  facility.
 
 I never saw a comment that CommitTs was an unwanted name.  There were
 some that said that committs wasn't liked because it looked like a
 misspelling, so we added an underscore -- stuff in lower case is
 commit_ts everywhere.  Stuff in camel case didn't get the underscore
 because it didn't seem necessary.  But other than that issue, the name
 wasn't questioned, as far as I'm aware.

I found one email where you said you didn't like committs and preferred
commit_timestamp instead.  I don't see how making that change would have
made you happy wrt the concern you just expressed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-03 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 1:21 PM, Peter Geoghegan p...@heroku.com wrote:
 Incidentally, I think that an under-appreciated possible source of
 regressions here is that attributes abbreviated have a strong
 physical/logical correlation. I could see a small regression for one
 such case even though my cost model indicated that it should be very
 profitable.

This was the column in question:

postgres=# select * from pg_stats where tablename = 'ohio_voters' and
attname = 'mailing_address1' ;
-[ RECORD 1 ]-
schemaname | public
tablename  | ohio_voters
attname| mailing_address1
inherited  | f
null_frac  | 0
avg_width  | 5
n_distinct | 789
most_common_vals   | {""}
most_common_freqs  | {0.969267}
histogram_bounds   | *** SNIP ***
correlation| 0.944785
*** SNIP ***

This n_distinct is wrong, though. In fact, the number of distinct
values is 25,946, while the number of distinct abbreviated keys is
13,691. So correlation was not the dominant factor here (although it
was probably still a factor) - rather, the dominant factor was that
the vast majority of comparisons would get away with an opportunistic
memcmp() == 0 anyway (although not with Postgres 9.4), and so my
baseline is very fast for this case.

This would not have come up had the value been represented as
NULL (as it clearly should have been), since that would not undergo
strxfrm() transformation/abbreviation in the first place. Even still,
highly skewed attributes exist in the wild, and deserve our
consideration - we do not model the distribution of values within the
set.
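
For anyone not following the sorting work closely, the "opportunistic
memcmp() == 0" shortcut referred to above is, in sketch form, just a cheap
byte-equality test tried before the expensive locale-aware comparison (the
function names here are illustrative, not the actual code):

#include <string.h>

static int
text_cmp_with_equality_shortcut(const char *a, int lena,
								const char *b, int lenb)
{
	/*
	 * Byte-identical strings must compare as equal, so we can skip
	 * strcoll()/strxfrm() entirely.  With ~97% of this column being a
	 * single value, nearly every comparison takes this cheap path.
	 */
	if (lena == lenb && memcmp(a, b, lena) == 0)
		return 0;

	return locale_aware_cmp(a, lena, b, lenb);	/* placeholder for the slow path */
}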

I believe that these cases are rare enough, and (thanks to the already
committed parts of this work) fast enough to probably not be worried
about; maybe a more complex cost model could do better, but I'm
inclined to think that it's not worth it. We'd end up slightly
improving this case at bigger cost to other, much more common cases.
Besides, equality-resolved comparisons are not necessarily much
cheaper for datatypes other than Postgres 9.5 text (in a world where
there is a variety of datatypes accelerated by abbreviation), which
discourages a general solution.

A custom format dump of this data (publicly available Ohio State voter
records) is available from:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/ohio_voters.custom.dump

-- 
Peter Geoghegan




Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Robert Haas
On Wed, Dec 3, 2014 at 2:36 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Alvaro Herrera wrote:
 Robert Haas wrote:
  I'm kind of disappointed that, in spite of previous review comments,
  this got committed with extensive use of the CommitTs naming.  I think
  that's confusing, but it's also something that will be awkward if we
  want to add other data, such as the much-discussed commit LSN, to the
  facility.

 I never saw a comment that CommitTs was an unwanted name.  There were
 some that said that committs wasn't liked because it looked like a
 misspelling, so we added an underscore -- stuff in lower case is
 commit_ts everywhere.  Stuff in camel case didn't get the underscore
 because it didn't seem necessary.  But other than that issue, the name
 wasn't questioned, as far as I'm aware.

 I found one email where you said you didn't like committs and preferred
 commit_timestamp instead.  I don't see how making that change would have
 made you happy wrt the concern you just expressed.

Fair point.

I'm still not sure we got this one right, but I don't know that I want
to spend more time wrangling about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] postgres_fdw does not see enums

2014-12-03 Thread David Fetter
Folks,

I've been trying out 9.5-to-be's PostgreSQL FDW, and I noticed that it
doesn't seem to handle enum types.  Would this be a trivial fix?

Repro:
Create an enum type in a 9.3 instance.
Use the type in a table.
Create a foreign server pointing to this instance.
Execute IMPORT FOREIGN SCHEMA.

Pseudonymized output:

db_compare=# IMPORT FOREIGN SCHEMA public FROM SERVER blah INTO blah;
ERROR:  type public.product_type does not exist
LINE 4:   product_type public.product_type OPTIONS (column_name 'pro...
   ^
QUERY:  CREATE FOREIGN TABLE foo (
  id integer OPTIONS (column_name 'id') NOT NULL,
  event_id integer OPTIONS (column_name 'event_id') NOT NULL,
  product_type public.product_type OPTIONS (column_name 'product_type')
) SERVER blah
OPTIONS (schema_name 'public', table_name 'foo');
CONTEXT:  importing foreign table foo

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] postgres_fdw does not see enums

2014-12-03 Thread Tom Lane
David Fetter da...@fetter.org writes:
 I've been trying out 9.5-to-be's PostgreSQL FDW, and I noticed that it
 doesn't seem to handle enum types.  Would this be a trivial fix?

No.  How would you know whether the remote side even has the enum,
let alone whether it has an identical set of members?  I don't see
that enums are noticeably easier than the general case of non-built-in
types ...

regards, tom lane




Re: [HACKERS] postgres_fdw does not see enums

2014-12-03 Thread David Fetter
On Wed, Dec 03, 2014 at 05:38:47PM -0500, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  I've been trying out 9.5-to-be's PostgreSQL FDW, and I noticed
  that it doesn't seem to handle enum types.  Would this be a
  trivial fix?
 
 No.  How would you know whether the remote side even has the enum,
 let alone whether it has an identical set of members?  I don't see
 that enums are noticeably easier than the general case of
 non-built-in types ...

I must be missing something important.  When querying the remote side,
*and it's PostgreSQL*, we have catalog access that could be used to
reconstruct the enums.  Or are you thinking about the case where the
enum changes from one call to the next?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] postgres_fdw does not see enums

2014-12-03 Thread Tom Lane
David Fetter da...@fetter.org writes:
 On Wed, Dec 03, 2014 at 05:38:47PM -0500, Tom Lane wrote:
 No.  How would you know whether the remote side even has the enum,
 let alone whether it has an identical set of members?  I don't see
 that enums are noticeably easier than the general case of
 non-built-in types ...

 I must be missing something important.  When querying the remote side,
 *and it's PostgreSQL*, we have catalog access that could be used to
 reconstruct the enums.  Or are you thinking about the case where the
 enum changes from one call to the next?

What do you mean reconstruct the enum?  We can't fix inconsistencies
between the local enum definition and the remote definition (if any).
Say the remote has a value x that we don't, it'll fail when SELECTing
a row containing that value; postgres_fdw has no way to prevent such
a failure.  Conversely, if we have a value y that doesn't exist on the
remote side, transmitting a clause enumcol = 'y' to the remote side
would fail.  postgres_fdw has no way to prevent that, either, save not
transmitting clauses involving enums (which is exactly what it does now).

I suppose we could say that if you create a foreign-table definition
that includes an enum-type column, it's on your head that the enum
exists and is compatibly defined on the far end.  Not sure about the
risk-benefit tradeoff here though.  If you get it wrong (example: the
two servers have different opinions about the sort order of the enum's
values), you would end up with *very* hard to debug wrong results.
I'm not convinced that we really want to encourage users to do that.

regards, tom lane




Re: [HACKERS] postgres_fdw does not see enums

2014-12-03 Thread David Fetter
On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  On Wed, Dec 03, 2014 at 05:38:47PM -0500, Tom Lane wrote:
  No.  How would you know whether the remote side even has the enum,
  let alone whether it has an identical set of members?  I don't see
  that enums are noticeably easier than the general case of
  non-built-in types ...
 
  I must be missing something important.  When querying the remote side,
  *and it's PostgreSQL*, we have catalog access that could be used to
  reconstruct the enums.  Or are you thinking about the case where the
  enum changes from one call to the next?
 
 What do you mean reconstruct the enum?

Capture its state at the time when IMPORT FOREIGN SCHEMA is executed.
Right now, if you try IMPORT FOREIGN SCHEMA on a table with an enum in
it, postgres_fdw errors out rather than trying to notice that
there's an enum definition which should precede the table creation and execute
it in the correct order.

 We can't fix inconsistencies between the local enum definition and
 the remote definition (if any).

Your objection as stated applies to just about any ALTER issued on the
remote side after the IMPORT FOREIGN SCHEMA has taken effect, not just
to changes in enums.

This is why I built functionality into DBI-Link that refreshes foreign
tables.

 Say the remote has a value x that we don't, it'll fail when
 SELECTing a row containing that value; postgres_fdw has no way to
 prevent such a failure.  Conversely, if we have a value y that
 doesn't exist on the remote side, transmitting a clause enumcol =
 'y' to the remote side would fail.  postgres_fdw has no way to
 prevent that, either, save not transmitting clauses involving enums
 (which is exactly what it does now).

 I suppose we could say that if you create a foreign-table definition
 that includes an enum-type column, it's on your head that the enum
 exists and is compatibly defined on the far end.

We're already saying this about some substantial fraction of ALTER
TABLEs that could happen on the remote side.  I don't see how
including enums could make it substantially worse.

 Not sure about the risk-benefit tradeoff here though.  If you get it
 wrong (example: the two servers have different opinions about the
 sort order of the enum's values), you would end up with *very* hard
 to debug wrong results.  I'm not convinced that we really want to
 encourage users to do that.

Perhaps we should add some compatibility checking functions for local
vs. remote tables.  The first cut of these could be, are the tables
defined identically up to what we've specified in the foreign
server/foreign table stuff?  Subtler, looser versions might follow.
For example, if the foreign table definition has VARCHAR(255) and the
remote table has VARCHAR(100), it's not a catastrophe.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] changing primary key col(s) with minimal impact

2014-12-03 Thread Fabrízio de Royes Mello
Hi Joe,

 I have a need to provide a way to change a table's primary key
 columns, in the possible presence of foreign keys, and with concurrent
 use (as much as possible).

 The best way I have come up with is roughly:
 a. create the to-be-new-pk index concurrently
 b. demote the old pk to be a unique constraint
 c. alter table add constraint new pk using existing index

 Step b. involves (in one transaction):
 - --
 * update pg_class row for the table
   relhaspkey false
 * update pg_constraint row for the original pk
   contype = 'u'
   conname = 'some_new_name'
 * update pg_index row for the original pk index
   indisprimary = false
 * alter index original pk rename to some_new_name
 - --
 I have tested this (minimally) and as far as I can tell it works.

 Questions:
 1) any major holes in this approach?
 2) any better ideas?
 3) should we consider an ALTER TABLE ALTER CONSTRAINT command
(or some such) to allow demotion of a PRIMARY KEY to a UNIQUE
constraint?

 Thanks for any thoughts/comments.


Hi Joe,

I already did something like this once, to rebuild a bloated PK index
with minimal locks. I'm still waiting for 'REINDEX CONCURRENTLY' :-)

At that time I didn't have any trouble with this approach.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] postgres_fdw does not see enums

2014-12-03 Thread Tom Lane
David Fetter da...@fetter.org writes:
 On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote:
 What do you mean reconstruct the enum?

 Capture its state at the time when IMPORT FOREIGN SCHEMA is executed.
 Right now, if you try IMPORT SCHEMA on a foreign table with an enum in
 it, postgresql_fdw errors out rather than trying to notice that
 there's an enum definition which should precede creation and execute
 it in the correct order.

Oh, you think IMPORT FOREIGN SCHEMA should try to import enums?
I doubt it.  What happens if the enum already exists locally?
And why enums, and not domains, ranges, composite types, etc?

Perhaps more to the point, IMPORT FOREIGN SCHEMA is defined in the SQL
standard, as are its effects, and those effects are defined as a series of
CREATE FOREIGN TABLE commands.  There's nothing there about trying to
import types that the tables might depend on.

regards, tom lane




Re: [HACKERS] postgres_fdw does not see enums

2014-12-03 Thread David Fetter
On Wed, Dec 03, 2014 at 06:17:51PM -0500, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote:
  What do you mean reconstruct the enum?
 
  Capture its state at the time when IMPORT FOREIGN SCHEMA is executed.
  Right now, if you try IMPORT SCHEMA on a foreign table with an enum in
  it, postgresql_fdw errors out rather than trying to notice that
  there's an enum definition which should precede creation and execute
  it in the correct order.
 
 Oh, you think IMPORT FOREIGN SCHEMA should try to import enums?

Yes.

 I doubt it.  What happens if the enum already exists locally?

An informative error message along the lines of "local enum foo.bar
doesn't match remote enum public.bar", with a suitable HINT comparing
the enums' values.

However, I don't see much of a use case for this because INTO SCHEMA
should be specifying an empty schema, or at least one without objects
in it (like ENUMs) that could clash.

 And why enums, and not domains, ranges, composite types, etc?

You'd be assuming I think those should be excluded. ;)

 Perhaps more to the point, IMPORT FOREIGN SCHEMA is defined in the
 SQL standard, as are its effects, and those effects are defined as a
 series of CREATE FOREIGN TABLE commands.  There's nothing there
 about trying to import types that the tables might depend on.

The SQL standard has an awful lot of holes, this one being about the
size of the Chicxulub crater.

That fact doesn't force our implementation to throw up its hands when
it finds a feature we've implemented and encouraged people to use.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




[HACKERS] libpq pipelining

2014-12-03 Thread Matt Newell

Hi,

The recent discussion about pipelining in the JDBC driver prompted me to look 
at what it would take for libpq.

I have a proof of concept patch working.  The results are even more promising 
than I expected.

While it's true that many applications and frameworks won't easily benefit, it 
amazes me that this hasn't been explored before.  

I developed a simple test application that creates a table with a single
auto-increment primary key column, then runs 4 simple queries x times each:

INSERT INTO test() VALUES ()
SELECT * FROM test LIMIT 1
SELECT * FROM test
DELETE FROM test

The parameters to testPipelinedSeries are (number of times to execute each 
query, maximum number of queued queries).

Results against local server:

testPipelinedSeries(10,1) took 0.020884
testPipelinedSeries(10,3) took 0.020630, speedup 1.01
testPipelinedSeries(10,10) took 0.006265, speedup 3.33
testPipelinedSeries(100,1) took 0.042731
testPipelinedSeries(100,3) took 0.043035, speedup 0.99
testPipelinedSeries(100,10) took 0.037222, speedup 1.15
testPipelinedSeries(100,25) took 0.031223, speedup 1.37
testPipelinedSeries(100,50) took 0.032482, speedup 1.32
testPipelinedSeries(100,100) took 0.031356, speedup 1.36

Results against remote server through ssh tunnel(30-40ms rtt):

testPipelinedSeries(10,1) took 3.2461736
testPipelinedSeries(10,3) took 1.1008443, speedup 2.44
testPipelinedSeries(10,10) took 0.342399, speedup 7.19
testPipelinedSeries(100,1) took 26.25882588
testPipelinedSeries(100,3) took 8.8509234, speedup 3.04
testPipelinedSeries(100,10) took 3.2866285, speedup 9.03
testPipelinedSeries(100,25) took 2.1472847, speedup 17.57
testPipelinedSeries(100,50) took 1.957510, speedup 27.03
testPipelinedSeries(100,100) took 0.690682, speedup 37.47

I plan to write documentation, add regression testing, and do general cleanup 
before asking for feedback on the patch itself.  Any suggestions about 
performance testing or API design would be nice.  I haven't played with 
changing the sync logic yet, but I'm guessing that an API to allow manual sync 
instead of a sync per PQsendQuery will be needed.  That could make things 
tricky though with multi-statement queries, because currently the only way to 
detect when results change from one query to the next is a ReadyForQuery 
message.

Matt Newell

/*
 * src/test/examples/testlibpqpipeline.c
 *
 *
 * testlibpqpipeline.c
 *		this test program tests query pipelining and its performance impact
 *
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#include "libpq-fe.h"

// If defined we won't issue more sql commands if the socket's
// write buffer is full
//#define MIN_LOCAL_Q

//#define PRINT_QUERY_PROGRESS

static int testPipelined( PGconn * conn, int totalQueries, int totalQueued, const char * sql );
static int testPipelinedSeries( PGconn * conn, int totalQueries, int totalQueued, int baseline_usecs );


int
testPipelined( PGconn * conn, int totalQueries, int totalQueued, const char * sql )
{
	int nQueriesQueued;
	int nQueriesTotal;
	PGresult * result;
	PGquery * firstQuery;
	PGquery * curQuery;

	nQueriesQueued = nQueriesTotal = 0;
	result = NULL;
	firstQuery = curQuery = NULL;

	while( nQueriesQueued > 0 || nQueriesTotal < totalQueries ) {

		if( PQconsumeInput(conn) == 0 ) {
			printf( "PQconsumeInput ERROR: %s\n", PQerrorMessage(conn) );
			return 1;
		}

		do {
			curQuery = PQgetFirstQuery(conn);

			/* firstQuery is finished */
			if( firstQuery != curQuery )
			{
				//printf( "%p done, curQuery=%p\n", firstQuery, curQuery );
#ifdef PRINT_QUERY_PROGRESS
				printf("-");
#endif
				firstQuery = curQuery;
				nQueriesQueued--;
			}

			/* Break if no queries are ready */
			if( !firstQuery || PQisBusy(conn) )
				break;

			if( (result = PQgetResult(conn)) != 0 )
				PQclear(result);
		}
		while(1);

		if( nQueriesTotal < totalQueries && nQueriesQueued < totalQueued ) {
#ifdef MIN_LOCAL_Q
			int flushResult = PQflush(conn);
			if( flushResult == -1 ) {
				printf( "PQflush ERROR: %s\n", PQerrorMessage(conn) );
				return 1;
			} else if ( flushResult == 1 )
				continue;
#endif
			PQsendQuery(conn,sql);
			if( firstQuery == NULL )
				firstQuery = PQgetFirstQuery(conn);
			nQueriesTotal++;
			nQueriesQueued++;
#ifdef PRINT_QUERY_PROGRESS
			printf( "+" );
#endif
		}
	}
#ifdef PRINT_QUERY_PROGRESS
	printf( "\n" );
#endif
	return 0;
}

int testPipelinedSeries( PGconn * conn, int totalQueries, int totalQueued, int baseline_usecs )
{
	int result;
	struct timeval tv1, tv2;
	int secs, usecs;

	gettimeofday(&tv1,NULL);
#define TEST_P(q) \
	if( (result = testPipelined(conn,totalQueries,totalQueued,q)) != 0 ) \
		return result;
	TEST_P("INSERT INTO test() VALUES ()");
	TEST_P("SELECT * FROM test LIMIT 1");
	TEST_P("SELECT * FROM test");
	TEST_P("DELETE FROM test");
	gettimeofday(&tv2,NULL);
	secs = tv2.tv_sec - tv1.tv_sec;
	usecs = secs * 1000000 + tv2.tv_usec - tv1.tv_usec;
	printf(testPipelinedSeries(%i,%i) took 

Re: [HACKERS] [COMMITTERS] pgsql: Fix whitespace

2014-12-03 Thread Peter Eisentraut
On 12/3/14 7:44 AM, Alvaro Herrera wrote:
 Does this mean you're happy with it now?  Does make check-world pass for
 you?

Yes, it all works now.





Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-03 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote:
 Basically, the case in which I think it's helpful to issue a
 suggestion here is when the user has used the table name rather than
 the alias name.  I wonder if it's worth checking for that case
 specifically, in lieu of what you've done here, and issuing a totally
 different hint in that case (HINT: You must refer to this column
 as prime_minister.id rather than cameron.id).

Well, if an alias is used, and you refer to an attribute using a
non-alias name (i.e. the original table name), then you'll already get
an error suggesting that the alias be used instead -- of course,
that's nothing new. It doesn't matter to the existing hinting
mechanism if the attribute name is otherwise wrong. Once you fix the
code to use the alias suggested, you'll then get this new
Levenshtein-based hint.

 Another idea, which I think I like less well, is to check the
 Levenshtein distance between the allowed alias and the entered alias
 and, if that's within the half-the-shorter-length threshold, consider
 possible matches from that RTE, charge the distance between the
 correct alias and the entered alias as a penalty to each potential
 column match.

I don't know about that either. Aliases are often totally arbitrary,
particularly for ad-hoc queries, which is what this is aimed at.

 What I think won't do is to look at a situation where the user has
 entered automobile.id and suggest that maybe they meant student.iq, or
 even student.id.

I'm not sure I follow. If there is an automobile.ip, then it will be
suggested. If there is no automobile column that's much of a match (so
no automobile.ip, say), then student.id will be suggested (and not
student.iq, *even if there is no student.id* - the final quality check
saves us). So this is possible:

postgres=# select iq, * from student, automobile;
ERROR:  42703: column "iq" does not exist
LINE 1: select iq, * from student, automobile;
   ^
HINT:  Perhaps you meant to reference the column student.id.
postgres=# select automobile.iq, * from student, automobile;
ERROR:  42703: column automobile.iq does not exist
LINE 1: select automobile.iq, * from student, automobile;
   ^

(note that using the table name makes us *not* see a suggestion where
we otherwise would).

The point is that there is a fixed penalty for a wrong user-specified
alias, but all relation RTEs are considered.
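
In rough pseudocode (with illustrative names only, not the identifiers the
patch actually uses), the scoring rule amounts to something like this:

/*
 * Illustrative sketch only, not the patch code.  levenshtein() stands in
 * for the distance function with default costs; every column of every
 * relation RTE is scored, a fixed penalty of 1 is charged when the typed
 * alias does not match the RTE, and the winner must still pass the final
 * quality check of no more than half the typed name's length.
 */
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

extern int levenshtein(const char *a, const char *b);   /* stand-in */

typedef struct FuzzyMatch
{
    const char *rte_name;       /* alias of the RTE owning the best column */
    const char *col_name;       /* best candidate column so far */
    int         distance;       /* its (penalized) distance */
} FuzzyMatch;

static void
consider_column(FuzzyMatch *best, const char *typed_alias, const char *typed_col,
                const char *rte_name, const char *col_name)
{
    int         d = levenshtein(typed_col, col_name);

    /* wrong (but supplied) alias: charge a fixed extra distance of 1 */
    if (typed_alias != NULL && strcmp(typed_alias, rte_name) != 0)
        d += 1;

    if (best->col_name == NULL || d < best->distance)
    {
        best->rte_name = rte_name;
        best->col_name = col_name;
        best->distance = d;
    }
}

static bool
suggestion_is_good_enough(const FuzzyMatch *best, const char *typed_col)
{
    /* final quality check: distance no greater than 50% of the typed name */
    return best->col_name != NULL &&
        best->distance <= (int) strlen(typed_col) / 2;
}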

 The amount of difference between the names has got to
 matter for the RTE names, just as it does for the column names.

I think it makes sense that it matters by a fixed amount. Besides,
this seems complicated enough already - I don't want to add more
complexity to worry about equidistant (but still actually valid)
RTE/table/alias names.

It sounds like your concern here is mostly a concern about the
relative distance among multiple matches, as opposed to the absolute
quality of suggestions. The former seems a lot less controversial than
the latter was, though - the user always gets the best match, or the
join pair of best matches, or no match when this new hinting mechanism
is involved.

I attach a new revision. The revision:

* Uses default costs for Levenshtein distance.

* Still charges extra for a non-alias-matching match (although it only
charges a fixed distance of 1 extra). This has regression test
coverage.

* Applies a generic final quality check that enforces a requirement
that a hint have a distance of no greater than 50% of the total string
size. No special treatment of shorter strings is involved anymore.

* Moves almost everything out of scanRTEForColumn() as you outlined
(into a new function, updateFuzzyAttrMatchState(), per your
suggestion).

* Moves dropped column detection into updateFuzzyAttrMatchState(), per
your suggestion.

* Still does the if (rte->rtekind == RTE_JOIN) thing in the existing
function searchRangeTableForCol().

I am quite confident that a suggestion from a join RTE will never be
useful, to either the existing use of searchRangeTableForCol() or this
expanded use, and it makes more sense to me to put it there. In fact,
the existing use of searchRangeTableForCol() is really rather similar
to this, and will give up on the first identical match (which is taken
as evidence that there is an attribute of that name, but isn't visible
at this level of the query). So I have not followed your suggestion
here.

Thoughts?
-- 
Peter Geoghegan
From 81c7b0691e9d03c1bdd99f4b264737306d1bd2cf Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Wed, 12 Nov 2014 15:31:37 -0800
Subject: [PATCH] Levenshtein distance column HINT

Add a new HINT -- a guess as to what column the user might have intended
to reference, to be shown in various contexts where an
ERRCODE_UNDEFINED_COLUMN error is raised.  The user will see this HINT
when he or she fat-fingers a column reference in an ad-hoc SQL query, or
incorrectly pluralizes or fails to pluralize a column reference, or
incorrectly omits or 

Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Fujii Masao
On Wed, Dec 3, 2014 at 11:54 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Pushed with some extra cosmetic tweaks.

I got the following assertion failure when I executed pg_xact_commit_timestamp()
in the standby server.

=# select pg_xact_commit_timestamp('1000'::xid);
TRAP: FailedAssertion("!(((oldestCommitTs) != ((TransactionId) 0)) ==
((newestCommitTs) != ((TransactionId) 0)))", File: "commit_ts.c",
Line: 315)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2014-12-04
12:01:08 JST sby1 LOG:  server process (PID 15545) was terminated by
signal 6: Aborted
2014-12-04 12:01:08 JST sby1 DETAIL:  Failed process was running:
select pg_xact_commit_timestamp('1000'::xid);

The way to reproduce this problem is

#1. set up and start the master and standby servers with
track_commit_timestamp disabled
#2. enable track_commit_timestamp in the master and restart the master
#3. run some write transactions
#4. enable track_commit_timestamp in the standby and restart the standby
#5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby

BTW, at the step #4, I got the following log messages. This might be a hint for
this problem.

LOG:  file pg_commit_ts/ doesn't exist, reading as zeroes
CONTEXT:  xlog redo Transaction/COMMIT: 2014-12-04 12:00:16.428702+09;
inval msgs: catcache 59 catcache 58 catcache 59 catcache 58 catcache
45 catcache 44 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 snapshot 2608
relcache 16384

Regards,

-- 
Fujii Masao




Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Simon Riggs
On 4 December 2014 at 03:08, Fujii Masao masao.fu...@gmail.com wrote:

 #1. set up and start the master and standby servers with
 track_commit_timestamp disabled
 #2. enable track_commit_timestamp in the master and restart the master
 #3. run some write transactions
 #4. enable track_commit_timestamp in the standby and restart the standby
 #5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby

I'm not sure what step4 is supposed to do?

Surely if steps 1-3 generate any WAL then the standby should replay
it, whether or not track_commit_timestamp is enabled.

So what effect does setting that parameter have on the standby?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-12-03 Thread Fujii Masao
On Thu, Dec 4, 2014 at 12:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 4 December 2014 at 03:08, Fujii Masao masao.fu...@gmail.com wrote:

 #1. set up and start the master and standby servers with
 track_commit_timestamp disabled
 #2. enable track_commit_timestamp in the master and restart the master
 #3. run some write transactions
 #4. enable track_commit_timestamp in the standby and restart the standby
 #5. execute select pg_xact_commit_timestamp('1000'::xid) in the standby

 I'm not sure what step4 is supposed to do?

 Surely if steps 1-3 generate any WAL then the standby should replay
 it, whether or not track_commit_timestamp is enabled.

 So what effect does setting that parameter have on the standby?

At least track_commit_timestamp seems to need to be enabled even in the standby
when we want to call pg_xact_commit_timestamp() and pg_last_committed_xact()
in the standby. I'm not sure if this is good design, though.

Regards,

-- 
Fujii Masao




Re: [HACKERS] inherit support for foreign tables

2014-12-03 Thread Ashutosh Bapat
On Thu, Dec 4, 2014 at 9:05 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
wrote:

 (2014/12/03 19:35), Ashutosh Bapat wrote:

 On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:


  This is not exactly extension of non-inheritance case. non-inheritance
 case doesn't show two remote SQLs under the same plan node. Maybe you
 can rename the label Remote SQL as Remote UPDATE/INSERT/DELETE (or
 something to that effect) for the DML command and the Foreign plan node
 should be renamed to Foreign access node or something to indicate that
 it does both the scan as well as DML. I am not keen about the actual
 terminology, but I think a reader of plan shouldn't get confused.

 We can leave this for committer's judgement.


 Thanks for the proposal!  I think that would be a good idea.  But I think
 there would be another idea.  An example will be shown below.  We show the
 update commands below the ModifyTable node, not above the corresponding
 ForeignScan nodes, so maybe less confusing.  If there are no objections of
 you and others, I'll update the patch this way.

 postgres=# explain verbose update parent set a = a * 2 where a = 5;
                                   QUERY PLAN
 ---------------------------------------------------------------------------
  Update on public.parent  (cost=0.00..280.77 rows=25 width=10)
    On public.ft1
      Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
    On public.ft2
      Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1
    ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
          Output: (parent.a * 2), parent.ctid
          Filter: (parent.a = 5)
    ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
          Output: (ft1.a * 2), ft1.ctid
          Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
    ->  Foreign Scan on public.ft2  (cost=100.00..140.38 rows=12 width=10)
          Output: (ft2.a * 2), ft2.ctid
          Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE
 (12 rows)


Looks better.


  IIUC, even the transactions over the local and the *single* remote
 server are not guaranteed to be executed atomically in the current
 form.  It is possible that the remote transaction succeeds and the
 local one fails, for example, resulting in data inconsistency
 between the local and the remote.


  IIUC, while committing transactions involving a single remote server,
 the steps taken are as follows
 1. the local changes are brought to PRE-COMMIT stage, which means that
 the transaction *will* succeed locally after successful completion of
 this phase,
 2. COMMIT message is sent to the foreign server
 3. If step two succeeds, local changes are committed and successful
 commit is conveyed to the client
 4. if step two fails, local changes are rolled back and abort status is
 conveyed to the client
 5. If step 1 itself fails, the remote changes are rolled back.
 This is as per one phase commit protocol which guarantees ACID for
 single foreign data source. So, the changes involving local and a single
 foreign server seem to be atomic and consistent.


 Really?  Maybe I'm missing something, but I don't think the current
 implementation for committing transactions has such a mechanism stated in
 step 1.  So, I think it's possible that the local transaction fails in
 step3 while the remote transaction succeeds, as mentioned above.


PFA a script which shows this. You may want to check the code in
pgfdw_xact_callback() for the actions taken by postgres_fdw on various events,
and CommitTransaction() for how those events are generated. The code there
complies with the sequence above.



 Thanks,

 Best regards,
 Etsuro Fujita




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] inherit support for foreign tables

2014-12-03 Thread Ashutosh Bapat
Sorry, here's the script.

On Thu, Dec 4, 2014 at 10:00 AM, Ashutosh Bapat 
ashutosh.ba...@enterprisedb.com wrote:



 On Thu, Dec 4, 2014 at 9:05 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
  wrote:

 (2014/12/03 19:35), Ashutosh Bapat wrote:

 On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp
 wrote:


  This is not exactly extension of non-inheritance case. non-inheritance
 case doesn't show two remote SQLs under the same plan node. Maybe you
 can rename the label Remote SQL as Remote UPDATE/INSERT/DELETE (or
 something to that effect) for the DML command and the Foreign plan node
 should be renamed to Foreign access node or something to indicate that
 it does both the scan as well as DML. I am not keen about the actual
 terminology, but I think a reader of plan shouldn't get confused.

 We can leave this for committer's judgement.


 Thanks for the proposal!  I think that would be a good idea.  But I think
 there would be another idea.  An example will be shown below.  We show the
 update commands below the ModifyTable node, not above the corresponding
 ForeignScan nodes, so maybe less confusing.  If there are no objections of
 you and others, I'll update the patch this way.

 postgres=# explain verbose update parent set a = a * 2 where a = 5;
                                    QUERY PLAN
  ---------------------------------------------------------------------------
   Update on public.parent  (cost=0.00..280.77 rows=25 width=10)
     On public.ft1
       Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1
     On public.ft2
       Remote SQL: UPDATE public.mytable_2 SET a = $2 WHERE ctid = $1
     ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=10)
           Output: (parent.a * 2), parent.ctid
           Filter: (parent.a = 5)
     ->  Foreign Scan on public.ft1  (cost=100.00..140.38 rows=12 width=10)
           Output: (ft1.a * 2), ft1.ctid
           Remote SQL: SELECT a, ctid FROM public.mytable_1 WHERE ((a = 5)) FOR UPDATE
     ->  Foreign Scan on public.ft2  (cost=100.00..140.38 rows=12 width=10)
           Output: (ft2.a * 2), ft2.ctid
           Remote SQL: SELECT a, ctid FROM public.mytable_2 WHERE ((a = 5)) FOR UPDATE
  (12 rows)


 Looks better.


  IIUC, even the transactions over the local and the *single* remote
 server are not guaranteed to be executed atomically in the current
 form.  It is possible that the remote transaction succeeds and the
 local one fails, for example, resulting in data inconsistency
 between the local and the remote.


  IIUC, while committing transactions involving a single remote server,
 the steps taken are as follows
 1. the local changes are brought to PRE-COMMIT stage, which means that
 the transaction *will* succeed locally after successful completion of
 this phase,
 2. COMMIT message is sent to the foreign server
 3. If step two succeeds, local changes are committed and successful
 commit is conveyed to the client
 4. if step two fails, local changes are rolled back and abort status is
 conveyed to the client
 5. If step 1 itself fails, the remote changes are rolled back.
 This is as per one phase commit protocol which guarantees ACID for
 single foreign data source. So, the changes involving local and a single
 foreign server seem to be atomic and consistent.


 Really?  Maybe I'm missing something, but I don't think the current
 implementation for committing transactions has such a mechanism stated in
 step 1.  So, I think it's possible that the local transaction fails in
 step3 while the remote transaction succeeds, as mentioned above.


 PFA a script which shows this. You may want to check the code in
 pgfdw_xact_callback() for the actions taken by postgres_fdw on various events,
 and CommitTransaction() for how those events are generated. The code there
 complies with the sequence above.



 Thanks,

 Best regards,
 Etsuro Fujita




 --
 Best Wishes,
 Ashutosh Bapat
 EnterpriseDB Corporation
 The Postgres Database Company




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


tran_inconsistency.sql
Description: Binary data



Re: [HACKERS] On partitioning

2014-12-03 Thread Amit Langote

Hi,

 From: Jim Nasby [mailto:jim.na...@bluetreble.com]
 On 12/2/14, 9:43 PM, Amit Langote wrote:
 
  What are you going to do if the partitioning key has two columns of
  different data types?
  
  Sorry, this totally eluded me. Perhaps, the 'values' needs some more 
  thought.
 They are one of the most crucial elements of the scheme.
 
  I wonder if your suggestion of pg_node_tree plays well here. This then could
 be a list of CONSTs or some such... And I am thinking it's a concern only for
 range partitions, no? (that is, a multicolumn partition key)
 
  I think partkind switches the interpretation of the field as appropriate. 
  Am I
 missing something? By the way, I had mentioned we could have two values
 fields each for range and list partition kind.
 
 The more SQL way would be records (composite types). That would make
 catalog inspection a LOT easier and presumably make it easier to change the
 partitioning key (I'm assuming ALTER TYPE cascades to stored data). Records
 are stored internally as tuples; not sure if that would be faster than a List 
 of
 Consts or a pg_node_tree. Nodes would theoretically allow using things other
 than Consts, but I suspect that would be a bad idea.
 

While I couldn’t find an example in system catalogs where a record/composite 
type is used, there are instances of pg_node_tree at a number of places like in 
pg_attrdef and others. Could you please point me to such a usage for reference?

 Something else to consider... our user-space support for ranges is now
 rangetypes, so perhaps that's what we should use for range partitioning. The
 up-side (which would be a double-edged sword) is that you could leave holes
 in your partitioning map. Note that in the multi-key case we could still have 
 a
 record of rangetypes.

That is something I had in mind at least at some point. My general doubt remains 
about the usage of user-space SQL types for catalog fields, though I may be 
completely uninitiated about such usage.

Thanks,
Amit






[HACKERS] Parallel Seq Scan

2014-12-03 Thread Amit Kapila
As per the discussion on another thread about using
custom scan nodes for a prototype of parallel sequential scan,
I have developed the same, but directly by adding
new nodes for parallel sequential scan.  There might be
some advantages to developing this as a contrib
module using custom scan nodes, however I think
we might get stuck at some point due to the limitations of
custom scan nodes, as pointed out by Andres.

The basic idea is that while evaluating the cheapest
path for a scan, the optimizer will also evaluate whether it can use
a parallel seq scan path.  Currently I have kept a very simple
model to calculate the cost of a parallel seq scan path, which
is to divide the CPU and disk cost by the available number
of worker backends (we can enhance it based on further
experiments and discussion; we need to consider worker startup
and dynamic shared memory setup cost as well).  The work, i.e. the
scan of blocks, is divided equally among all workers (except for
corner cases where the blocks can't be divided evenly among workers;
the last worker will be responsible for scanning the remaining blocks).
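
As a rough sketch (the names and types below are placeholders, not the
patch's actual code), the costing and block division described above
amount to:

/*
 * Rough sketch of the simple costing and block-division rules described
 * above; not the patch's actual code.  Cost, disk_cost, cpu_cost and the
 * function names are placeholders.  Worker startup and dynamic shared
 * memory setup costs are deliberately ignored, as in the current prototype.
 */
typedef double Cost;

/* Divide the usual seq-scan CPU and disk cost by the number of workers. */
static Cost
parallel_seqscan_cost(Cost disk_cost, Cost cpu_cost, int nworkers)
{
    if (nworkers <= 0)
        return disk_cost + cpu_cost;    /* parallel scan not considered */
    return (disk_cost + cpu_cost) / nworkers;
}

/*
 * Split nblocks evenly among nworkers; the last worker also scans any
 * remaining blocks when the total does not divide evenly.
 */
static void
worker_block_range(int nblocks, int nworkers, int worker_id,
                   int *start_block, int *end_block)
{
    int         per_worker = nblocks / nworkers;

    *start_block = worker_id * per_worker;
    if (worker_id == nworkers - 1)
        *end_block = nblocks;           /* last worker picks up the remainder */
    else
        *end_block = *start_block + per_worker;
}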

The number of worker backends that can be used for
parallel seq scan can be configured by using a new GUC
parallel_seqscan_degree, the default value of which is zero
and it means parallel seq scan will not be considered unless
user configures this value.

In the ExecutorStart phase, we initiate the required number of workers
as per the parallel seq scan plan, set up dynamic shared memory, and
share the information required for the workers to execute the scan.
Currently I have just shared the relId, targetlist and number
of blocks to be scanned by each worker; however, I think we might want
to generate a plan for each of the workers in the master backend and
then share it with the individual workers.
To fetch the data from the multiple queues corresponding to the
workers, a simple mechanism is used: fetch from the first queue
until all its data is consumed, then fetch from the second
queue, and so on.  Here the master backend is responsible only for
getting the data from the workers and passing it back to the client.
I am sure that we can improve this strategy in many ways,
for example by making the master backend also perform the scan for some
of the blocks rather than just gathering data from workers, or by using
a better strategy to fetch the data from the multiple queues.
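
Concretely, with hypothetical names (the actual patch structures may
differ), the information shared with each worker and the simple
queue-draining strategy look roughly like this:

/*
 * Hypothetical sketch, not the patch's actual structures.  The master
 * places something like this in dynamic shared memory for each worker,
 * then drains the workers' result queues one at a time.
 */
typedef unsigned int Oid;               /* simplified stand-ins for the */
typedef unsigned int BlockNumber;       /* usual PostgreSQL typedefs */

typedef struct ParallelScanInfo
{
    Oid         relid;          /* relation to scan */
    BlockNumber start_block;    /* first block assigned to this worker */
    BlockNumber num_blocks;     /* how many blocks this worker scans */
    /* a serialized target list would follow here */
} ParallelScanInfo;

typedef struct WorkerQueue WorkerQueue; /* one shared-memory queue per worker */
typedef struct Tuple Tuple;

extern Tuple *worker_queue_receive(WorkerQueue *q); /* stand-in */
extern void forward_to_client(Tuple *tup);          /* stand-in */

/* Consume everything from the first queue, then the second, and so on. */
static void
gather_from_workers(WorkerQueue **queues, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
    {
        Tuple      *tup;

        while ((tup = worker_queue_receive(queues[i])) != NULL)
            forward_to_client(tup);
    }
}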

Each worker backend will receive the scan-related information
from the master backend, generate a plan from it, and
execute that plan, so the work of scanning the data after
generating the plan is very similar to exec_simple_query()
(i.e. create the portal and run it based on the planned statement),
except that worker backends will initialize the block range they want to
scan in the executor initialization phase (ExecInitSeqScan()).
Workers exit after sending the data to the master backend,
which essentially means that for each execution we need
to start the workers anew.  I think we can improve this by handing
control of the workers to the postmaster so that we don't need to
initialize them each time during execution; however, this can
be a totally separate optimization which is better done
independently of this patch.
As we currently don't have a mechanism to share transaction
state, I have used a separate transaction in the worker backends to
execute the plan.

Any error in the master backend, whether raised by a worker backend or due
to some other issue in the master backend itself, should terminate all the
workers before aborting the transaction.
We can't do this with the error context callback mechanism
(error_context_stack) that we use in other places in the code, as
in this case we need it from the time the workers are started until
the execution is complete (error_context_stack could get reset
once control leaves the function which set it).
One way could be to maintain the callback information in
TransactionState and use it to kill the workers before aborting the
transaction in the main backend.  Another could be to have another
variable similar to error_context_stack (used
specifically for storing the workers' state), and kill the workers
in errfinish via a callback.  Currently I have handled it at the time of
detaching from shared memory.
Another point that needs to be taken care of in the worker backends is
that if any error occurs, we should *not* abort the transaction, as
the transaction state is shared across all workers.

Currently a parallel seq scan will not be considered
for statements other than SELECT, or if there is a join in
the statement, if the statement contains quals, or if the target
list contains non-Var fields.  We can definitely support
simple quals and target lists containing non-Vars later.  By simple,
I mean that they should not contain functions or other
constructs which can't be pushed down to the worker backends.
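
A sketch of that eligibility test, using the standard parse-tree
structures but not the patch's actual code, might look like:

/*
 * Sketch of the eligibility rules described above; not the patch's actual
 * check.  Only a plain single-relation SELECT with no quals and a Var-only
 * target list would currently be considered for a parallel seq scan.
 */
#include "postgres.h"
#include "nodes/parsenodes.h"

static bool
parallel_seqscan_possible(Query *query)
{
    ListCell   *lc;

    if (query->commandType != CMD_SELECT)
        return false;                   /* only SELECT statements */
    if (list_length(query->rtable) != 1)
        return false;                   /* no joins */
    if (query->jointree->quals != NULL)
        return false;                   /* no quals, for now */

    foreach(lc, query->targetList)
    {
        TargetEntry *tle = (TargetEntry *) lfirst(lc);

        if (!IsA(tle->expr, Var))
            return false;               /* only plain column references, for now */
    }
    return true;
}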

Behaviour of some simple statements with patch is as below:

postgres=# create table t1(c1 int, c2 char(500)) with (fillfactor=10);
CREATE TABLE

postgres=# insert into t1 values(generate_series(1,100),'amit');
INSERT 0 100