Re: [HACKERS] DDL Damage Assessment

2014-10-03 Thread Dimitri Fontaine
Jim Nasby jim.na...@bluetreble.com writes:
 EXPLAIN
 ALTER TABLE 
 I'm thinking it would be better to have something you could set at a session
 level, so you don't have to stick EXPLAIN in front of all your DDL.

Yeah I'm coming into that camp too, and I think the Event Trigger idea
gets us halfway there. Here's a detailed sketch of how it would work:

 1. preparatory steps: install the Event Trigger
 
create extension norewrite;

 2. test run:

psql -1 -f ddl.sql
ERROR: Table Rewrite has been cancelled.

 3. Well actually we need to run that thing in production

BEGIN;
  ALTER EVENT TRIGGER norewrite DISABLE;
  \i ddl.sql
  ALTER EVENT TRIGGER norewrite ENABLE;
COMMIT;
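
For concreteness, a minimal sketch of what the norewrite extension itself
could contain, assuming an event that fires before a table rewrite (the
event name and function body here are an assumption, not an existing API):

    create function no_rewrite_allowed() returns event_trigger
    language plpgsql as $$
    begin
        raise exception 'Table Rewrite has been cancelled.';
    end;
    $$;

    create event trigger norewrite
        on table_rewrite                 -- assumed event name
        execute procedure no_rewrite_allowed();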

Then it's also possible to have another Event Trigger that would
automatically issue a LOCK TABLE ... NOWAIT command before any DDL
against a table is run, packaged in another extension:

  create extension ddl_lock_nowait;

The same approach applies: if your production rollout is blocked repeatedly
and you want to force it through at some point, you can disable the
event trigger within the DDL script/transaction.

 As for the dry-run idea, I don't think that's really necessary. I've never
 seen anyone serious that doesn't have a development environment, which is
 where you would simply deploy the real DDL using verbose mode and see what
 the underlying commands actually do.

The major drawback of the Event Trigger idea is that the transaction is
cancelled as soon as a Rewrite Event fires once you have installed the
protective trigger. That means you won't see the next problem after the
first one, so it's not a true dry-run.

But considering what you're saying here, it might well be enough.

Regards,
-- 
Dimitri Fontaine                     06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 11:07 AM, Simon Riggs wrote:

On 1 October 2014 20:54, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 10/01/2014 02:34 PM, Simon Riggs wrote:



...

When later insert scans see the promise tuple they perform
XactLockTableWait() and when they get control they look again for the
key. If they find a promise tuple with an aborted xid they replace
that value with their own xid and continue as a). Otherwise b).



XactLockTableWait() waits until the end of the transaction; that's not what you want
here. If the backend that inserted the promise tuple decides to not proceed
with the insertion, and removes the promise tuple, the backend waiting on it
needs to be woken up more or less immediately, not when the transaction
completes.


There is no "remove promise tuple" part of the above design.


I had this exact same issue in my earlier patch versions, fixed it with a
new kind of heavy-weight lock, and a new field in PGPROC
(http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com). That
wasn't very pretty, but it got the job done. Some other design might be
better.


Currently, if some fool developer decides to insert rows that duplicate
values already in the table, he inserts a heap row, then sees a potential
conflict in the index and waits for the conflicting xact to end. His fool
insert, his wait. That's how btree
does this now.
I don't see any reason why we need to do better for Upsert.

If the row already exists we need to be able to quickly change into an
update, but we still only support one write xact at a time.


That lowers the bar from what I thought everyone agreed on. Namely, if 
two backends run a similar UPSERT command concurrently on a table that 
has more than one unique constraint, they might deadlock, causing one of 
them to throw an error instead of INSERTing or UPDATEing anything.
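
For concreteness, a hypothetical illustration of that scenario (UPSERT here
stands in for whatever syntax is eventually chosen):

    CREATE TABLE t (a int UNIQUE, b int UNIQUE);
    INSERT INTO t VALUES (1, 10), (2, 20);
    -- Session 1: UPSERT (1, 20)  -- conflicts with (1,10) on a, (2,20) on b
    -- Session 2: UPSERT (2, 10)  -- conflicts with (2,20) on a, (1,10) on b
    -- If each session locks the two conflicting values in a different
    -- order, each ends up waiting for the other: a deadlock, and one of
    -- them aborts with an error instead of inserting or updating.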


I'm sure that's useful enough in many applications, but I'd like to have 
a more robust implementation. The shorter we can keep the list of 
caveats, the better.



The value in the index needs to be protected by a block level lock, so
we can check it quickly, but the eventual heap work is serialized by
transactional semantics.

I think a little perspective is due here and we should stick to the
main use case, not cater for bizarre edge cases.


I'm trying to bisect your thoughts on exactly what use cases you think 
we must support, and which ones you consider bizarre edge cases, and 
what exactly is acceptable behavior in those edge cases.



What we are looking for is something that can decide whether it is an
insert or an update and progress quickly with that. It needs to be
correct, but there is no requirement to be faster than currently
possible, in the case of concurrency.


I believe we all agree on that.


Any form of tuple locking that uses the general lock manager will not
be usable. I can't see it is worth the overhead of doing that to
protect against deadlocks that would only be experienced by people
doing foolish things.


Maybe, maybe not, but let's define the acceptable behavior first, and 
think about the implementation second. I'm pretty sure all of the 
approaches discussed so far can be made fast enough, and the bloat 
issues can be made small enough, that it doesn't matter much which one 
we choose from a performance point of view. The differences are in what 
use cases they can support, and the maintainability of the code.


- Heikki





Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale

2014-10-03 Thread Kouhei Kaigai
 On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
  Hello,
 
  I recently got a trouble on development of my extension that utilizes
  the shared buffer when it released each buffer page.
 
  This extension transfers contents of the shared buffers to GPU device
  using DMA feature, then kicks a device kernel code.
 
 Wow, that sounds crazy.
 
  Once backend/extension calls ReadBuffer(), resowner.c tracks which
  buffer was referenced by the current resource owner, to ensure these
  buffers being released at end of the transaction.
  However, it seems to me implementation of resowner.c didn't assume
  many buffers are referenced by a particular resource owner simultaneously.
  It manages the buffer index using an expandable array, then looks up
  the target buffer by sequential walk but from the tail because
  recently pinned buffer tends to be released first.
  It made a trouble in my case. My extension pinned multiple thousands
  buffers, so owner->buffers[] were enlarged and takes expensive cost to
  walk on.
  In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in
  total during hash-joining 2M rows; even though hash-joining itself
  takes less than 16 seconds.
 
  What is the best way to solve the problem?
 
 How about creating a separate ResourceOwner for these buffer pins, and doing
 a wholesale ResourceOwnerRelease() on it when you're done?
 
Let me clarify your idea.

1. Create a separate ResourceOwner under the CurrentResourceOwner.
2. Switch CurrentResourceOwner to the self-constructed resource owner
   during ReadBuffer() on thousands buffers.
3. Switch back to the original CurrentResourceOwner.
4. Kick DMA and run GPU device kernel (actual job running...)
5. Switch CurrentResourceOwner to the self-constructed resource owner
   again, during ReleaseBuffer() in reverse order.
6. Switch back to the original CurrentResourceOwner, then calls
   ResourceOwnerDelete().
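
A rough C sketch of steps 1-6 above (illustrative only; rel, blocknums and
nblocks are assumed to be supplied by the extension):

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "utils/rel.h"
    #include "utils/resowner.h"

    static void
    process_chunk(Relation rel, BlockNumber *blocknums, int nblocks)
    {
        ResourceOwner chunk_owner;
        ResourceOwner save_owner = CurrentResourceOwner;
        Buffer       *buffers = palloc(nblocks * sizeof(Buffer));
        int           i;

        /* 1. create a private resource owner under the current one */
        chunk_owner = ResourceOwnerCreate(CurrentResourceOwner, "chunk buffers");

        /* 2. pin the chunk's buffers while it is the current owner */
        CurrentResourceOwner = chunk_owner;
        for (i = 0; i < nblocks; i++)
            buffers[i] = ReadBuffer(rel, blocknums[i]);

        /* 3. switch back for the normal work */
        CurrentResourceOwner = save_owner;

        /* 4. kick DMA and run the device kernel here ... */

        /* 5. release the pins, in reverse order, under the same owner */
        CurrentResourceOwner = chunk_owner;
        for (i = nblocks - 1; i >= 0; i--)
            ReleaseBuffer(buffers[i]);
        CurrentResourceOwner = save_owner;

        /* 6. the owner is now empty, so it can be deleted */
        ResourceOwnerDelete(chunk_owner);
        pfree(buffers);
    }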

Hmm... at the moment I don't see anything here that would be hard to
implement. I'd like to try this idea, then report back later.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com





Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale

2014-10-03 Thread Kouhei Kaigai
 On 2014-10-03 10:35:42 +0300, Heikki Linnakangas wrote:
  On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
   Hello,
  
   I recently got a trouble on development of my extension that
   utilizes the shared buffer when it released each buffer page.
  
   This extension transfers contents of the shared buffers to GPU
   device using DMA feature, then kicks a device kernel code.
 
  Wow, that sounds crazy.
 
 Agreed. I doubt that pinning that many buffers is a sane thing to do. At
 the very least you'll heavily interfere with vacuum and such.
 
My assumption is that this extension is used to handle OLAP-type workloads,
thus with relatively little write traffic to the database.
Sorry, I forgot to mention that.

   Once backend/extension calls ReadBuffer(), resowner.c tracks which
   buffer was referenced by the current resource owner, to ensure these
   buffers being released at end of the transaction.
   However, it seems to me implementation of resowner.c didn't assume
   many buffers are referenced by a particular resource owner
 simultaneously.
   It manages the buffer index using an expandable array, then looks up
   the target buffer by sequential walk but from the tail because
   recently pinned buffer tends to be released first.
   It made a trouble in my case. My extension pinned multiple thousands
   buffers, so owner->buffers[] were enlarged and takes expensive cost
   to walk on.
   In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in
   total during hash-joining 2M rows; even though hash-joining itself
   takes less than 16 seconds.
 
   What is the best way to solve the problem?
 
  How about creating a separate ResourceOwner for these buffer pins, and
  doing a wholesale ResourceOwnerRelease() on it when you're done?
 
 Or even just unpinning them in reverse order? That should already fix the
 performance issues?
 
When multiple chunks are running asynchronously (note: a chunk contains
thousands of buffers, as a unit of device kernel execution), the order in
which GPU jobs complete is not predictable.
So it does not help my situation if one resource owner tracks all the
buffers.

Probably Heikki's suggestion was to create a separate resource owner per
chunk. In that case all the buffers of a particular chunk are released at
the same time, so ReleaseBuffer() in reverse order makes sense.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com





Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 2:03 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 That lowers the bar from what I thought everyone agreed on. Namely, if two
 backends run a similar UPSERT command concurrently on a table that has more
 than one unique constraint, they might deadlock, causing one of them to
 throw an error instead of INSERTing or UPDATEing anything.

It lowers the bar to a level that I am not willing to limbo dance
under. You don't even need two unique constraints. Nothing as
complicated as that is required.

When this happens with MySQL, they have the good sense to call it a
bug [1], and even fix it. I find the comparison with conventional
insertion entirely unconvincing.

 I'm sure that's useful enough in many applications, but I'd like to have a
 more robust implementation. The shorter we can keep the list of caveats, the
 better.

INSERT and UPDATE are supposed to be fairly well balanced here.
Conflicts are the norm.

 The value in the index needs to be protected by a block level lock, so
 we can check it quickly, but the eventual heap work is serialized by
 transactional semantics.

 I think a little perspective is due here and we should stick to the
 main use case, not cater for bizarre edge cases.


 I'm trying to bisect your thoughts on exactly what use cases you think we
 must support, and which ones you consider bizarre edge cases, and what
 exactly is acceptable behavior in those edge cases.

Lots of concurrency is not an edge-case.

 Any form of tuple locking that uses the general lock manager will not
 be usable. I can't see it is worth the overhead of doing that to
 protect against deadlocks that would only be experienced by people
 doing foolish things.

 Maybe, maybe not, but let's define the acceptable behavior first, and think
 about the implementation second.

+1. Updating a lot with UPSERT is not foolish. That's all it took to
make earlier prototypes deadlock.

 I'm pretty sure all of the approaches
 discussed so far can be made fast enough, and the bloat issues can be made
 small enough, that it doesn't matter much which one we choose from a
 performance point of view. The differences are in what use cases they can
 support, and the maintainability of the code.

+1

What do we get for giving up on not having unprincipled deadlocks
here? What's the advantage? Assuming that this is a bizarre edge-case
(note: it isn't), what do we get in return for giving up on fixing it?

[1] MySQL bug #52020
-- 
Peter Geoghegan




Re: [HACKERS] Join push-down support for foreign tables

2014-10-03 Thread Shigeru Hanada
2014-09-08 8:07 GMT+09:00 Shigeru HANADA shigeru.han...@gmail.com:
 (2014/09/04 21:37), Robert Haas wrote: On Wed, Sep 3, 2014 at 5:16 AM,
 Probably both the initial cost and final cost calculations should be
 delegated to the FDW, but maybe within postgres_fdw, the initial cost
 should do only the work that can be done without contacting the remote
 server; then, let the final cost step do that if appropriate.  But I'm
 not entirely sure what is best here.

 Agreed.  I'll design planner API along that way for now.

I tried some implementation patterns, but I haven't found a feasible
way yet.  So I'd like to hear hackers' ideas.

* Foreign join hook point
First I thought that following the existing cost-estimation manner was the
way to go, but now I tend to think it doesn't fit foreign joins, because
each local join method is tightly coupled to sort order (pathkeys), whereas
a foreign join is not.

In the current planner, add_paths_to_joinrel is conscious of sort order,
and the functions directly called from it are conscious of the join method.
But this doesn't seem to fit the abstraction level of an FDW.  An FDW is
highly abstracted (unlike, say, a custom plan provider), so most of the work
should be delegated to the FDW, including pathkeys consideration, IMO.

Besides that, the order of join consideration is another issue.  First I
tried to add foreign-join consideration last (after hash join
consideration), but after some thought I noticed that early elimination
would work better if we try the foreign join first, because in most cases a
foreign join is the cheapest way to accomplish a join between two foreign
relations.

So currently I'm thinking that delegating the whole join consideration to
FDWs, by calling a new FDW API before the other join strategies are
considered in add_paths_to_joinrel, would be promising.

This means that an FDW can add multiple arbitrary paths to the RelOptInfo
in one call.  Of course this allows an FDW to do a round-trip per path, but
that is an optimization issue, and it can compare whatever candidates it
can obtain without a round-trip.
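
For illustration only, one hypothetical shape such a callback could take
(none of these names exist today; this is just to make the proposal
concrete):

    /*
     * Hypothetical FDW callback, called from add_paths_to_joinrel() before
     * the built-in join strategies are considered.  The FDW may add zero or
     * more paths for the join to joinrel, or simply return if it cannot
     * handle the join.
     */
    typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
                                                  RelOptInfo *joinrel,
                                                  RelOptInfo *outerrel,
                                                  RelOptInfo *innerrel,
                                                  JoinType jointype,
                                                  List *restrictlist);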

* Supported join types
INNER and (LEFT|RIGHT|FULL) OUTER joins would be safe to push down, even
though some OUTER JOINs might not be much faster than a local join.
I'm not sure that SEMI and ANTI joins are safe to push down.  Can we
leave the matter to FDWs, or should we forbid push-down by not calling the
foreign-join API for them?  Anyway, SEMI/ANTI would not be supported in
the first version.

* Blockers of join push-down
Pushing down a join means that the foreign scans for the inner and outer
relations are skipped, so some elements block pushing down.  Basically the
criteria are the same as for scan push-down and update push-down.

After some thought, we should check only for unsafe expressions in the join
qual and restrict qual.  This limitation is necessary to avoid differences
between the pushed-down and non-pushed-down results.  The target list seems
to contain only Vars for the necessary columns, but we should check that too.

* WIP patch
Attached is a WIP patch for reviewing the design.  Work still to be done:
1) judging whether to push down or not, and 2) generating the join SQL.
For 2), I'm thinking of referring to Postgres-XC's join-shipping mechanism.

Any comments or questions are welcome.
-- 
Shigeru HANADA


join_pushdown_wip.patch
Description: Binary data



Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Simon Riggs
On 3 October 2014 10:03, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 That lowers the bar from what I thought everyone agreed on. Namely, if two
 backends run a similar UPSERT command concurrently on a table that has more
 than one unique constraint, they might deadlock, causing one of them to
 throw an error instead of INSERTing or UPDATEing anything.

Now we get to a productive discussion, this is good.

When we first gather requirements, everyone naturally agrees to a long list
of things, since there is initially not much reason to say no to any of
them. As we move towards implementation we begin to understand the true
price of meeting each requirement. It was good that this detail was raised,
and it is sensible to attempt to avoid unprincipled deadlocks. But if the
price of avoiding them is high, it is worth reconsidering how important
that is.

My view is that I can't see the above use case happening in real
situations, except by infrequent mistake. In most cases, unique
indexes represent some form of object identity, and those don't change
frequently in the real world. So I'd be surprised to see two unique fields
being changed at the same time without it representing some form of
business-process error that people would like to see fail anyway.
If someone has an example of that in a real, common case then I would
like to see it, and I would revise my view accordingly.

We are frequently hampered by trying to design something that can sing
and dance at the same time. That thinking is exactly why we are looking
at UPSERT now, not MERGE. So trimming our objectives to what makes
sense is an accepted part of this project already.

 Any form of tuple locking that uses the general lock manager will not
 be usable. I can't see it is worth the overhead of doing that to
 protect against deadlocks that would only be experienced by people
 doing foolish things.


 Maybe, maybe not, but let's define the acceptable behavior first, and think
 about the implementation second.

Hand in hand, I think, given the other constraints of time, review,
maintainability etc..

 I'm pretty sure all of the approaches
 discussed so far can be made fast enough, and the bloat issues can be made
 small enough, that it doesn't matter much which one we choose from a
 performance point of view. The differences are in what use cases they can
 support, and the maintainability of the code.

The discussion of approaches has up to now focused only on what
impossibilities exist, with a "we must do this because feature A can't
do aspect X". I haven't yet seen much discussion of maintainability of
the code, but I would guess simpler is better, overall.

Realistically, I won't be coding any separate approaches, so this is
down to Peter and maybe yourself Heikki. I hope only to avoid
foreclosing viable and simple approaches for the wrong reasons. There
are many other considerations that make up the final view.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 2:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 My view is that I can't see the above use case from happening in real
 situations, except by infrequent mistake. In most cases, unique
 indexes represent some form of object identity and those don't change
 frequently in the real world. So to be changing two unique fields at
 the same time and it not representing some form of business process
 error that people would like to see fail anyway, I'd be surprised by.

Are we talking about two different things here?

Unprincipled deadlocks can be seen without updating any constrained
column in the UPSERT. The test-case that originally highlighted the
issue only had one unique index, and it was *not* in the update's
targetlist. See:

https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipled_Deadlocking.22_and_value_locking

-- 
Peter Geoghegan




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Simon Riggs
On 3 October 2014 10:32, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 3, 2014 at 2:03 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 That lowers the bar from what I thought everyone agreed on. Namely, if two
 backends run a similar UPSERT command concurrently on a table that has more
 than one unique constraint, they might deadlock, causing one of them to
 throw an error instead of INSERTing or UPDATEing anything.

 It lowers the bar to a level that I am not willing to limbo dance
 under. You don't even need two unique constraints. Nothing as
 complicated as that is required.

 When this happens with MySQL, they have the good sense to call it a
 bug [1], and even fix it. I find the comparison with conventional
 insertion entirely unconvincing.

Is there a test case that demonstrates the problem?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 3:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Is there a test case that demonstrates the problem?

Yes. See my e-mail to Heikki here:

http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

Testcase is attached.


-- 
Peter Geoghegan




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Simon Riggs
On 3 October 2014 10:57, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 3, 2014 at 2:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 My view is that I can't see the above use case from happening in real
 situations, except by infrequent mistake. In most cases, unique
 indexes represent some form of object identity and those don't change
 frequently in the real world. So to be changing two unique fields at
 the same time and it not representing some form of business process
 error that people would like to see fail anyway, I'd be surprised by.

 Are we talking about two different things here?

 Unprincipled deadlocks can be seen without updating any constrained
 column in the UPSERT. The test-case that originally highlighted the
 issue only had one unique index, and it was *not* in the update's
 targetlist. See:

 https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipled_Deadlocking.22_and_value_locking

I followed that to a wiki page, then clicked again to an old email. No
test case.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 01:05 PM, Peter Geoghegan wrote:

On Fri, Oct 3, 2014 at 3:04 AM, Simon Riggs si...@2ndquadrant.com wrote:

Is there a test case that demonstrates the problem?


Yes. See my e-mail to Heikki here:

http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

Testcase is attached.


Simon's approach would actually pass that test case just fine. It 
inserts the (promise) index tuple first, and heap tuple only after that. 
It will fail the test case with more than one unique index, however.


- Heikki




Re: [HACKERS] GiST splitting on empty pages

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 05:03 AM, Andrew Gierth wrote:

This is from Bug #11555, which is still in moderation as I type this
(analysis was done via IRC).

The GiST insertion code appears to have no length checks at all on the
inserted entry. index_form_tuple checks for length <= 8191, with the
default blocksize, but obviously a tuple less than 8191 bytes may
still not fit on the page due to page header info etc.

However, the gist insertion code assumes that if the tuple doesn't fit,
then it must have to split the page, with no check whether the page is
already empty.

[script to reproduce]


Thanks for the analysis!


Suggested fixes (probably all of these are appropriate):

1. gistSplit should have check_stack_depth()

2. gistSplit should probably refuse to split anything if called with
only one item (which is the value being inserted).

3. somewhere before reaching gistSplit it might make sense to check
explicitly (e.g. in gistFormTuple) whether the tuple will actually
fit on a page.

4. pg_trgm probably should do something more sensible with large leaf
items, but this is a peripheral issue since ultimately the gist core
must enforce these limits rather than rely on the opclass.


Fixed; I did 1. and 2. Number 3. would make a lot of sense, but I
couldn't totally convince myself that we can safely put the check in
gistFormTuple() without causing some cases to fail that currently work.
I think the code sometimes forms tuples that are never actually inserted
into the index as such, but are used only to union with existing tuples on
internal pages. Or maybe not, but in any case, the check in gistSplit() is
enough to stop the crash.
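
Roughly, the shape of 1. and 2. at the top of gistSplit(), where len is the
number of tuples it was asked to place (an illustrative sketch, not the
exact committed code; the message wording is made up):

    /* splitting can recurse, so make sure we don't overflow the stack */
    check_stack_depth();

    /* if a single tuple doesn't fit on a page, no split will help */
    if (len == 1)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("index tuple too large to fit on a GiST index page")));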


- Heikki





Re: [HACKERS] TAP test breakage on MacOS X

2014-10-03 Thread Robert Haas
On Thu, Oct 2, 2014 at 10:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 make check-world dies ingloriously for me, like this:

 FWIW, it works fine for me on my Mac laptop, using the Perl 5.16.2 that
 comes standard with OSX 10.9.5.  I did have to install IPC::Run from
 CPAN though.

 #   Failed test 'initdb options handling'
 #   at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.

 This output seems to be pretty clear proof that you're not using
 Apple's Perl.  What is it exactly (where did you get it from)?

 Also, noticing that what you're using is evidently Perl 5.12, I'm
 wondering whether our TAP test scripts require a fairly new Perl version.
 I recall some of my Salesforce colleagues griping that the TAP scripts
 didn't work with older Perls.

I installed it via MacPorts.

[rhaas pgsql]$ which perl
/opt/local/bin/perl
[rhaas pgsql]$ perl -v

This is perl 5, version 12, subversion 5 (v5.12.5) built for
darwin-thread-multi-2level

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using man perl or perldoc perl.  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_receivexlog and replication slots

2014-10-03 Thread Andres Freund
On 2014-10-03 10:30:19 +0900, Michael Paquier wrote:
 On Thu, Oct 2, 2014 at 12:44 AM, Andres Freund and...@2ndquadrant.com
 wrote:
 
  I pushed the first part.
 
 Thanks. Attached is a rebased version of patch 2, implementing the actual
 feature. One thing I noticed with more testing is that if --create is used
 and the destination folder does not exist, pg_receivexlog was creating
 the slot and then exited with an error. This does not look user-friendly so I
 changed the logic a bit to check for the destination folder before creating
 any slot. This results in a bit of refactoring, but this way behavior is
 more intuitive.

Ok.

 <para>
 +<application>pg_receivexlog</application> can run in one of two following
 +modes, which control physical replication slot:

I don't think that's good enough. There's also the important mode where
it's not doing --create/--drop at all.


 + /*
 +  * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
 +  * position.
 +  */
 + if (!RunIdentifySystem(conn, NULL, NULL, NULL, &db_name))
 + disconnect_and_exit(1);
 +
 + /*
 +  * Check that there is a database associated with connection, none
 +  * should be defined in this context.
 +  */
 + if (db_name)
 + {
 + fprintf(stderr,
 + _("%s: database defined for replication connection \"%s\"\n"),
 + progname, replication_slot);
 + disconnect_and_exit(1);
 + }

I don't like 'defined' here. 'replication connection unexpectedly is
database specific' or something would be better.

I do wonder whether --create/--drop aren't somewhat weird for
pg_receivexlog. It's not that clear what they mean. It'd be ugly, but we
could rename them --create-slot/--drop-slot.
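
For example, with that renaming an invocation might look like this (option
names hypothetical at this point, just to make the suggestion concrete):

    pg_receivexlog --create-slot --slot=myslot -D /path/to/xlogdir
    pg_receivexlog --drop-slot --slot=myslot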

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Andres Freund
On 2014-10-02 20:08:33 -0400, Greg Smith wrote:
 I did a fair dive into double-checking the decision to just leave
 xloginsert_locks fixed at 8 for 9.4.  My conclusion:  good call, move along.
 Further improvements beyond what the 8-way split gives sure are possible.
 But my guess from chasing them a little is that additional places will pop
 up as things that must also be tweaked, before you'll see those gains turn
 significant.

Thanks for doing this.


 I'd like to see that box re-opened at one point.  But if we do that, I'm
 comfortable that could end with a xloginsert_locks that tunes itself
 reasonably on large servers in the end, similar to wal_buffers.  There's
 nothing about this that makes feel like it needs a GUC.  I barely needed an
 exposed knob to do this evaluation.
 
 = Baseline =
 
 I rolled back a few commits to just before the GUC was removed and tested
 against that point in git time.  Starting with the 4 client test case Heikki
 provided, the fastest runs on my 24 core server looked like this:
 
 tps = 56.691855 (including connections establishing)
 
 Repeat runs do need to drop the table and rebuild, because eventually AV
 kicks in on things in a big way, and then your test is toast until it's
 done.  Attached is what I settled on for a test harness. Nothing here was so
 subtle I felt a more complicated harness was needed.
 
 Standard practice for me is to give pgbench more workers when worrying about
 any scalability tests.  That gives a tiny improvement, to where this is
 typical with 4 clients and 4 workers:
 
 tps = 60.942537 (including connections establishing)
 
 Increasing to 24 clients plus 24 workers gives roughly the same numbers,
 suggesting that the bottleneck here is certainly not the client count, and
 that the suggestion of 4 was high enough:
 
 tps = 56.731581 (including connections establishing)
 
 Decreasing xloginsert_locks to 1, so back to the original problem, the rate
 normally looks like this instead:
 
 tps = 25.384708 (including connections establishing)
 
 So the big return you get just fine with the default tuning; great. I'm
 happy to see it ship like this as good enough for 9.4.
 
 = More locks =
 
 For the next phase, I stuck to 24 clients and 24 workers.  If I then bump up
 xloginsert_locks to something much larger, there is an additional small gain
 to be had.  With 24 locks, so basically every client has their own, instead
 of 57-60 TPS, I managed to get as high as this:
 
 tps = 66.790968 (including connections establishing)
 
 However, the minute I get into this territory, there's an obvious bottleneck
 shift going on in there too.  The rate of creating new checkpoint segments
 becomes troublesome as one example, with messages like this:
 
 LOG:  checkpoints are occurring too frequently (1 second apart)
 HINT:  Consider increasing the configuration parameter
 checkpoint_segments.
 
 When 9.4 is already giving a more than 100% gain on this targeted test case,
 I can't see that chasing after maybe an extra 10% is worth having yet
 another GUC around.  Especially when it will probably take multiple tuning
 steps before you're done anyway; we don't really know the rest of them yet;
 and when we do, we probably won't need a GUC to cope with them in the end
 anyway.

I've modified the test slightly, by having the different backends insert
into different relations. Even on my measly 5 year old workstation I
*do* see quite a bit more than 10%.


psql -f /tmp/prepare.sql && pgbench -n -f /tmp/fooinsert.sql -c 64 -j 64 -T 10
on a 2x E5520 server (2 sockets, 4 cores each, 2 threads per core)
with the following configuration:
 -c shared_buffers=2GB
 -c wal_level=hot_standby
 -c full_page_writes=off
 -c checkpoint_segments=400
 -c fsync=off (io system here is abysmally bad)
 -c synchronous_commit=off

#define NUM_XLOGINSERT_LOCKS  1
tps = 52.711939 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  8
tps = 286.496054 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  16
tps = 346.113313 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  24
tps = 363.242111 (including connections establishing)

I'd not be surprised at all if you'd see bigger influence on a system
with 4 sockets.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- fooinsert.sql (the pgbench script; :client_id is substituted per client):
insert into foo_:client_id select g from generate_series(1, 1) g;

-- prepare.sql:
CREATE OR REPLACE FUNCTION exec(text) returns text language plpgsql volatile
  AS $f$
BEGIN
  EXECUTE $1;
  RETURN $1;
END;
$f$;
\o /dev/null
SELECT exec('drop table if exists foo_'||g.i||'; create table foo_'||g.i||'(id int4);') FROM generate_series(1, 64) g(i);
\o
CHECKPOINT;



Re: [HACKERS] DDL Damage Assessment

2014-10-03 Thread Robert Haas
On Thu, Oct 2, 2014 at 5:22 PM, Stephen Frost sfr...@snowman.net wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
   That sounds extremely complex. You'd have to implement the fake
   columns, foreign keys, indexes, etc on most execution nodes, the
   planner, and even system views.
 
  Eh?  We have MVCC catalog access.

 So you want to modify the catalog without actually doing the
 corresponding actions? That'll be heck of invasive. With changes all
 over the backend. We'll need to remove error checks (like for the
 existence of relfilenodes), remove rewriting, and such.

 Yeah, I was getting at it being rather invasive earlier.  It really
 depends on exactly what we'd support in this mode, which would depend on
 just what would be invasive and what wouldn't, I expect.  I dislike the
 idea of not being able to actually run a real migration script though as
 anything else opens the very real possibility that the real script and
 the 'explain' script don't do the same thing, making this capability not
 nearly as useful..

I think this is the real issue.  Somebody's got to come up with an
infrastructure for reporting locks to be taken without actually
performing the corresponding actions, and it's got to go through
basically the same code paths as it would if we really did it.  That's
going to be complicated.  But if we make the EXPLAIN code completely
separate from the real code, it will be easy for them to get out of
sync with each other, and then the answers will be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] DDL Damage Assessment

2014-10-03 Thread Stephen Frost
* Jim Nasby (jim.na...@bluetreble.com) wrote:
 I'm thinking it would be better to have something you could set at a session 
 level, so you don't have to stick EXPLAIN in front of all your DDL.

Right, I'm agreed there.

 As for the dry-run idea, I don't think that's really necessary. I've never 
 seen anyone serious that doesn't have a development environment, which is 
 where you would simply deploy the real DDL using verbose mode and see what 
 the underlying commands actually do.

That's certainly an interesting point and perhaps what we'd do is,
instead, have a "collect info on locks needed" mode, but otherwise let
everything run as-is.  You could then take the report at the end of the
transaction and use it to identify what would be needed in production,
and maybe even have a script created which grabs all the locks using
'nowait' or fails the whole thing if that isn't possible.

Of course, we kind of have that already...  Just look at the locks
you've acquired at the end of the transaction..
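
For instance, before committing the migration transaction you can inspect
what it already holds (just the existing pg_locks view, nothing new):

    -- inside the still-open migration transaction
    SELECT locktype, relation::regclass, mode, granted
      FROM pg_locks
     WHERE pid = pg_backend_pid();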

Thanks,

Stephen




Re: [HACKERS] WAL format and API changes (9.5)

2014-10-03 Thread Andres Freund
On 2014-10-03 15:51:37 +0300, Heikki Linnakangas wrote:
 After a lot of experimentation, I figured out that the slowdown is already
 apparent with the *first* patch, the one that just refactors existing
 XLogInsert, without any changes to the WAL format or to the callers. I
 fiddled with that for a long time, trying to tease apart the change that
 makes the difference, and was finally able to narrow it down.
 
 Attached are two patches. These are both just refactoring, with no changes
 to the WAL format or APIs. The first is the same as the refactoring patch I
 posted earlier, with only minor changes to #includes, comments and such (per
 Alvaro's and Michael's suggestions - thanks!). With the first patch, the
 test case I've been using to performance test this becomes somewhere between
 5% - 10% slower. Applying the second patch on top of that restores the
 performance back to what you get without these patches.
 
 Strange. The second patch moves the CRC calculation from a separate loop
 over the XLogRecDatas to the earlier loop, that also iterates through the
 XLogRecDatas. The strange thing about this is that when I tried to make the
 same change to current git master, without applying the first patch, it
 didn't make any difference. The CRC calculation used to integrated to the
 earlier loop in 9.1 and before, but in 9.2 it was moved to a separate loop
 for simplicity, because it didn't make any difference to performance.
 
 So I now have a refactoring patch ready that I'd like to commit (the
 attached two patches together), but to be honest, I have no idea why the
 second patch is so essential to performance.

Maybe a stupid question, but did you verify it's not caused by splitting
xloginsert over two translation units and/or not inlining the separate
function?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL format and API changes (9.5)

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 04:10 PM, Andres Freund wrote:

On 2014-10-03 15:51:37 +0300, Heikki Linnakangas wrote:

After a lot of experimentation, I figured out that the slowdown is already
apparent with the *first* patch, the one that just refactors existing
XLogInsert, without any changes to the WAL format or to the callers. I
fiddled with that for a long time, trying to tease apart the change that
makes the difference, and was finally able to narrow it down.

Attached are two patches. These are both just refactoring, with no changes
to the WAL format or APIs. The first is the same as the refactoring patch I
posted earlier, with only minor changes to #includes, comments and such (per
Alvaro's and Michael's suggestions - thanks!). With the first patch, the
test case I've been using to performance test this becomes somewhere between
5% - 10% slower. Applying the second patch on top of that restores the
performance back to what you get without these patches.

Strange. The second patch moves the CRC calculation from a separate loop
over the XLogRecDatas to the earlier loop, that also iterates through the
XLogRecDatas. The strange thing about this is that when I tried to make the
same change to current git master, without applying the first patch, it
didn't make any difference. The CRC calculation used to integrated to the
earlier loop in 9.1 and before, but in 9.2 it was moved to a separate loop
for simplicity, because it didn't make any difference to performance.

So I now have a refactoring patch ready that I'd like to commit (the
attached two patches together), but to be honest, I have no idea why the
second patch is so essential to performance.


Maybe a stupid question, but did you verify it's not caused by splitting
xloginsert over two translation units and/or not inlining the separate
function?


Hmph, now that I try this again, I don't see any difference with or 
without the second patch - they both perform just as well as unpatched 
master. So I'm totally baffled again, but I guess the patch is good to 
go. I need a more stable test environment..


Yeah, I did try merging xlog.c and xloginsert.c into the same .c file at 
some point, and marking the XLogInsertRecData function as static, 
allowing the compiler to inline it, but I didn't see any difference. But 
you have to take that with a grain of salt - I'm apparently not capable 
of constructing a properly repeatable test case for this.


- Heikki





Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Gregory Smith

On 10/3/14, 8:26 AM, Andres Freund wrote:

#define NUM_XLOGINSERT_LOCKS  1
tps = 52.711939 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  8
tps = 286.496054 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  16
tps = 346.113313 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS  24
tps = 363.242111 (including connections establishing)


Just to clarify:  that 10% number I threw out was meant as a rough 
estimate for a system with the default configuration, which is all that 
I tested.  It seemed like people would likely need to tune all the usual 
things like checkpoint_segments, shared_buffers, etc. as well before 
seeing much better.  You did all that, and sure enough the gain went up; 
thanks for confirming my guess.


I still don't think that means this needs a GUC for 9.4.  Look at that 
jump from 1 to 8.  The low-hanging fruit here hasn't just been knocked 
off.  It's been blended into a frozen drink, poured into a glass, and 
had a little paper umbrella put on top.  I think that's enough for 9.4.  
But, yes, let's see if we can add delivery to the side of the pool in 
the next version too.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Andres Freund
On 2014-10-03 10:07:39 -0400, Gregory Smith wrote:
 On 10/3/14, 8:26 AM, Andres Freund wrote:
 #define NUM_XLOGINSERT_LOCKS  1
 tps = 52.711939 (including connections establishing)
 #define NUM_XLOGINSERT_LOCKS  8
 tps = 286.496054 (including connections establishing)
 #define NUM_XLOGINSERT_LOCKS  16
 tps = 346.113313 (including connections establishing)
 #define NUM_XLOGINSERT_LOCKS  24
 tps = 363.242111 (including connections establishing)
 
 Just to clarify:  that 10% number I threw out was meant as a rough estimate
 for a system with the default configuration, which is all that I tested.  It
 seemed like people would likely need to tune all the usual things like
 checkpoint_segments, shared_buffers, etc. as well before seeing much better.
 You did all that, and sure enough the gain went up; thanks for confirming my
 guess.
 
 I still don't think that means this needs a GUC for 9.4.  Look at that jump
 from 1 to 8.  The low-hanging fruit here hasn't just been knocked off.  It's
 been blended into a frozen drink, poured into a glass, and had a little
 paper umbrella put on top.  I think that's enough for 9.4.  But, yes, let's
 see if we can add delivery to the side of the pool in the next version too.

So a 25% performance improvement on a relatively small machine isn't
worth a GUC? One that's likely to be even larger on a bigger machine?

I utterly fail to see why that's a service to our users.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-03 Thread Heikki Linnakangas

On 09/28/2014 01:54 AM, Andres Freund wrote:

I've invested some more time in this:


Thanks!

In 0001, the select() codepath will not return (WL_SOCKET_READABLE | 
WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the 
poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE 
was requested as a wake-event, and likewise for writeable, while the 
poll() codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) 
regardless of the requested wake-events. I'm not sure which is actually 
better - a separate WL_SOCKET_ERROR code might be best - but it's 
inconsistent as it is.



0002 now makes sense on its own and doesn't change anything around the
  interrupt handling. Oh, and it compiles without 0003.


WaitLatchOrSocket() can throw an error, so it's not totally safe to call 
that underneath OpenSSL. Admittedly the cases where it throws an error 
are "shouldn't happen" cases like "poll() failed" or "read() on
self-pipe failed", but still. Perhaps those errors should be
reclassified as FATAL; it's not clear you can just roll back and expect 
to continue running if any of them happens.


In secure_raw_write(), you need to save and restore errno, as 
WaitLatchOrSocket will not preserve it. If secure_raw_write() calls 
WaitLatchOrSocket(), and it returns because the latch was set, and we 
fall out of secure_raw_write, we will return -1 but the errno might not 
be set to anything sensible anymore.
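
A rough sketch of the errno handling being asked for, as it might look in
secure_raw_write() (names as in the 9.4 tree; illustrative only):

    int         save_errno = errno;

    /* wait until the socket becomes writeable, or the latch is set */
    (void) WaitLatchOrSocket(&MyProc->procLatch,
                             WL_LATCH_SET | WL_SOCKET_WRITEABLE,
                             port->sock, 0);

    /* restore the original write error for the caller to inspect */
    errno = save_errno;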



0003 Sinval/notify processing got simplified further. There really isn't
  any need for DisableNotifyInterrupt/DisableCatchupInterrupt
  anymore. Also begin_client_read/client_read_ended don't make much
  sense anymore. Instead introduce ProcessClientReadInterrupt (which
  wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
  set. All of that happens via the latch mechanism, nothing happens
  inside signal handlers. So I do think it's quite an improvement
  over what's been discussed in this thread.
  But it (and the other approaches) do noticeably increase the
  likelihood of clients not getting the error message if the client
  isn't actually dead. The likelihood of write() being blocked
  *temporarily* due to normal bandwidth constraints is quite high
  when you consider COPY FROM and similar. Right now we'll wait till
  we can put the error message into the socket afaics.

1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.


Yeah, 1-3 seem sane. 4 also looks OK to me at a quick glance. It 
basically enables handling the die interrupt immediately, if we're 
blocked in a read or write. It won't be handled in the signal handler, 
but within the secure_read/write call anyway.


- Heikki





Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-03 Thread Andres Freund
Hi,

On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
 On 09/28/2014 01:54 AM, Andres Freund wrote:
 I've invested some more time in this:
 
 Thanks!
 
 In 0001, the select() codepath will not return (WL_SOCKET_READABLE |
 WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the
 poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE was
 requested as a wake-event, and likewise for writeable, while the poll()
 codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) regardless of
 the requested wake-events. I'm not sure which is actually better - a
 separate WL_SOCKET_ERROR code might be best - but it's inconsistent as it
 is.

Hm. Right. I think we should only report the requested state. We can't
really discern whether it's a hangup, error or actually readable/writable
with select() - it just returns the socket as readable/writable as soon
as it doesn't block anymore. Where not blocking includes the connection
having gone bad.

It took me a while to figure out whether that's actually guaranteed by
the spec, but I'm pretty sure it is...

 0002 now makes sense on its own and doesn't change anything around the
   interrupt handling. Oh, and it compiles without 0003.
 
 WaitLatchOrSocket() can throw an error, so it's not totally safe to call
 that underneath OpenSSL.

Hm. Fair point.

 Admittedly the cases where it throws an error are
 shouldn't happen cases like poll() failed or read() on self-pipe
 failed, but still. Perhaps those errors should be reclassified as FATAL;
 it's not clear you can just roll back and expect to continue running if any
 of them happens.

Fine with me.

 In secure_raw_write(), you need to save and restore errno, as
 WaitLatchOrSocket will not preserve it. If secure_raw_write() calls
 WaitLatchOrSocket(), and it returns because the latch was set, and we fall
 out of secure_raw_write, we will return -1 but the errno might not be set to
 anything sensible anymore.

Oops.

 0003 Sinval/notify processing got simplified further. There really isn't
   any need for DisableNotifyInterrupt/DisableCatchupInterrupt
   anymore. Also begin_client_read/client_read_ended don't make much
   sense anymore. Instead introduce ProcessClientReadInterrupt (which
   wants a better name).
 There's also a very WIP
 0004 Allows secure_read/write be interrupted when ProcDiePending is
   set. All of that happens via the latch mechanism, nothing happens
   inside signal handlers. So I do think it's quite an improvement
   over what's been discussed in this thread.
   But it (and the other approaches) do noticeably increase the
   likelihood of clients not getting the error message if the client
   isn't actually dead. The likelihood of write() being blocked
   *temporarily* due to normal bandwidth constraints is quite high
   when you consider COPY FROM and similar. Right now we'll wait till
   we can put the error message into the socket afaics.
 
 1-3 need some serious comment work, but I think the approach is
 basically sound. I'm much, much less sure about allowing send() to be
 interrupted.
 
 Yeah, 1-3 seem sane.

I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...

Tom, do you happen to have time to look at that bit?


There's also the concern that using a latch for client communication
increases the number of syscalls for the same work. We should at least
try to quantify that...

 4 also looks OK to me at a quick glance. It basically
 enables handling the die interrupt immediately, if we're blocked in a read
 or write. It won't be handled in the signal handler, but within the
 secure_read/write call anyway.

What are you thinking about the concern that it'll reduce the likelihood
of transferring the error message to the client? I tried to reduce that
by only allowing errors when write() blocks, but that's not an
infrequent event.

Thanks for looking.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Trailing comma support in SELECT statements

2014-10-03 Thread Bogdan Pilch
Hi,
I have created a small patch to postgres source (in particular the
psql part of it) that accepts trailing comma at the end of list in
SELECT statement.

The idea is to be able to say both (with the same result):
SELECT a, b, c from t;
SELECT a, b, c, from t;

Attached you can find a patch containing regression test (incorporated
into the serial_schedule).
My patch is relative to origin/REL9_4_STABLE branch as that is the one
I started from.

My plea is to have this change merged into the main stream so that it
becomes available in upcoming releases.

This modification does not require any interaction with the user.
It does not create any backward compatibility issues.
Nor does it have any performance impact.

regards
bogdan
From 450c339b4284887782b30e154766a0ee90d6f7ee Mon Sep 17 00:00:00 2001
From: Bogdan Pilch bogdan.pi...@opensynergy.com
Date: Sat, 16 Aug 2014 19:42:29 +0200
Subject: [PATCH 1/3] BPI: Added support for ignoring the trailing comma in
 select statement

---
 src/backend/parser/gram.y | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7b9895d..345c6cb 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -12470,6 +12470,7 @@ ctext_row: '(' ctext_expr_list ')'	{ $$ = $2; }
  */
 
 opt_target_list: target_list		{ $$ = $1; }
+			| target_list ','				{ $$ = $1; }
 			| /* EMPTY */			{ $$ = NIL; }
 		;
 
-- 
1.9.1


From 9faf5eec4975eb99ad7c8901e30742ba92c0c4cb Mon Sep 17 00:00:00 2001
From: Bogdan Pilch bogdan.pi...@opensynergy.com
Date: Sun, 28 Sep 2014 13:12:24 +0200
Subject: [PATCH 3/3] Added regression test for trailing comma select feature.

---
 .../regress/expected/select_trailing_comma.out | 53 ++
 src/test/regress/serial_schedule   |  1 +
 src/test/regress/sql/select_trailing_comma.sql | 16 +++
 3 files changed, 70 insertions(+)
 create mode 100644 src/test/regress/expected/select_trailing_comma.out
 create mode 100644 src/test/regress/sql/select_trailing_comma.sql

diff --git a/src/test/regress/expected/select_trailing_comma.out b/src/test/regress/expected/select_trailing_comma.out
new file mode 100644
index 0000000..f84938c
--- /dev/null
+++ b/src/test/regress/expected/select_trailing_comma.out
@@ -0,0 +1,53 @@
+--
+-- SELECT WITH TRAILING COMMA
+--
+CREATE TEMP TABLE primes (p1 int, p2 int, p3 int);
+INSERT INTO primes VALUES (13, 7927, 7);
+SELECT * FROM primes;
+ p1 |  p2  | p3 
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
+SELECT *, FROM primes;
+ p1 |  p2  | p3 
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
+SELECT p1 FROM primes;
+ p1 
+----
+ 13
+(1 row)
+
+SELECT p1, FROM primes;
+ p1 
+----
+ 13
+(1 row)
+
+SELECT p1, p2 FROM primes;
+ p1 |  p2  
+----+------
+ 13 | 7927
+(1 row)
+
+SELECT p1, p2, FROM primes;
+ p1 |  p2  
+----+------
+ 13 | 7927
+(1 row)
+
+SELECT p1, p2, p3 FROM primes;
+ p1 |  p2  | p3 
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
+SELECT p1, p2, p3, FROM primes;
+ p1 |  p2  | p3 
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 16a1905..3571d14 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -79,6 +79,7 @@ test: select_distinct
 test: select_distinct_on
 test: select_implicit
 test: select_having
+test: select_trailing_comma
 test: subselect
 test: union
 test: case
diff --git a/src/test/regress/sql/select_trailing_comma.sql b/src/test/regress/sql/select_trailing_comma.sql
new file mode 100644
index 0000000..a2c922f
--- /dev/null
+++ b/src/test/regress/sql/select_trailing_comma.sql
@@ -0,0 +1,16 @@
+--
+-- SELECT WITH TRAILING COMMA
+--
+
+CREATE TEMP TABLE primes (p1 int, p2 int, p3 int);
+
+INSERT INTO primes VALUES (13, 7927, 7);
+
+SELECT * FROM primes;
+SELECT *, FROM primes;
+SELECT p1 FROM primes;
+SELECT p1, FROM primes;
+SELECT p1, p2 FROM primes;
+SELECT p1, p2, FROM primes;
+SELECT p1, p2, p3 FROM primes;
+SELECT p1, p2, p3, FROM primes;
-- 
1.9.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-10-03 Thread Alvaro Herrera
Michael Banck wrote:

 diff --git a/src/backend/access/transam/xlog.c 
 b/src/backend/access/transam/xlog.c
 index 5a4dbb9..f2716ae 100644
 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -8085,10 +8085,14 @@ CreateCheckPoint(int flags)
  
 	/*
 	 * If enabled, log checkpoint start.  We postpone this until now so as not
 -	 * to log anything if we decided to skip the checkpoint.
 +	 * to log anything if we decided to skip the checkpoint.  If we are during
 +	 * shutdown and checkpoints are not being logged, add a log message that a
 +	 * checkpoint is to be written and shutdown is potentially delayed.
 	 */
 	if (log_checkpoints)
 		LogCheckpointStart(flags, false);
 +	else if (shutdown)
 +		ereport(LOG, (errmsg("waiting for checkpoint ...")));
  
   TRACE_POSTGRESQL_CHECKPOINT_START(flags);

I think if we're going to emit log messages for shutdown checkpoints, we
ought to use the same format we already have, i.e. instead of having the
separate waiting for checkpoint message, just test log_checkpoints ||
shutdown, then LogCheckpointStart.  And for consistency also make sure
that the checkpoint end log line is also reported on shutdown
checkpoints regardless of log_checkpoints.
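
Roughly what I have in mind, as an untested sketch against the hunk quoted
above (the checkpoint-end side would get the same treatment):

    /*
     * If enabled, log checkpoint start.  We postpone this until now so as not
     * to log anything if we decided to skip the checkpoint.  Shutdown
     * checkpoints are logged unconditionally, so it is visible why shutdown
     * may be delayed.
     */
    if (log_checkpoints || shutdown)
        LogCheckpointStart(flags, false);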

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 05:26 PM, Andres Freund wrote:

On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:

On 09/28/2014 01:54 AM, Andres Freund wrote:

0003 Sinval/notify processing got simplified further. There really isn't
  any need for DisableNotifyInterrupt/DisableCatchupInterrupt
  anymore. Also begin_client_read/client_read_ended don't make much
  sense anymore. Instead introduce ProcessClientReadInterrupt (which
  wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
  set. All of that happens via the latch mechanism, nothing happens
  inside signal handlers. So I do think it's quite an improvement
  over what's been discussed in this thread.
  But it (and the other approaches) do noticeably increase the
  likelihood of clients not getting the error message if the client
  isn't actually dead. The likelihood of write() being blocked
  *temporarily* due to normal bandwidth constraints is quite high
  when you consider COPY FROM and similar. Right now we'll wait till
  we can put the error message into the socket afaics.

1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.


Yeah, 1-3 seem sane.


I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...


I only looked at the .patch, I didn't apply it, so I didn't look at the 
context much. But I don't see any fundamental problem with it. I would 
like to have a closer look before it's committed, though.



There's also the concern that using a latch for client communication
increases the number of syscalls for the same work. We should at least
try to quantify that...


I'm not too concerned about that, since we only do extra syscalls when 
the socket isn't immediately available for reading/writing, i.e. when we 
have to sleep anyway.



4 also looks OK to me at a quick glance. It basically
enables handling the die interrupt immediately, if we're blocked in a read
or write. It won't be handled in the signal handler, but within the
secure_read/write call anyway.


What are you thinking about the concern that it'll reduce the likelihood
of transferring the error message to the client? I tried to reduce that
by only allowing errors when write() blocks, but that's not an
infrequent event.


I'm not too concerned about that either. I mean, it's probably true that 
it reduces the likelihood, but I don't particularly care myself. But if 
we care, we could use a timeout there, so that if we receive a SIGTERM 
while blocked on a send(), we wait for a few seconds to see if we can 
send whatever we were sending, before terminating the backend.
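
Something along these lines, inside whatever loop ends up retrying the
blocked send() (just a sketch; die_wait_deadline is a made-up local variable
and the 5 second grace period is arbitrary):

    if (ProcDiePending)
    {
        if (die_wait_deadline == 0)             /* hypothetical local state */
            die_wait_deadline = GetCurrentTimestamp() + 5 * USECS_PER_SEC;
        else if (GetCurrentTimestamp() >= die_wait_deadline)
            ereport(FATAL,
                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
                     errmsg("terminating connection due to administrator command")));
    }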



What should we do with this patch in the commitfest? Are you planning to 
clean up and commit these patches?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [RFC] Incremental backup v2: add backup profile to base backup

2014-10-03 Thread Marco Nenciarini
Hi Hackers,

I've updated the wiki page
https://wiki.postgresql.org/wiki/Incremental_backup following the result
of discussion on hackers.

Compared to first version, we switched from a timestamp+checksum based
approach to one based on LSN.

This patch adds an option to pg_basebackup and to replication protocol
BASE_BACKUP command to generate a backup_profile file. It is almost
useless by itself, but it is the foundation on which we will build the
file based incremental backup (and hopefully a block based incremental
backup after it).

Any comment will be appreciated. In particular I'd appreciate comments
on correctness of relnode files detection and LSN extraction code.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it


backup_profile_v2.patch.gz
Description: GNU Zip compressed data


signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Bruce Momjian
On Thu, Oct  2, 2014 at 11:50:14AM +0200, Andres Freund wrote:
 The first problem that comes to my mind about collecting enough data is
 that we have a very large number of lwlocks (fixed_number + 2 *
 shared_buffers). One 'trivial' way of implementing this is to have a per
 backend array collecting the information, and then a shared one
 accumulating data from it over time. But I'm afraid that's not going to
 fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...
 
 Perhaps we should somehow encode this different for individual lwlock
 tranches? It's far less problematic to collect all this information for
 all but the buffer lwlocks...

First, I think this could be a major Postgres feature, and I am excited
someone is working on this.

As far as gathering data, I don't think we are going to do any better in
terms of performance/simplicity/reliability than to have a single PGPROC
entry to record when we enter/exit a lock, and having a secondary
process scan the PGPROC array periodically.

What that gives us is almost zero overhead on backends, high
reliability, and the ability of the scan daemon to give higher weights
to locks that are held longer.  Basically, if you just stored the locks
you held and released, you either have to add timing overhead to the
backends, or you have no timing information collected.  By scanning
active locks, a short-lived lock might not be seen at all, while a
longer-lived lock might be seen by multiple scans.  What that gives us
is a weighting of the lock time with almost zero overhead.   If we want
finer-grained lock statistics, we just increase the number of scans per
second.
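
To make that shape concrete, a very rough sketch (the sample_* names and
sample_counts are made up, not existing PGPROC fields; assume one published
lock per backend for simplicity, and ignore memory ordering):

    /* Backend side: publish what we hold; no clocks, no counters. */
    static inline void
    report_lwlock_acquired(PGPROC *proc, int lockid)
    {
        proc->sample_lwlockid = lockid;     /* hypothetical PGPROC field */
    }

    static inline void
    report_lwlock_released(PGPROC *proc)
    {
        proc->sample_lwlockid = -1;
    }

    /* Scan daemon: run this N times per second. */
    static void
    sample_held_lwlocks(void)
    {
        int         i;

        for (i = 0; i < ProcGlobal->allProcCount; i++)
        {
            volatile PGPROC *proc = &ProcGlobal->allProcs[i];
            int         lockid = proc->sample_lwlockid;

            if (lockid >= 0)
                sample_counts[lockid]++;    /* hypothetical accumulator; the
                                             * count naturally weights locks
                                             * that are held longer */
        }
    }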

I am assuming almost no one cares about the number of locks, but rather
they care about cumulative lock durations.

I am having trouble seeing any other option that has such a good
cost/benefit profile.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo fixes in src/backend/rewrite/rewriteHandler.c

2014-10-03 Thread Robert Haas
On Thu, Oct 2, 2014 at 2:55 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 Here is the comments in process_matched_tle() in rewriteHandler.c.

 883  * such nodes; consider
 884  *  UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
 885  * The two expressions produced by the parser will look like
 886  *  FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
 887  *  FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))

 I think the second one is not correct and should be

 FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))

 Just like this,

 891  *  FieldStore(FieldStore(col, fld1,
 892  *FieldStore(placeholder, subfld1, x)),
 893  * fld2, FieldStore(placeholder, subfld2, x))

 should be

 FieldStore(FieldStore(col, fld1,
   FieldStore(placeholder, subfld1, x)),
fld2, FieldStore(placeholder, subfld2, y))

 Patch attached.

Looks right to me.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Robert Haas
On Fri, Oct 3, 2014 at 11:33 AM, Bruce Momjian br...@momjian.us wrote:
 I am assuming almost no one cares about the number of locks, but rather
 they care about cumulative lock durations.

 I am having trouble seeing any other option that has such a good
 cost/benefit profile.

I do think that the instrumentation data gathered by LWLOCK_STATS is
useful - very useful.

But it does have significant overhead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 06:31 PM, Marco Nenciarini wrote:

Hi Hackers,

I've updated the wiki page
https://wiki.postgresql.org/wiki/Incremental_backup following the result
of discussion on hackers.

Compared to first version, we switched from a timestamp+checksum based
approach to one based on LSN.

This patch adds an option to pg_basebackup and to replication protocol
BASE_BACKUP command to generate a backup_profile file. It is almost
useless by itself, but it is the foundation on which we will build the
file based incremental backup (and hopefully a block based incremental
backup after it).


I'd suggest jumping straight to block-based incremental backup. It's not 
significantly more complicated to implement, and if you implement both 
separately, then we'll have to support both forever. If you really need 
to, you can implement file-level diff as a special case, where the 
server sends all blocks in the file, if any of them have an LSN > the 
cutoff point. But I'm not sure if there's a point in that, once you have 
block-level support.


If we're going to need a profile file - and I'm not convinced of that - 
is there any reason to not always include it in the backup?



Any comment will be appreciated. In particular I'd appreciate comments
on correctness of relnode files detection and LSN extraction code.


I didn't look at it in detail, but one future problem comes to mind: 
Once you implement the server-side code that only sends a file if its 
LSN is higher than the cutoff point that the client gave, you'll have to 
scan the whole file first, to see if there are any blocks with a higher 
LSN. At least until you find the first such block. So with a file-level 
implementation of this sort, you'll have to scan all files twice, in the 
worst case.
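
(For illustration, the per-block test itself is trivial -- the cost is
purely that you have to read every block. A rough sketch, using the existing
PageGetLSN macro and ignoring error handling and partial reads:)

    /* Does any block of this relation file have an LSN above the cutoff? */
    static bool
    file_has_newer_blocks(FILE *fp, XLogRecPtr cutoff)
    {
        char        buf[BLCKSZ];

        while (fread(buf, 1, BLCKSZ, fp) == BLCKSZ)
        {
            if (PageGetLSN((Page) buf) > cutoff)
                return true;    /* found one; the caller still has to re-read
                                 * the file to actually send it */
        }
        return false;
    }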


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Ilya Kosmodemiansky
On Fri, Oct 3, 2014 at 5:33 PM, Bruce Momjian br...@momjian.us wrote:
 As far as gathering data, I don't think we are going to do any better in
 terms of performance/simplicity/reliability than to have a single PGPROC
 entry to record when we enter/exit a lock, and having a secondary
 process scan the PGPROC array periodically.

That was the point.


 What that gives us is almost zero overhead on backends, high
 reliability, and the ability of the scan daemon to give higher weights
 to locks that are held longer.  Basically, if you just stored the locks
 you held and released, you either have to add timing overhead to the
 backends, or you have no timing information collected.  By scanning
 active locks, a short-lived lock might not be seen at all, while a
 longer-lived lock might be seen by multiple scans.  What that gives us
 is a weighting of the lock time with almost zero overhead.   If we want
 finer-grained lock statistics, we just increase the number of scans per
 second.

So I could add the function, which will accumulate the data in some
view/table (with weights etc). How should it be called? From a specific
process? From some existing maintenance process such as autovacuum?
Should I implement a GUC, for example lwlock_pull_rate: 0 for off, 1 to 10
for 1 to 10 samples per second?
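
Something like the following entry in guc.c's ConfigureNamesInt[] is what I
have in mind (a sketch only; the name lwlock_pull_rate, the 0-10 range and
the PGC_SIGHUP context are all up for discussion):

    {
        {"lwlock_pull_rate", PGC_SIGHUP, STATS_MONITORING,
            gettext_noop("Number of lwlock usage samples taken per second."),
            gettext_noop("Zero disables lwlock sampling.")
        },
        &lwlock_pull_rate,
        0, 0, 10,
        NULL, NULL, NULL
    },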


 I am assuming almost no one cares about the number of locks, but rather
  they care about cumulative lock durations.

Oracle and DB2 measure both: cumulative durations and counts.


 I am having trouble seeing any other option that has such a good
 cost/benefit profile.

At least the cost. The Oracle documentation clearly states that it is all
about diagnostic convenience; the performance impact is significant.


 --
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + Everyone has their own god. +



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Ilya Kosmodemiansky
On Fri, Oct 3, 2014 at 5:51 PM, Robert Haas robertmh...@gmail.com wrote:
 I do think that the instrumentation data gathered by LWLOCK_STATS is
 useful - very useful.

Sure, quite useful.

But how about this comment:

/*
 * The LWLock stats will be updated within a critical section, which
 * requires allocating new hash entries. Allocations within a critical
 * section are normally not allowed because running out of memory would
 * lead to a PANIC, but LWLOCK_STATS is debugging code that's not normally
 * turned on in production, so that's an acceptable risk. The hash entries
 * are small, so the risk of running out of memory is minimal in practice.
 */


 But it does have significant overhead.

I will say that it is a bit more than just overhead for production use.




-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-10-03 Thread Alvaro Herrera
Stephen Frost wrote:
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:

  I am rather surprised that nobody has reported this problem before.  I
  am now of the mind that this is clearly a bug that should be fixed all
  the way back.
 
 I'm coming around to that also; however, should we worry about users who
 set per-table settings and then simply forgot about them?  I suppose
 that won't matter too much unless the table is really active, and if it
 is, they've probably already set it to zero.

Right.

For the cases where it's been set and forgotten, perhaps we can have an
item in release notes to tell people to look into tables with the
parameters set in pg_class.reloptions (to any value different from zero)
and to look for performance differences from previous versions when they
are set.

I have pushed this now, backpatched all the way back to 9.0.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Heikki Linnakangas

On 09/16/2014 09:09 PM, Brightwell, Adam wrote:



I think there have been some changes to this patch since July; care to
resend a new version?


Sure, here it is.

The only difference with the previous version is that it now also
supports column defaults.  This was found to be a problem when you drop
a sequence that some column default depends on -- for example a column
declared SERIAL, or a sequence marked with ALTER SEQUENCE OWNED BY.  The
new code is able to drop both the sequence and the default value
(leaving, of course, the rest of the column intact.)  This required
adding support for such objects in get_object_address.



I have given this patch the following review:

- Apply to current master (77e65bf).  -- success
- check-world. --success
- multiple FIXME statements still exist -- are there plans to fix these
items? Can the duplicated code be extracted to a static function?


Nothing seems to be happening to this, so I'm marking this as returned 
with feedback.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup

2014-10-03 Thread Marco Nenciarini
On 03/10/14 17:53, Heikki Linnakangas wrote:
 If we're going to need a profile file - and I'm not convinced of that -
 is there any reason to not always include it in the backup?
 

The main reason is to have a centralized list of files that need to be
present. Without a profile, you have to insert some sort of placeholder
for skipped files. Moreover, the profile allows you to quickly know the
size of the recovered backup (by simply summing the individual sizes).
Another use could be to 'validate' the presence of all required files in
a backup.

 Any comment will be appreciated. In particular I'd appreciate comments
 on correctness of relnode files detection and LSN extraction code.
 
 I didn't look at it in detail, but one future problem comes to mind:
 Once you implement the server-side code that only sends a file if its
 LSN is higher than the cutoff point that the client gave, you'll have to
 scan the whole file first, to see if there are any blocks with a higher
 LSN. At least until you find the first such block. So with a file-level
 implementation of this sort, you'll have to scan all files twice, in the
 worst case.
 

It's true. To solve this you have to keep a central maxLSN directory,
but I think it introduces more issues than it solves.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Last Commitfest patches waiting review

2014-10-03 Thread Heikki Linnakangas
Thanks to everyone who's reviewed a patch so far.  One last crunch, 
and we'll be through.


We have 7 patches left in Needs Review state:

pg_receivexlog: addition of --create/--drop to create/drop repslots

  Waiting for Magnus. Amit promised to take a look if Magnus continues
  to be busy.

Sort support for text with strxfrm() poor man's keys

  Robert? What do you think of Peter's latest patch?

add latency limit to pgbench throttling (--rate)

  I'm waiting for Greg Smith to have a look at the latest patch.

Escaping from blocked send() by pg_terminate_backend().

  Andres posted a patch series using a completely different approach. I
  guess this should be just marked as returned with feedback, if we're
  going to pursue the different approach instead.

Fix local_preload_libraries to work as expected.
pg_dump refactor to remove global variables

  Peter: Ping? Will you have the time to review these?

Fix xpath() to return namespace definitions (fixes the issue with nested 
or repeated xpath())


  Peter, you've done some XML stuff before; could you have a look at 
this too?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-03 Thread Tom Lane
Bogdan Pilch bog...@matfyz.cz writes:
 I have created a small patch to postgres source (in particular the
 psql part of it) that accepts trailing comma at the end of list in
 SELECT statement.

This doesn't seem to me to be a remarkably good idea.  What's the
difference between this and accepting random misspellings of SELECT,
allowing mismatched parentheses in expressions, etc etc?  It's important
in a computer language to be able to catch typos.

If we were going to be lax about trailing commas, the SELECT list
would hardly be the only candidate, or even the first candidate,
for being lax that way.  But I don't want to go there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-03 Thread Bruce Momjian
On Sun, Sep 28, 2014 at 01:42:46PM +0200, Bogdan Pilch wrote:
 Hi,
 I have created a small patch to postgres source (in particular the
 psql part of it) that accepts trailing comma at the end of list in
 SELECT statement.
 
 The idea is to be able to say both (with the same result):
 SELECT a, b, c from t;
 SELECT a, b, c, from t;
 
 Attached you can find a patch containing regression test (incorporated
 into the serial_schedule).
 My patch is relative to origin/REL9_4_STABLE branch as that is the one
 I started from.
 
 My plea is to have this change merged into the main stream so that it
 becomes available in upcoming releases.
 
 This modification does not require any interaction with user.
 It does not create any backward compatibility issues.

Interesting --- I know some languages allow trailing delimiters, like
Perl and Javascript.  Could this mask query errors?  Does any other
database accept this?  Seems this would need to be done in many other
places, like UPDATE, but let's first decide if we want this.

FYI, it is usually better to discuss a feature before showing a patch.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup

2014-10-03 Thread Claudio Freire
On Fri, Oct 3, 2014 at 1:08 PM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 Any comment will be appreciated. In particular I'd appreciate comments
 on correctness of relnode files detection and LSN extraction code.

 I didn't look at it in detail, but one future problem comes to mind:
 Once you implement the server-side code that only sends a file if its
 LSN is higher than the cutoff point that the client gave, you'll have to
 scan the whole file first, to see if there are any blocks with a higher
 LSN. At least until you find the first such block. So with a file-level
 implementation of this sort, you'll have to scan all files twice, in the
 worst case.


 It's true. To solve this you have to keep a central maxLSN directory,
 but I think it introduces more issues than it solves.

I see that as a worthy optimization on the server side, regardless of
whether file or block-level backups are used, since it allows
efficient skipping of untouched segments (common for append-only
tables).

Still, it would be something to do after it works already (ie: it's an
optimization)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE IF NOT EXISTS INDEX

2014-10-03 Thread Fabrízio de Royes Mello
On Fri, Oct 3, 2014 at 5:26 AM, Marti Raudsepp ma...@juffo.org wrote:

 On Fri, Oct 3, 2014 at 6:25 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Documentation: I would prefer if the explanation were consistent with

  Do not throw an error if the index already exists. A notice is issued
  in this case.
  Fixed in that way. Ok?

 And also Note that there is no guarantee that the existing index is
 anything like the one that would have been created.


Fixed.


  I think ERRCODE_SYNTAX_ERROR makes more sense, it's something that we
  decided we *don't want* to support.
  I don't think so. It's the same as CREATE SCHEMA IF NOT EXISTS, which does
  not support including schema elements.

 IMO that's wrong too, the CREATE SCHEMA documentation doesn't list it
 as valid syntax.

 But now that you split the syntax in two, you can simply replace
 opt_index_name with index_name and it will naturally cause a
 syntax error without the need for an if(). What do you think?
 Patch attached, which applies on top of your v4 patch.


Fine. Thanks.


 On Fri, Oct 3, 2014 at 6:29 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  I would also move this check to after all the attributes have been
  assigned, rather than splitting the assignments in half.
  Why? If you see other places in gram.y it's a common usage...

 Looks cleaner to me: first input all the fields, then validate. And
 there are examples like this too, like COPY select_with_parens. But
 this is moot now, if you agree with my grammar change.


I agree with your grammar change.


 On Fri, Oct 3, 2014 at 6:35 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  And just to remember please add your review to commitfest

 Thanks for reminding, I always forget to update the CommitFest... :(


You're welcome.

The version 5 (attached) contains all discussed until now.

Regards.

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index e469b17..2dfe2f4 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
<synopsis>
-CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</replaceable> ] ON <replaceable class="parameter">table_name</replaceable> [ USING <replaceable class="parameter">method</replaceable> ]
+CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ IF NOT EXISTS <replaceable class="parameter">name</replaceable> | <replaceable class="parameter">name</replaceable> ] ON <replaceable class="parameter">table_name</replaceable> [ USING <replaceable class="parameter">method</replaceable> ]
     ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
     [ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -99,6 +99,17 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</
 
 <variablelist>
  <varlistentry>
+  <term><literal>IF NOT EXISTS</literal></term>
+  <listitem>
+   <para>
+    Do not throw an error if the index already exists. A notice is issued
+    in this case. Note that there is no guarantee that the existing index
+    is anything like the one that would have been created.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><literal>UNIQUE</literal></term>
   <listitem>
    <para>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ee10594..8905e30 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -697,7 +697,8 @@ index_create(Relation heapRelation,
 			 bool allow_system_table_mods,
 			 bool skip_build,
 			 bool concurrent,
-			 bool is_internal)
+			 bool is_internal,
+			 bool if_not_exists)
 {
 	Oid			heapRelationId = RelationGetRelid(heapRelation);
 	Relation	pg_class;
@@ -773,10 +774,22 @@ index_create(Relation heapRelation,
 		elog(ERROR, "shared relations must be placed in pg_global tablespace");
 
 	if (get_relname_relid(indexRelationName, namespaceId))
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_DUPLICATE_TABLE),
+	 errmsg("relation \"%s\" already exists, skipping",
+			indexRelationName)));
+			heap_close(pg_class, RowExclusiveLock);
+			return InvalidOid;
+		}
+
 		ereport(ERROR,
 (errcode(ERRCODE_DUPLICATE_TABLE),
  errmsg("relation \"%s\" already exists",
 		indexRelationName)));
+	}
 
 	/*
 	 * construct tuple 

Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 04:11:30PM +0200, Andres Freund wrote:
 On 2014-10-03 10:07:39 -0400, Gregory Smith wrote:
  On 10/3/14, 8:26 AM, Andres Freund wrote:
  #define NUM_XLOGINSERT_LOCKS  1
  tps = 52.711939 (including connections establishing)
  #define NUM_XLOGINSERT_LOCKS  8
  tps = 286.496054 (including connections establishing)
  #define NUM_XLOGINSERT_LOCKS  16
  tps = 346.113313 (including connections establishing)
  #define NUM_XLOGINSERT_LOCKS  24
  tps = 363.242111 (including connections establishing)
  
  Just to clarify:  that 10% number I threw out was meant as a rough estimate
  for a system with the default configuration, which is all that I tested.  It
  seemed like people would likely need to tune all the usual things like
  checkpoint_segments, shared_buffers, etc. as well before seeing much better.
  You did all that, and sure enough the gain went up; thanks for confirming my
  guess.
  
  I still don't think that means this needs a GUC for 9.4.  Look at that jump
  from 1 to 8.  The low-hanging fruit here hasn't just been knocked off.  It's
  been blended into a frozen drink, poured into a glass, and had a little
  paper umbrella put on top.  I think that's enough for 9.4.  But, yes, let's
  see if we can add delivery to the side of the pool in the next version too.
 
 So 25% performance improvements on a relatively small machine aren't
 worth a GUC? Ones that are likely to be larger on a bigger machine?
 
 I utterly fail to see why that's a service to our users.

Well, I think the issue is that having a GUC that can't reasonably be
tuned by 95% of our users is nearly useless.  Few users are going to run
benchmarks to see what the optimal value is.

I remember Informix had a setting that had no description except try
different values to see if it helps performance --- we don't want to do
that.

What if we emit a server message if the setting is too low?  That's how
we handle checkpoint_segments.
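
For example, something like this in the WAL insertion slow path (a sketch
only; the condition is hypothetical -- we would need some cheap way of
noticing that all insertion locks have been contended recently):

    if (all_insert_locks_recently_contended)    /* hypothetical condition */
        ereport(LOG,
                (errmsg("all WAL insertion locks are busy"),
                 errhint("Consider increasing the number of WAL insertion locks.")));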

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last Commitfest patches waiting review

2014-10-03 Thread Magnus Hagander
On Fri, Oct 3, 2014 at 6:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Thanks to everyone who's reviewed a patch so far.  One last crunch, and
 we'll be through.

 We have 7 patches left in Needs Review state:

 pg_receivexlog: addition of --create/--drop to create/drop repslots

   Waiting for Magnus. Amit promised to take a look if Magnus continues
   to be busy.

Andres did, not Amit.

And thanks Andres! :) And sorry about that one - no way I'll get to it
until at the earliest next week but almost certainly not until the
week after :( So I really appreciate the pickup.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 On 09/16/2014 09:09 PM, Brightwell, Adam wrote:

 I have given this patch the following review:
 
 - Apply to current master (77e65bf).  -- success
 - check-world. --success
 - multiple FIXME statements still exist -- are there plans to fix these
 items? Can the duplicated code be extracted to a static function?
 
 Nothing seems to be happening to this, so I'm marking this as
 returned with feedback.

Meh.

There are three fixmes in the code.  One can be handled by just removing
the line; we don't really care about duplicating 10 lines of boilerplate
code.  The other two mean missing support for domain constraints and for
default ACLs.  Is there absolutely no feedback to be had on the
mechanism used by the patch?

Since the patch has had good feedback and no further comments arise, I
can just implement support for those two missing object types and push,
and everybody will be happy.  Right?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 There are three fixmes in the code.  One can be handled by just removing
 the line; we don't really care about duplicating 10 lines of boilerplate
 code.  The other two mean missing support for domain constraints and for
 default ACLs.  Is there absolutely no feedback to be had on the
 mechanism used by the patch?
 
 Since the patch has had good feedback and no further comments arise, I
 can just implement support for those two missing object types and push,
 and everybody will be happy.  Right?

In general, I'd say yes, but I'll take a look at the patch now and
provide feedback in a couple hours anyway.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)

2014-10-03 Thread Robert Haas
On Wed, Oct 1, 2014 at 11:10 AM, Robert Haas robertmh...@gmail.com wrote:
 As far as I can tell, it's configured to run everything.  I just
 checked, though, and found it wedged again.  I'm not sure whether it
 was the same problem, though; I ended up just killing all of the
 postgres processes to fix it.  We may be just at the beginning of an
 exciting debugging journey.

After much more time spent on this, and some help from two people
whose names are both six letters starting with Andre, I discovered
the following:

- anole apparently had not tried to build master in a long time - I
think since May - because of a leftover inst directory inside
HEAD's buildroot.
- anole still refused to build either 9.4 or master until I removed a
couple of leftover socket directories in /tmp.
- after that it hung again trying to run the test_shm_mq regression
tests, in pretty much the same way as before.

The postmaster state looks like this:

0xc044a350:0 in _select_sys+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) bt
#0  0xc044a350:0 in _select_sys+0x30 () from /usr/lib/hpux64/libc.so.1
#1  0xc045f950:0 in select+0x90 () from /usr/lib/hpux64/libc.so.1
#2  0x40a4e510:0 in ServerLoop () at postmaster.c:1540
#3  0x40a48b20:0 in PostmasterMain (argc=3, argv=0x87ffd178)
at postmaster.c:1223
#4  0x408d0a90:0 in main (argc=3, argv=0x87ffd178)
at main.c:227
(gdb) p BackgroundWorkerList
$1 = {head = {next = 0x600e8a58}}
(gdb) p BackgroundWorkerList.head
$2 = {next = 0x600e8a58}
(gdb) p ((char *) BackgroundWorkerList.head) - 264
$3 = 0x600e8950 test_shm_mq
(gdb) p (RegisteredBgWorker *) (((char *) BackgroundWorkerList.head) - 264)
$4 = (struct RegisteredBgWorker *) 0x600e8950
(gdb) p ((RegisteredBgWorker *) (((char *) BackgroundWorkerList.head) - 264))[0]
$5 = {rw_worker = {
bgw_name = test_shm_mq\000\000\001?\000 \000\00511985, '\000'
repeats 13 times, \005T\000\000\000\000\000\001?\000
\000\00512038\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time
= BgWorkerStart_ConsistentState,
bgw_restart_time = -1, bgw_main = 0,
bgw_library_name =
test_shm_mq\000\000\000\005t\000\000\000\000\000\001?\000
\000\00511917, '\000' repeats 13 times,
\005?\000\000\000\000\000\001?\000 \000\0051209,
bgw_function_name = test_shm_mq_main\000\000\000\000\000\001?\000
\000\00511903, '\000' repeats 13 times,
\005?\000\000\000\000\000\001?\000 \000\0051199, bgw_main_arg =
1069289198, bgw_notify_pid = 12361}, rw_backend = 0x0,
  rw_pid = 12369, rw_child_slot = 35, rw_crashed_at = 0, rw_shmem_slot = 6,
  rw_terminate = 0 '\000', rw_lnode = {next = 0x600e8938}}
(gdb) p $5.rw_lnode
$6 = {next = 0x600e8938}
(gdb) p ((RegisteredBgWorker *) (((char *) $5.rw_lnode) - 264))[0]
$7 = {rw_worker = {
bgw_name = test_shm_mq\000\000\001?x\000 \000\00511931, '\000'
repeats 13 times, \004?\000\000\000\000\000\001?}\000
\000\00511939\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time
= BgWorkerStart_ConsistentState,
bgw_restart_time = -1, bgw_main = 0,
bgw_library_name =
test_shm_mq\000\000\000\004?\000\000\000\000\000\001?\000
\000\00511957, '\000' repeats 13 times,
\004?\000\000\000\000\000\001?\000 \000\0051209,
bgw_function_name = test_shm_mq_main\000\000\000\000\000\001?\000
\000\00511956, '\000' repeats 13 times,
\005\024\000\000\000\000\000\001?\000 \000\0051192, bgw_main_arg =
1069289198, bgw_notify_pid = 12361},
  rw_backend = 0x0, rw_pid = 0, rw_child_slot = 0, rw_crashed_at = 0,
  rw_shmem_slot = 5, rw_terminate = 0 '\000', rw_lnode = {
next = 0x600e8818}}
(gdb) p ((RegisteredBgWorker *) (((char *) $7.rw_lnode) - 264))[0]
$8 = {rw_worker = {
bgw_name = test_shm_mq\000\000\001?f\000 \000\00512051, '\000'
repeats 13 times, \0044\000\000\000\000\000\001?g\000
\000\00512047\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time
= BgWorkerStart_ConsistentState,
bgw_restart_time = -1, bgw_main = 0,
bgw_library_name =
test_shm_mq\000\000\000\004T\000\000\000\000\000\001?j\000
\000\00511979, '\000' repeats 13 times,
\004d\000\000\000\000\000\001?k\000 \000\0051197,
bgw_function_name =
test_shm_mq_main\000\000\000\000\000\001?m\000 \000\00511873, '\000'
repeats 13 times, \004?\000\000\000\000\000\001?u\000
\000\0051196, bgw_main_arg = 1069289198, bgw_notify_pid = 12361},
rw_backend = 0x0,
  rw_pid = 0, rw_child_slot = 0, rw_crashed_at = 0, rw_shmem_slot = 4,
  rw_terminate = 0 '\000', rw_lnode = {next = 0x600e86f8}}
(gdb) p ((RegisteredBgWorker *) (((char *) $8.rw_lnode) - 264))[0]
$9 = {rw_worker = {
bgw_name = test_shm_mq\000\000\001?\\\000 \000\00512085, '\000'
repeats 13 times, \003?\000\000\000\000\000\001?]\000
\000\00512076\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time
= BgWorkerStart_ConsistentState,
bgw_restart_time = -1, bgw_main = 0,
bgw_library_name =
test_shm_mq\000\000\000\003?\000\000\000\000\000\001?_\000
\000\00511885, '\000' repeats 

Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 08:08 PM, Stephen Frost wrote:

Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:

There are three fixmes in the code.  One can be handled by just removing
the line; we don't really care about duplicating 10 lines of boilerplate
code.  The other two mean missing support for domain constraints and for
default ACLs.  Is there absolutely no feedback to be had on the
mechanism used by the patch?

Since the patch has had good feedback and no further comments arise, I
can just implement support for those two missing object types and push,
and everybody will be happy.  Right?


In general, I'd say yes, but I'll take a look at the patch now and
provide feedback in a couple hours anyway.


Thanks Stephen!

I had a very brief look at the docs, and these extra outputs from 
pg_event_trigger_dropped_objects caught my eye:



+<row>
+ <entry><literal>address_names</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  An array that, together with <literal>address_args</literal>,
+  can be used by the C-language function getObjectAddress() to
+  recreate the object address in a remote server containing a similar object.
+ </entry>
+</row>
+<row>
+ <entry><literal>address_args</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  See <literal>address_names</literal> above.
+ </entry>
+</row>


I couldn't find a function called getObjectAddress anywhere. Typo?

Also, is providing a C-language function the best we can do? The rest of 
the information returned by pg_event_trigger_dropped_objects is usable 
from any language.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for updating src/timezone

2014-10-03 Thread Tom Lane
I wrote:
 The urgency of updating our timezone code has risen quite a bit for me,
 because while testing an update of the data files to tzdata2014h I became
 aware that the -P option is failing to print a noticeable number of
 zone abbreviations that clearly exist in the data files.  Since the -P
 hack itself is so simple, it's hard to come to any other conclusion than
 that the rest of the timezone code is buggy --- presumably because it's
 four years behind what IANA is targeting with the data files.

Ah, scratch that.  On further investigation, it turns out that my code for
-P is in fact wrong; it misses out printing zone abbreviations in cases
where an area reverted to an older meaning of a zone abbrev after having
used another meaning for awhile.  Which, in fact, is what the Russians
have just done, and it seems it's happened rather often elsewhere as well.

We do still need to work on syncing our code with IANA's updates one way
or the other, because eventually they are going to start shipping zone
definitions that break older copies of the library; this has been
mentioned more than once on the IANA announcement list.  But it doesn't
seem like that has happened quite yet.

Meanwhile, it looks like there's a fair amount of missed updates in our
tznames/ files as a result of this error.  Off I go to look at that.

(The underlying motivation here is that I'd like to get the tzdata2014h
updates installed before we wrap 9.4beta3.  Since it looks like those
changes are going to be rather invasive, it seems like getting some beta
testing on them would be a good thing before we unleash them on
back-branch releases.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Arthur Silva
On Fri, Oct 3, 2014 at 1:40 PM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Oct  3, 2014 at 04:11:30PM +0200, Andres Freund wrote:
  On 2014-10-03 10:07:39 -0400, Gregory Smith wrote:
   On 10/3/14, 8:26 AM, Andres Freund wrote:
   #define NUM_XLOGINSERT_LOCKS  1
   tps = 52.711939 (including connections establishing)
   #define NUM_XLOGINSERT_LOCKS  8
   tps = 286.496054 (including connections establishing)
   #define NUM_XLOGINSERT_LOCKS  16
   tps = 346.113313 (including connections establishing)
   #define NUM_XLOGINSERT_LOCKS  24
   tps = 363.242111 (including connections establishing)
  
   Just to clarify:  that 10% number I threw out was meant as a rough
   estimate for a system with the default configuration, which is all that
   I tested.  It seemed like people would likely need to tune all the usual
   things like checkpoint_segments, shared_buffers, etc. as well before
   seeing much better.  You did all that, and sure enough the gain went up;
   thanks for confirming my guess.
  
   I still don't think that means this needs a GUC for 9.4.  Look at that
   jump from 1 to 8.  The low-hanging fruit here hasn't just been knocked
   off.  It's been blended into a frozen drink, poured into a glass, and
   had a little paper umbrella put on top.  I think that's enough for 9.4.
   But, yes, let's see if we can add delivery to the side of the pool in
   the next version too.
 
  So 25% performance improvements on a relatively small machine aren't
  worth a GUC? Ones that are likely to be larger on a bigger machine?
 
  I utterly fail to see why that's a service to our users.

 Well, I think the issue is that having a GUC that can't reasonably be
 tuned by 95% of our users is nearly useless.  Few users are going to run
 benchmarks to see what the optimal value is.

 I remember Informix had a setting that had no description except try
 different values to see if it helps performance --- we don't want to do
 that.

 What if we emit a server message if the setting is too low?  That's how
 we handle checkpoint_segments.

 --
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + Everyone has their own god. +


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


Not all GUCs need to be straightforward to tune.
If the gains are worthwhile I don't see any reason not to have it.


Re: [HACKERS] pg_receivexlog and replication slots

2014-10-03 Thread Robert Haas
On Fri, Oct 3, 2014 at 7:57 AM, Andres Freund and...@2ndquadrant.com wrote:
 I do wonder whether --create/--drop aren't somewhat weird for
 pg_receivexlog. It's not that clear what it means. It'd be ugly, but we
 could rename them --create-slot/drop-slot.

+1 on doing it, -1 on it being ugly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 I had a very brief look at the docs, and these extra outputs from
 pg_event_trigger_dropped_objects caught my eye:
 
 +row
 + entryliteraladdress_names/literal/entry
 + entrytypetext[]/type/entry
 + entry
 +  An array that, together with literaladdress_args/literal,
 +  can be used by the C-language function getObjectAddress() to
 +  recreate the object address in a remote server containing a 
 similar object.
 + /entry
 +/row
 +row
 + entryliteraladdress_args/literal/entry
 + entrytypetext[]/type/entry
 + entry
 +  See literaladdress_names/literal above.
 + /entry
 +/row
 
 I couldn't find a function called getObjectAddress anywhere. Typo?

Ah, yeah, it's get_object_address actually.

 Also, is providing a C-language function the best we can do? The
 rest of the information returned by pg_event_trigger_dropped_objects
 is usable from any language.

Well, the return value from get_object_address is an ObjectAddress.
It's simple enough to create an SQL wrapper that takes the
address_names/address_args arrays and return an ObjectAddress; would
this be useful?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 03:00:56PM -0300, Arthur Silva wrote:
 I remember Informix had a setting that had no description except try
 different values to see if it helps performance --- we don't want to do
 that.
 
 What if we emit a server message if the setting is too low?  That's how
 we handle checkpoint_segments.
 
 Not all GUCs need to be straightforward to tune.
 If the gains are worthwhile I don't see any reason not to have it.

Every GUC adds complexity to the system because people have to understand
it to know if they should tune it.  No GUC is zero-cost.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 02:07:45PM -0400, Bruce Momjian wrote:
 On Fri, Oct  3, 2014 at 03:00:56PM -0300, Arthur Silva wrote:
  I remember Informix had a setting that had no description except try
  different values to see if it helps performance --- we don't want to do
  that.
  
  What if we emit a server message if the setting is too low?  That's how
  we handle checkpoint_segments.
  
  Not all GUCs need to be straightforward to tune.
  If the gains are worthwhile I don't see any reason not to have it.
 
 Every GUC adds complexity to the system because people have to understand
 it to know if they should tune it.  No GUC is zero-cost.

Please see my blog post about the cost of adding GUCs:

http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 06:08:47PM +0200, Marco Nenciarini wrote:
  Any comment will be appreciated. In particular I'd appreciate comments
  on correctness of relnode files detection and LSN extraction code.
  
  I didn't look at it in detail, but one future problem comes to mind:
  Once you implement the server-side code that only sends a file if its
  LSN is higher than the cutoff point that the client gave, you'll have to
  scan the whole file first, to see if there are any blocks with a higher
  LSN. At least until you find the first such block. So with a file-level
  implementation of this sort, you'll have to scan all files twice, in the
  worst case.
  
 
 It's true. To solve this you have to keep a central maxLSN directory,
 but I think it introduces more issues than it solves.

The central issue Heikki is pointing out is whether we should implement
a file-based system if we already know that a block-based system will be
superior in every way.  I agree with that and agree that implementing
just file-based isn't worth it as we would have to support it forever.

So, in summary, if you target just a file-based system, be prepared that
it might be rejected.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 09:06 PM, Alvaro Herrera wrote:

Heikki Linnakangas wrote:


I had a very brief look at the docs, and these extra outputs from
pg_event_trigger_dropped_objects caught my eye:


+<row>
+ <entry><literal>address_names</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  An array that, together with <literal>address_args</literal>,
+  can be used by the C-language function getObjectAddress() to
+  recreate the object address in a remote server containing a similar object.
+ </entry>
+</row>
+<row>
+ <entry><literal>address_args</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  See <literal>address_names</literal> above.
+ </entry>
+</row>


I couldn't find a function called getObjectAddress anywhere. Typo?


Ah, yeah, it's get_object_address actually.


Also, is providing a C-language function the best we can do? The
rest of the information returned by pg_event_trigger_dropped_objects
is usable from any language.


Well, the return value from get_object_address is an ObjectAddress.
It's simple enough to create an SQL wrapper that takes the
address_names/address_args arrays and return an ObjectAddress; would
this be useful?


An ObjectAddress consists of a classid, objid, and objsubid. 
pg_event_trigger_dropped_objects already returns all of those as 
separate fields. What am I missing?
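
For reference, the struct in question is just this (roughly, from
src/include/catalog/dependency.h):

    typedef struct ObjectAddress
    {
        Oid         classId;        /* Class Id from pg_class */
        Oid         objectId;       /* OID of the object */
        int32       objectSubId;    /* Subitem within object (eg column), or 0 */
    } ObjectAddress;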


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Arthur Silva
On Fri, Oct 3, 2014 at 3:10 PM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Oct  3, 2014 at 02:07:45PM -0400, Bruce Momjian wrote:
  On Fri, Oct  3, 2014 at 03:00:56PM -0300, Arthur Silva wrote:
    I remember Informix had a setting that had no description except try
    different values to see if it helps performance --- we don't want to do
    that.
   
    What if we emit a server message if the setting is too low?  That's how
    we handle checkpoint_segments.
  
    Not all GUCs need to be straightforward to tune.
    If the gains are worthwhile I don't see any reason not to have it.
 
   Every GUC adds complexity to the system because people have to understand
  it to know if they should tune it.  No GUC is zero-cost.

 Please see my blog post about the cost of adding GUCs:

 http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

 --
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + Everyone has their own god. +


That's true Bruce (nice post, it was a good read).
But how can we ignore 25%+ improvements (from 8 to 24)?
At the very least we should deliver some pretty good defaults.


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 On 10/03/2014 09:06 PM, Alvaro Herrera wrote:

 Well, the return value from get_object_address is an ObjectAddress.
 It's simple enough to create an SQL wrapper that takes the
 address_names/address_args arrays and return an ObjectAddress; would
 this be useful?
 
 An ObjectAddress consists of a classid, objid, and objsubid.
 pg_event_trigger_dropped_objects already returns all of those as
 separate fields. What am I missing?

Precisely the point is not returning those values, because they are
useless to identify the equivalent object in a remote database.  What we
need is the object names and other stuff used to uniquely identify it
by user-visible name.  We transmit those name arrays to a remote
server, then on the remote server we can run get_object_address and get
the ObjectAddress, which has the classid,objid,objsubid values you cite ...
but for the remote server.  The object can then be dropped there.

Initially we thought that we would use the object_identity object for
this (which is why we invented that functionality and added the column
in 9.3), but this turned out not to work very well for unusual object
types; hence this patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)

2014-10-03 Thread Robert Haas
On Fri, Oct 3, 2014 at 1:09 PM, Robert Haas robertmh...@gmail.com wrote:
 Further debugging reveals that sigusr1_handler() gets called
 repeatedly, to start autovacuum workers, and it keeps waking up and
 starting them.  But that doesn't cause the background workers to get
 started either, because although sigusr1_handler() contains a call to
 maybe_start_bgworker, it only does that if start_bgworker = true,
 which only happens if
 CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE) is true,
 which on these calls it isn't.
 And I think this might also be the missing ingredient answering
 Andres's question from before: why doesn't the 60-second
 select()-timeout cause the background worker to eventually start even
 if the SELECT doesn't get interrupted?  There seems to be a SIGUSR1
 arriving about every 3 seconds, and I bet that's resetting the timer
 every time.

For now I propose to address this by committing the attached patch,
which gets rid of the separate start_bgworker flag inside
sigusr1_handler() and instead uses the same StartWorkerNeeded ||
HaveCrashedWorker test that we use inside ServerLoop() to decide
whether to call maybe_start_bgworker().  Either more signals will
arrive (in which case the signal handler will launch an additional
background worker every time a signal arrives) or they won't (in which
case the 60-second timeout will eventually expire, and ServerLoop()
will kick into high gear and satisfy all outstanding requests).  This
isn't really right, because there might still be a quite noticeable
delay waiting for workers to get launched, but at least the delay
would be bounded to at most 60 seconds rather than, as at present,
potentially infinite.

A totally correct fix will require a bit more thought.  A search of
the git history reveals that the problem of a signal restarting the
timeout is not new: Tom fixed a similar problem back in 2007 by making
the autovacuum launcher sleep for at most a second at a time.  Such a
fix isn't ideal here, because we really don't want an up-to-1-second
delay launching a newly-registered background worker if there's a way
to avoid that -- it's probably OK for launching daemons, but it's not
so hot for parallel query.  However, we could:

(1) Use the self-pipe trick. We could not directly use a latch, at
least not without a new API, because we might be waiting on more than
one socket.

(2) Have the postmaster not set SA_RESTART for the sigusr1 handler.  I
don't know how platform-independent this approach would be.

(3) Have sigusr1_handler repeatedly call maybe_start_bgworker() until
StartWorkerNeeded becomes false, instead of just calling it once.
ServerLoop() is carefully coded to call maybe_start_bgworker() just
once per iteration, presumably to make sure the server stays
responsive even if the bgworker-starting machinery is quite busy;
looping inside the signal handler would give up that nice property
unless we had some way to break out of the loop if there's activity on
the socket.
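
To make option (2) a bit more concrete, here is a minimal sketch (not the
actual postmaster code; the function name is invented) of registering the
handler without SA_RESTART:

#include <signal.h>
#include <string.h>

/* the real handler lives in postmaster.c; declared here only for the sketch */
extern void sigusr1_handler(int postgres_signal_arg);

static void
install_sigusr1_without_restart(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = sigusr1_handler;
	sigfillset(&act.sa_mask);	/* block other signals while in the handler */
	act.sa_flags = 0;			/* SA_RESTART deliberately omitted, so a
								 * pending select() in ServerLoop() returns
								 * EINTR instead of silently restarting with
								 * a fresh 60-second timeout */

	if (sigaction(SIGUSR1, &act, NULL) < 0)
	{
		/* error reporting elided in this sketch */
	}
}

How portable relying on that behavior would be across our supported
platforms is exactly the open question above.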

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce920ab..6220a8e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4752,7 +4752,6 @@ static void
 sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
-	bool		start_bgworker = false;
 
 	PG_SETMASK(&BlockSig);
 
@@ -4760,7 +4759,7 @@ sigusr1_handler(SIGNAL_ARGS)
 	if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
 	{
 		BackgroundWorkerStateChange();
-		start_bgworker = true;
+		StartWorkerNeeded = true;
 	}
 
 	/*
@@ -4801,10 +4800,10 @@ sigusr1_handler(SIGNAL_ARGS)
 
 		pmState = PM_HOT_STANDBY;
 		/* Some workers may be scheduled to start now */
-		start_bgworker = true;
+		StartWorkerNeeded = true;
 	}
 
-	if (start_bgworker)
+	if (StartWorkerNeeded || HaveCrashedWorker)
 		maybe_start_bgworker();
 
 	if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) &&

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 05:53:59PM +0200, Ilya Kosmodemiansky wrote:
  What that gives us is almost zero overhead on backends, high
  reliability, and the ability of the scan daemon to give higher weights
  to locks that are held longer.  Basically, if you just stored the locks
  you held and released, you either have to add timing overhead to the
  backends, or you have no timing information collected.  By scanning
  active locks, a short-lived lock might not be seen at all, while a
  longer-lived lock might be seen by multiple scans.  What that gives us
  is a weighting of the lock time with almost zero overhead.   If we want
  finer-grained lock statistics, we just increase the number of scans per
  second.
 
 So I could add the function, which will accumulate the data in some
 view/table (with weights etc). How it should be called? From specific
 process? From some existing maintenance process such as autovacuum?
 Should I implement GUC for example lwlock_pull_rate, 0 for off, from 1
 to 10 for 1 to 10 samples pro second?

Yes, that's the right approach.  You would implement it as a background
worker process, and a GUC as you described.  I assume it would populate
a view like we already do for the pg_stat_ views, and the counters could
be reset somehow.  I would pattern it after how we handle the pg_stat_
views.
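
As a very rough sketch, purely to illustrate the sampling idea -- the
lwlock_pull_rate GUC, the per-PGPROC held-lock field, and the counter
array are assumptions here, not existing symbols -- the worker's main
loop could look something like:

void
lwlock_sampler_main(Datum main_arg)
{
	/* got_sigterm would be set by a SIGTERM handler, as in worker_spi */
	while (!got_sigterm)
	{
		int			i;

		for (i = 0; i < ProcGlobal->allProcCount; i++)
		{
			volatile PGPROC *proc = &ProcGlobal->allProcs[i];
			int			lockid = proc->held_lwlock;	/* hypothetical field */

			if (lockid >= 0)
				lwlock_sample_counts[lockid]++;		/* hypothetical counters */
		}

		/* sleep 1/lwlock_pull_rate seconds between scans */
		WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_TIMEOUT,
				  1000L / Max(lwlock_pull_rate, 1));
		ResetLatch(&MyProc->procLatch);
	}
	proc_exit(0);
}

Longer-held locks naturally get sampled more often, which is where the
weighting comes from.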

  I am assuming almost no one cares about the number of locks, but rather
  they care about cummulative lock durations.
 
 Oracle and DB2 measure both,  cummulative durations and counts.

Well, the big question is whether counts are really useful.  You did a
good job of explaining that when you find heavy clog or xlog lock usage
you would adjust your server.  What I am unclear about is why you would
adjust your server based on lock _counts_ and not cummulative lock
duration.  I don't think we want the overhead of accumulating
information that isn't useful.

  I am having trouble seeing any other option that has such a good
  cost/benefit profile.
 
 At least the cost. The Oracle documentation clearly states that it is all
 about diagnostic convenience; the performance impact is significant.

Oh, we don't want to go there then, and I think this approach is a big
win.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Fri, Oct  3, 2014 at 03:00:56PM -0300, Arthur Silva wrote:
 Not all GUC need to be straight forward to tune.
 If the gains are worthy I don't see any reason not to have it.

 Every GUC add complexity to the system because people have to understand
 it to know if they should tune it.  No GUC is zero-cost.

In particular, the cost of putting this one back would be documenting
what it does and how to tune it.  As mentioned upthread, we're not
following that Informix precedent ;-)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 03:30:35PM -0300, Arthur Silva wrote:
  Every GUC add complexity to the system because people have to understand
  it to know if they should tune it.  No GUC is zero-cost.
 
 Please see my blog post about the cost of adding GUCs:
 
 http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

 That's true, Bruce (nice post, it was a good read).
 But how can we ignore 25%+ improvements (from 8 to 24)?
 At the very least we should deliver some pretty good defaults.

Well, checkpoint_segments was a similar case where we couldn't give good
tuning advice so we went with a server log file warning if it needed to
be increased --- this might be a similar case.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Heikki Linnakangas

On 10/03/2014 09:42 PM, Bruce Momjian wrote:

On Fri, Oct  3, 2014 at 03:30:35PM -0300, Arthur Silva wrote:

  Every GUC add complexity to the system because people have to understand
  it to know if they should tune it.  No GUC is zero-cost.

 Please see my blog post about the cost of adding GUCs:

 http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

That's true, Bruce (nice post, it was a good read).
But how can we ignore 25%+ improvements (from 8 to 24)?
At the very least we should deliver some pretty good defaults.


Well, checkpoint_segments was a similar case where we couldn't give good
tuning advice so we went with a server log file warning if it needed to
be increased --- this might be a similar case.


I have no idea how to decide at runtime whether it should be increased 
or not. If that was feasible, we probably could make it tune itself on 
the fly - it's not like checkpoint_segments where you need more disk 
space if you increase it.


I stand by my decision to make it a #define, at least until someone 
voices their objection in the form of a documentation patch.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Robert Haas
On Fri, Oct 3, 2014 at 2:49 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I stand by my decision to make it a #define, at least until someone voices
 their objection in the form of a documentation patch.

I think that's exactly right.  If we knew users should tune this, then
we might be able to make it self-tuning, which would be best of all.
At the least, we'd have useful information to provide to people who
want to change it.  However, if we *can't* provide tuning advice, then
all making it a GUC does is give users a knob that says "turning this
might make things better! or worse!".  Adding such knobs does little
harm individually, but in the aggregate it makes for a product that is
mysterious, hard to use, and hard to maintain and improve.

In practice, I suspect few people will get the amount of benefit that
Andres saw in his tests.  You have to be xloginsert-bound, and a lot
of workloads aren't.  And if they were before, they aren't now that we
have 8 xloginsert slots by default.

My suspicion, which I think is what Andres was getting at as well, is
that you'll want more slots if you have more CPUs.  One way to measure
the effectiveness of a given value (and maybe even auto-tune) would be
to count how often you run out of slots.  Of course, you'd need some
countervailing form of pressure based on the effort that you're
spending locking however many you have rather than some smaller
number.  You can't just measure the upside without measuring the
downside.
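
As a toy illustration of that measurement (nothing like this exists in the
real WALInsertLock code; the names are invented), one could keep
backend-local counters and derive a miss ratio from them:

/* hypothetical, backend-local counters */
static uint64 xlinsert_acquires = 0;	/* total insertion-slot acquisitions */
static uint64 xlinsert_all_busy = 0;	/* times every slot was already busy */

static inline void
count_xlinsert_attempt(bool all_slots_busy)
{
	xlinsert_acquires++;
	if (all_slots_busy)
		xlinsert_all_busy++;

	/*
	 * A high xlinsert_all_busy / xlinsert_acquires ratio would argue for
	 * more slots; the cost of having to examine and lock more slots
	 * elsewhere is the countervailing pressure mentioned above.
	 */
}

But, as noted, without also measuring that downside, the ratio alone can't
drive auto-tuning.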

The xlog slots use the kind of algorithm that I quite frankly hate
even when it works.  It avoids one kind of loss but in a way that's
not scalable; as you crank up the number of slots you start to incur
cost elsewhere.  Getting any more scalability beyond that point will
require doing something fundamentally different, or at least that's my
guess.  So I feel like we've put our finger in the dike more than
anything, but that still beats letting the dike collapse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Merlin Moncure
On Fri, Oct 3, 2014 at 2:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Oct 3, 2014 at 2:49 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I stand by my decision to make it a #define, at least until someone voices
 their objection in the form of a documentation patch.

 I think that's exactly right.  If we knew users should tune this, then
 we might be able to make it self-tuning, which would be best of all.
 At the least, we'd have useful information to provide to people who
 want to change it.  However, if we *can't* provide tuning advice, then
 all making it a GUC does is give users a knob that says "turning this
 might make things better! or worse!".  Adding such knobs does little
 harm individually, but in the aggregate it makes for a product that is
 mysterious, hard to use, and hard to maintain and improve.

100% agree.  It's a very simple standard: if you provide a
performance-affecting GUC, please provide guidelines to the end user
regarding the tradeoffs or performance interactions being managed.
Failure to do this causes an interesting problem; let's take the case
of shared_buffers, for example, which isn't explained very well.
Failure to properly document or explain the interactions causes the
user to make invalid assumptions about the setting (more memory =
faster!) or take obsolete advice (Tom Lane said in 1997 that 128mb is
about right) as gospel.  This has been a long-running gripe of mine.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Alvaro Herrera
Alvaro Herrera wrote:

 Precisely the point is not returning those values, because they are
 useless to identify the equivalent object in a remote database.  What we
 need is the object names and other stuff used to uniquely identify it
 by user-visible name.  We transmit those name arrays to a remote
 server, then on the remote server we can run get_object_address and get
 the ObjectAddress, which has the classid,objid,objsubid values you cite ...
 but for the remote server.  The object can then be dropped there.
 
 Initially we thought that we would use the object_identity object for
 this (which is why we invented that functionality and added the column
 in 9.3), but this turned out not to work very well for unusual object
 types; hence this patch.

The other thing to keep in mind is that with all those ObjectAddress
thingies you got, you cannot simply construct a DROP obj command:

1. The objects in the list might be of different types; say a table and
a view that are dropped by the same command because of CASCADE.  (You
could just pass the CASCADE to the other side and hope that it happens
to do the same thing; but if the schemas are slightly different, it
might not.)

2. DROP OWNED or other commands might have dropped several objects,
again of varying types.

So what we do in BDR is stuff all those ObjectAddress in an array of
them, and then call performMultipleDeletions.  There is no way to get
this functionality in non-C code; so it's hard to see that it's very
useful to have a non-C way to use the original objnames/objargs arrays.
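
For illustration only, the replay side can be sketched roughly like this.
get_object_address(), add_exact_object_address() and
performMultipleDeletions() are the existing backend functions; the
DroppedObject struct and the surrounding loop are invented for the example:

typedef struct DroppedObject
{
	ObjectType	objtype;	/* mapped from the object_type column */
	List	   *objnames;	/* address_names from the origin server */
	List	   *objargs;	/* address_args from the origin server */
} DroppedObject;

static void
replay_drops(List *dropped)
{
	ObjectAddresses *targets = new_object_addresses();
	ListCell   *lc;

	foreach(lc, dropped)
	{
		DroppedObject *obj = (DroppedObject *) lfirst(lc);
		Relation	rel = NULL;
		ObjectAddress addr;

		/* resolve the names to an ObjectAddress valid on *this* server */
		addr = get_object_address(obj->objtype, obj->objnames, obj->objargs,
								  &rel, AccessExclusiveLock, false);
		add_exact_object_address(&addr, targets);
	}

	/* drop everything in one go, mirroring the single origin command */
	performMultipleDeletions(targets, DROP_RESTRICT, 0);
	free_object_addresses(targets);
}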

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Right.  In the add to objname cases, there is already some other
 routine that initialized it previously by filling in some stuff; in the
 case above, this happens in the getRelationIdentity() immediately
 preceding this.
 
 In the other cases we initialize on that spot.

ahh, ok, that makes a bit more sense, sorry for missing it.  Still makes
me wonder why objargs gets special treatment at the top of the function
and objnames doesn't- seems like both should be initialized either
before being passed in (and perhaps an Assert to verify that they are),
or they should both be initialized, but I tend to prefer just Assert'ing
that they are correct on entry- either both are valid pointers to empty
lists, or both NULL.

  I'm also not a huge fan of the big object_type_map, but I also don't
  have a better solution.
 
 Agreed.  We have the ObjectProperty array also in the same file; it
 kinda looks like there is duplication here, but actually there isn't.

Yeah, I did notice that, and noticed that it's not duplication.

 This whole issue is just fallout from the fact that we have three
 different ways to identify object classes: the ObjectClass enum, the
 ObjectType enum, and the relation OIDs of each catalog (actually a
 fourth one, see below).  I don't see any other nice way around this; I
 guess we could try to auto-generate these tables somehow from a master
 text file, or something.  Not material for this patch, I think.

Agreed that this patch doesn't need to address it, and I'm not sure that a
master text file would actually be an improvement.  I had been thinking
about whether there was some way to have a single mapping which could be used
in either direction, but I didn't see any sensible way to make that work
given that it's not *quite* the same backwards and forwards.

 Note my DDL deparse patch adds a comment:
 
 +/* XXX merge this with ObjectTypeMap? */
  static event_trigger_support_data event_trigger_support[] = {

Yes, I saw that, and that you added a comment that the new map needs to
be updated when changes are made, which is also good.

 and a late revision to that patch added a new function in
 event_triggers.c (not yet posted I think) due to GRANT having its own
 enum of object types, AclObjectKind.

Yeah.  Perhaps one day we'll unify all of these, though I'm not 100%
sure it'd be possible...

Thanks!

Stephen




[HACKERS] Re: Valgrind warnings in master branch (Invalid read of size 8) originating within CreatePolicy()

2014-10-03 Thread Stephen Frost
Peter,

* Peter Geoghegan (p...@heroku.com) wrote:
 I see the following Valgrind warnings in a recent build of the master branch:

Fix pushed, thanks!

Stephen




Re: [HACKERS] Missing newlines in verbose logs of pg_dump, introduced by RLS patch

2014-10-03 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 * Michael Paquier (michael.paqu...@gmail.com) wrote:
  On Mon, Sep 29, 2014 at 10:07 AM, Fabrízio de Royes Mello 
  fabriziome...@gmail.com wrote:
  
   The schema name is missing... attached patch add it.
  
  Ah, right, thanks. It didn't occur to me immediately :) Your patch looks
  good to me, and you are updating as well the second message that missed the
  schema name in getRowSecurity.
 
 Thanks to you both- will review.

Fix pushed- thanks!

Stephen




Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-03 Thread Andrew Dunstan


On 10/03/2014 12:20 PM, Bruce Momjian wrote:

On Sun, Sep 28, 2014 at 01:42:46PM +0200, Bogdan Pilch wrote:

Hi,
I have created a small patch to postgres source (in particular the
psql part of it) that accepts trailing comma at the end of list in
SELECT statement.

The idea is to be able to say both (with the same result):
SELECT a, b, c from t;
SELECT a, b, c, from t;

Attached you can find a patch containing regression test (incorporated
into the serial_schedule).
My patch is relative to origin/REL9_4_STABLE branch as that is the one
I started from.

My plea is to have this change merged into the main stream so that it
becomes available in upcoming releases.

This modification does not require any interaction with user.
It does not create any backward compatibility issues.

Interesting --- I know some languages allow trailing delimiters, like
Perl and Javascript.  Could this mask query errors?  Does any other
database accept this?  Seems this would need to be done in many other
places, like UPDATE, but let's first decide if we want this.

FYI, it is usually better to discuss a feature before showing a patch.



Javascript might accept it, but it's not valid JSON.

The case for doing it is that then you can easily comment out any entry 
at all in a select list:


select
foo as f1,
bar as f2,
-- baz as f3,
from blurfl


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup

2014-10-03 Thread Robert Haas
On Fri, Oct 3, 2014 at 12:08 PM, Marco Nenciarini
marco.nenciar...@2ndquadrant.it wrote:
 On 03/10/14 17:53, Heikki Linnakangas wrote:
 If we're going to need a profile file - and I'm not convinced of that -
 is there any reason to not always include it in the backup?

 The main reason is to have a centralized list of files that need to be
 present. Without a profile, you have to insert some sort of placeholder
 for skipped files.

Why do you need to do that?  And where do you need to do that?

It seems to me that there are three interesting operations:

1. Take a full backup.  Basically, we already have this.  In the
backup label file, make sure to note the newest LSN guaranteed to be
present in the backup.

2. Take a differential backup.  In the backup label file, note the LSN
of the full backup to which the differential backup is relative, and the
newest LSN guaranteed to be present in the differential backup.  The
actual backup can consist of a series of 20-byte buffer tags, those
being the exact set of blocks newer than the base-backup's
latest-guaranteed-to-be-present LSN.  Each buffer tag is followed by
an 8kB block of data.  If a relfilenode is truncated or removed, you
need some way to indicate that in the backup; e.g. include a buffertag
with forknum = -(forknum + 1) and blocknum = the new number of blocks,
or InvalidBlockNumber if removed entirely.

3. Apply a differential backup to a full backup to create an updated
full backup.  This is just a matter of scanning the full backup and
the differential backup and applying the changes in the differential
backup to the full backup.

You might want combinations of these, like something that does 2+3 as
a single operation, for efficiency, or a way to copy a full backup and
apply a differential backup to it as you go.  But that's it, right?
What else do you need?
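
To illustrate the record layout sketched in point 2, one possible (purely
hypothetical) encoding would be:

typedef struct DiffBackupRecord
{
	RelFileNode rnode;		/* spcNode / dbNode / relNode */
	ForkNumber	forknum;	/* or -(forknum + 1) for truncation/removal */
	BlockNumber blkno;		/* block number; for truncation records, the
							 * new length, or InvalidBlockNumber if the
							 * relfilenode was removed entirely */
	/* ordinary records are followed by BLCKSZ bytes of block data */
} DiffBackupRecord;

Everything else -- which full backup the records are relative to, and the
newest LSN they are guaranteed to contain -- lives in the backup label
file, as described above.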

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Alvaro Herrera
Stephen Frost wrote:
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
  Right.  In the add to objname cases, there is already some other
  routine that initialized it previously by filling in some stuff; in the
  case above, this happens in the getRelationIdentity() immediately
  preceding this.
  
  In the other cases we initialize on that spot.
 
 ahh, ok, that makes a bit more sense, sorry for missing it.  Still makes
 me wonder why objargs gets special treatment at the top of the function
 and objnames doesn't- seems like both should be initialized either
 before being passed in (and perhaps an Assert to verify that they are),
 or they should both be initialized, but I tend to prefer just Assert'ing
 that they are correct on entry- either both are valid pointers to empty
 lists, or both NULL.

I guess I could initialize objnames to NIL also.  I initialize objargs
because that one is unused for a lot of object types (so I would have to
set it to NIL in cases where it's not used), whereas objnames is always
used and thus we know it's always initialized later.

Maybe what I need here is just a longer comment explaining this ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  ahh, ok, that makes a bit more sense, sorry for missing it.  Still makes
  me wonder why objargs gets special treatment at the top of the function
  and objnames doesn't- seems like both should be initialized either
  before being passed in (and perhaps an Assert to verify that they are),
  or they should both be initialized, but I tend to prefer just Assert'ing
  that they are correct on entry- either both are valid pointers to empty
  lists, or both NULL.
 
 I guess I could initialize objnames to NIL also.  I initialize objargs
 because that one is unused for a lot of object types (so I would have to
 set it to NIL in cases where it's not used), whereas objnames is always
 used and thus we know it's always initialized later.
 
 Maybe what I need here is just a longer comment explaining this ...

A one-line comment that it's always reset below would be sufficient for me.

Thanks for explaining it :),

Stephen




Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Robert Haas
On Fri, Oct 3, 2014 at 2:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Heikki Linnakangas wrote:
 On 10/03/2014 09:06 PM, Alvaro Herrera wrote:

 Well, the return value from get_object_address is an ObjectAddress.
 It's simple enough to create an SQL wrapper that takes the
 address_names/address_args arrays and return an ObjectAddress; would
 this be useful?

 An ObjectAddress consists of a classid, objid, and objsubid.
 pg_event_trigger_dropped_objects already returns all of those as
 separate fields. What am I missing?

 Precisely the point is not returning those values, because they are
 useless to identify the equivalent object in a remote database.  What we
 need is the object names and other stuff used to uniquely identify it
 by user-visible name.  We transmit those name arrays to a remote
 server, then on the remote server we can run get_object_address and get
 the ObjectAddress, which has the classid,objid,objsubid values you cite ...
 but for the remote server.  The object can then be dropped there.

 Initially we thought that we would use the object_identity object for
 this (which is why we invented that functionality and added the column
 in 9.3), but this turned out not to work very well for unusual object
 types; hence this patch.

I'm not really very convinced that it's a good idea to expose this
instead of just figuring out a way to parse the object identity.

But I expect to lose that argument.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Alvaro Herrera
Robert Haas wrote:

 I'm not really very convinced that it's a good idea to expose this
 instead of just figuring out a way to parse the object identity.

That's the first thing I tried.  But it's not pretty: you have to
extract schema names by splitting at a period (and what if a schema name
contains a period?), split out on ON for certain object types, figure
out parens and argument types and names for functions and aggregates,
etc.  It's just not sane to try to parse such text strings.

 But I expect to lose that argument.

Good :-)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-03 Thread José Luis Tallón

On 10/03/2014 11:02 AM, Dimitri Fontaine wrote:

Jim Nasby jim.na...@bluetreble.com writes:

EXPLAIN
ALTER TABLE 

I'm thinking it would be better to have something you could set at a session
level, so you don't have to stick EXPLAIN in front of all your DDL.


We were considering the potential needs of "accidental DBAs" here at
first, if memory serves me well.
I definitely see the value of EXPLAIN [DDL STATEMENT]... even if
implemented as a regular "this is what should happen" report, without even
attempting to run a thing (not even dry-run transactions), but
including the full catalog search / attribute resolution etc.
Providing insight into the locking that would happen sounds like a real
life-saver for many real-life situations (i.e. would this ALTER
completely lock my application due to the constant logging-to-table?)

This is, obviously IMHO.

Yeah I'm coming into that camp too, and I think the Event Trigger idea
gets us halfway there. Here's a detailed sketched of how it would work:

  1. preparatory steps: install the Event Trigger
  
 create extension norewrite;


  2. test run:

 psql -1 -f ddl.sql
 ERROR: Table Rewrite has been cancelled.

  3. Well actually we need to run that thing in production

 BEGIN;
   ALTER EVENT TRIGGER norewrite DISABLE;
   \i ddl.sql
   ALTER EVENT TRIGGER norewrite ENABLE;
 COMMIT;

Then it's also possible to have another Event Trigger that would
automatically issue a LOCK table NOWAIT; command before any DDL
against a table is run, in another extension:

   create extension ddl_lock_nowait;

The same applies, if your production rollout is blocked repeatedly and
you want to force it through at some point, it's possible to disable the
event trigger within the DDL script/transaction.


This serves a different purpose which is, at least, as worthwhile as the 
former: provide a real dry run mechanism for advanced users.
Stephen's delta fork sounds like a promising approach ... even if a 
bit too Oracle-ish (sounds an awful lot like UNDO logs!) for my liking.



As for the dry-run idea, I don't think that's really necessary. I've never
seen anyone serious that doesn't have a development environment, which is
where you would simply deploy the real DDL using verbose mode and see what
the underlying commands actually do.

The major drawback of the Event Trigger idea is that the transaction is
cancelled as soon as a Rewrite Event is fired when you have installed
the protective trigger. It means that you won't see the next problem
after the first one, so it's not a dry-run.

But considering what you're saying here, it might well be enough.


It is a very convenient first step (minimally invasive, and good use of 
existing infrastructure)... since it allows an easy testing phase in 
order to iron out potential shortcomings and gather input on some other 
applications.



Thanks,

/ Jose



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner kgri...@ymail.com wrote:
 I have two questions I hope you can clarify.  I'm having trouble
 parsing what this statement means:

 ... the SQL standard does not require that MERGE be atomic in the
 sense of atomically providing either an INSERT or UPDATE, ...

 My understanding is that the standard logically requires (without
 concern for implementation details) that the second specified table
 (via table name or subquery -- which could be a VALUES statement)
 is scanned, and for each row it attempts to match a row in the
 target table.  That will either match or not, according to the
 boolean expression in the ON clause.  You can have one WHEN MATCHED
 THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
 If both clauses are present, I believe that it is guaranteed that
 one or the other (but not both) will fire for each row in the
 second table.  The usual guarantees for each isolation level must
 not be violated (although an implementation is not required to
 generate anomalies which could not happen with serial execution).

That seems like a statement that isn't getting to the heart of the
matter. Which is: simple UPSERT cases will get things like duplicate
key violations under concurrency with MERGE, in SQL Server, Oracle,
and DB2. I think serialization failures are inevitable and appropriate
when not using READ COMMITTED mode with MVCC, but these duplicate
violations are sort of like READ COMMITTED serialization failures in
disguise. Or they would be, if MERGE (as described by the standard, or
in practice) promised to care about atomicity in that sense. It
doesn't, so they're off the hook. I think we should care about
atomicity (in the sense of always getting an insert or update, never
an error at RC) for the simple reason that that's what people want.

It seems like your remarks here don't acknowledge the reality: That
there are slightly different ideas of each row that are in play
here.

 So as usual for a transaction involving multiple tables,
 serialization anomalies are possible if the transaction isolation
 level is REPEATABLE READ or less.  Is that what you're getting at,
 or something else?

My complaint is quite straightforward, really. I don't want to have to
tell users to do this: http://stackoverflow.com/a/2249

At the same time, I also don't want to have to live with the
consequences of implementing a MERGE that does not exhibit that
behavior. Which is to say, the consequences of doing something like
selectively using different types of snapshots (MVCC or dirty - the
two different ideas of each row that are in tension) based on the
exact clauses used. That seems like asking for trouble, TBH.

 -- Predicate within UPDATE auxiliary statement
 -- (row is still locked when the UPDATE predicate
 -- isn't satisfied):
 INSERT INTO upsert(key, val) VALUES(1, 'insert')
   -- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE 
 row-level triggers
   ON CONFLICT UPDATE SET val = CONFLICTING(val)
   -- Predicate has interesting semantics visibility-wise:
   WHERE val != 'delete';

 Can you explain what the WHERE clause there does?

Sure. Having already locked the tuple on the basis of finding a
conflict TID, but before an UPDATE, the qual is evaluated using
EvalPlanQual rescanning. The predicate must be satisfied in order for
the UPDATE to actually proceed. If, in this instance, the locked tuple
happened to have a val of 'delete', then we would not UPDATE at all.
It would still be locked, though (possibly without being visible to
our MVCC snapshot, just as might happen when we do a regular UPDATE).

There are some interesting implications visibility-wise here that must
be considered. These are discussed on the Wiki page:
https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29

Obviously higher isolation levels just throw an error here instead.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-03 Thread David G Johnston
Andrew Dunstan wrote
 On 10/03/2014 12:20 PM, Bruce Momjian wrote:
 On Sun, Sep 28, 2014 at 01:42:46PM +0200, Bogdan Pilch wrote:
 Hi,
 I have created a small patch to postgres source (in particular the
 psql part of it) that accepts trailing comma at the end of list in
 SELECT statement.

 The idea is to be able to say both (with the same result):
 SELECT a, b, c from t;
 SELECT a, b, c, from t;

 Attached you can find a patch containing regression test (incorporated
 into the serial_schedule).
 My patch is relative to origin/REL9_4_STABLE branch as that is the one
 I started from.

 My plea is to have this change merged into the main stream so that it
 becomes available in upcoming releases.

 This modification does not require any interaction with user.
 It does not create any backward compatibility issues.
 Interesting --- I know some languages allow trailing delimiters, like
 Perl and Javascript.  Could this mask query errors?  Does any other
 database accept this?  Seems this would need to be done in many other
 places, like UPDATE, but let's first decide if we want this.

 FYI, it is usually better to discuss a feature before showing a patch.

 
 Javascript might accept it, but it's not valid JSON.
 
 The case for doing it is that then you can easily comment out any entry 
 at all in a select list:
 
  select
  foo as f1,
  bar as f2,
  -- baz as f3,
  from blurfl

Should we also allow:

SELECT
, col1
, col2
, col3
FROM ...

?

The other reason for this would be to build dynamic SQL more easily via a
loop.

Barring arguments showing danger in allowing it, I don't see a reason to reject
this; let people decide whether they want to utilize it on stylistic or
compatibility grounds.

David J.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup

2014-10-03 Thread Andres Freund
On 2014-10-03 17:31:45 +0200, Marco Nenciarini wrote:
 I've updated the wiki page
 https://wiki.postgresql.org/wiki/Incremental_backup following the result
 of discussion on hackers.
 
 Compared to first version, we switched from a timestamp+checksum based
 approach to one based on LSN.
 
 This patch adds an option to pg_basebackup and to replication protocol
 BASE_BACKUP command to generate a backup_profile file. It is almost
 useless by itself, but it is the foundation on which we will build the
 file based incremental backup (and hopefully a block based incremental
 backup after it).
 
 Any comment will be appreciated. In particular I'd appreciate comments
 on correctness of relnode files detection and LSN extraction code.

Can you describe the algorithm you implemented in words?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] superuser() shortcuts

2014-10-03 Thread Brightwell, Adam
All,


 Thanks!  Please add it to the next commitfest.


 Sounds good.  I'll update the patch and add accordingly.


Attached is an updated patch.

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
new file mode 100644
index 133143d..695a13c
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
***
*** 27,32 
--- 27,33 
  #include miscadmin.h
  #include replication/walreceiver.h
  #include storage/smgr.h
+ #include utils/acl.h
  #include utils/builtins.h
  #include utils/numeric.h
  #include utils/guc.h
*** pg_start_backup(PG_FUNCTION_ARGS)
*** 54,60 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
--- 55,61 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!has_replication_privilege(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
*** pg_stop_backup(PG_FUNCTION_ARGS)
*** 82,88 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg(must be superuser or replication role to run a backup;
--- 83,89 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!has_replication_privilege(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg(must be superuser or replication role to run a backup;
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
new file mode 100644
index d30612c..4d26b02
*** a/src/backend/catalog/aclchk.c
--- b/src/backend/catalog/aclchk.c
*** static AclMode restrict_and_check_grant(
*** 143,148 
--- 143,149 
  		 AttrNumber att_number, const char *colname);
  static AclMode pg_aclmask(AclObjectKind objkind, Oid table_oid, AttrNumber attnum,
  		   Oid roleid, AclMode mask, AclMaskHow how);
+ static bool has_catupdate_privilege(Oid roleid);
  
  
  #ifdef ACLDEBUG
*** aclcheck_error_type(AclResult aclerr, Oi
*** 3425,3431 
  
  /* Check if given user has rolcatupdate privilege according to pg_authid */
  static bool
! has_rolcatupdate(Oid roleid)
  {
  	bool		rolcatupdate;
  	HeapTuple	tuple;
--- 3426,3432 
  
  /* Check if given user has rolcatupdate privilege according to pg_authid */
  static bool
! has_catupdate_privilege(Oid roleid)
  {
  	bool		rolcatupdate;
  	HeapTuple	tuple;
*** pg_class_aclmask(Oid table_oid, Oid role
*** 3630,3636 
  	if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) &&
  		IsSystemClass(table_oid, classForm) &&
  		classForm->relkind != RELKIND_VIEW &&
! 		!has_rolcatupdate(roleid) &&
  		!allowSystemTableMods)
  	{
  #ifdef ACLDEBUG
--- 3631,3637 
  	if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) &&
  		IsSystemClass(table_oid, classForm) &&
  		classForm->relkind != RELKIND_VIEW &&
! 		!has_catupdate_privilege(roleid) &&
  		!allowSystemTableMods)
  	{
  #ifdef ACLDEBUG
*** has_createrole_privilege(Oid roleid)
*** 5078,5083 
--- 5079,5106 
  		ReleaseSysCache(utup);
  	}
  	return result;
+ }
+ 
+ /*
+  * Check whether specified role has REPLICATION privilege (or is a superuser)
+  */
+ bool
+ has_replication_privilege(Oid roleid)
+ {
+ 	bool		result = false;
+ 	HeapTuple	utup;
+ 
+ 	/* Superusers bypass all permission checking. */
+ 	if (superuser_arg(roleid))
+ 		return true;
+ 
+ 	utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
+ 	if (HeapTupleIsValid(utup))
+ 	{
+ 		result = ((Form_pg_authid) GETSTRUCT(utup))->rolreplication;
+ 		ReleaseSysCache(utup);
+ 	}
+ 	return result;
  }
  
  bool
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
new file mode 100644
index c9a9baf..ed89b23
*** a/src/backend/commands/alter.c
--- b/src/backend/commands/alter.c
*** AlterObjectOwner_internal(Relation rel,
*** 807,852 
  		bool	   *nulls;
  		bool	   *replaces;
  
! 		/* Superusers can bypass permission checks */
! 		if (!superuser())
  		{
! 			AclObjectKind aclkind = get_object_aclkind(classId);
  
! 			/* must be owner */
! 			if (!has_privs_of_role(GetUserId(), old_ownerId))
  			{
! char	   *objname;
! char		namebuf[NAMEDATALEN];
! 
! if (Anum_name != InvalidAttrNumber)
! {
! 	datum = heap_getattr(oldtup, Anum_name,
! 		 RelationGetDescr(rel), &isnull);
! 	Assert(!isnull);
! 	objname = NameStr(*DatumGetName(datum));
! }
! else
! {
! 	snprintf(namebuf, sizeof(namebuf), "%u",
! 			 HeapTupleGetOid(oldtup));
! 	objname = namebuf;

Re: [HACKERS] Trailing comma support in SELECT statements

2014-10-03 Thread Pavel Stehule
2014-09-28 13:42 GMT+02:00 Bogdan Pilch bog...@matfyz.cz:

 Hi,
 I have created a small patch to postgres source (in particular the
 psql part of it) that accepts trailing comma at the end of list in
 SELECT statement.


Is it ANSI SQL?

Why should we enable it? We can be tolerant of this bug, but then
developers will hate us when they try to port to other servers.

-1 from me

Regards

Pavel



 The idea is to be able to say both (with the same result):
 SELECT a, b, c from t;
 SELECT a, b, c, from t;

 Attached you can find a patch containing regression test (incorporated
 into the serial_schedule).
 My patch is relative to origin/REL9_4_STABLE branch as that is the one
 I started from.

 My plea is to have this change merged into the main stream so that it
 becomes available in upcoming releases.

 This modification does not require any interaction with user.
 It does not create any backward compatibility issues.
 Nor does it have any performance impact.

 regards
 bogdan


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Andres Freund
On 2014-10-03 11:33:18 -0400, Bruce Momjian wrote:
 On Thu, Oct  2, 2014 at 11:50:14AM +0200, Andres Freund wrote:
  The first problem that comes to my mind about collecting enough data is
  that we have a very large number of lwlocks (fixed_number + 2 *
  shared_buffers). One 'trivial' way of implementing this is to have a per
  backend array collecting the information, and then a shared one
  accumulating data from it over time. But I'm afraid that's not going to
  fly :(. Hm. With the above sets of stats that'd be ~50MB per backend...
  
  Perhaps we should somehow encode this different for individual lwlock
  tranches? It's far less problematic to collect all this information for
  all but the buffer lwlocks...
 
 First, I think this could be a major Postgres feature, and I am excited
 someone is working on this.
 
 As far as gathering data, I don't think we are going to do any better in
 terms of performance/simplicity/reliability than to have a single PGPROC
 entry to record when we enter/exit a lock, and having a secondary
 process scan the PGPROC array periodically.

I don't think that'll give meaningful results given the very short times
most lwlocks are held. And it'll give not very interesting results for
multiple lwlocks held at the same time - most of the time the 'earlier'
held ones are more interesting than the last acquired one...

 I am assuming almost no one cares about the number of locks, but rather
 they care about cummulative lock durations.

I actually don't think that's true. Every lock acquisition implies a
number of atomic operations. Those are expensive. And if you see individual
locks acquired a high number of times in multiple processes, that's
something important. It causes significant bus traffic between sockets,
while not necessarily being visible in the lock held times.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Andres Freund
On 2014-10-03 11:51:46 -0400, Robert Haas wrote:
 On Fri, Oct 3, 2014 at 11:33 AM, Bruce Momjian br...@momjian.us wrote:
  I am assuming almost no one cares about the number of locks, but rather
  they care about cummulative lock durations.
 
  I am having trouble seeing any other option that has such a good
  cost/benefit profile.
 
 I do think that the instrumentation data gathered by LWLOCK_STATS is
 useful - very useful.
 
 But it does have significant overhead.

Have you ever analyzed where that overhead is with the current code?

I do wonder if having a per backend array in shmem indexed by the lockid
(inside its tranche) wouldn't be quite doable for the smaller tranches.
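
A sketch of that (sizes and names invented for illustration) could be as
simple as one slot of plain counters per backend, summed up lazily by
whatever reports the statistics:

#define MAX_TRACKED_LWLOCKS		128		/* e.g. just the fixed/individual locks */

typedef struct BackendLWLockStats
{
	uint64		sh_acquire[MAX_TRACKED_LWLOCKS];	/* shared acquisitions */
	uint64		ex_acquire[MAX_TRACKED_LWLOCKS];	/* exclusive acquisitions */
	uint64		block_count[MAX_TRACKED_LWLOCKS];	/* had to sleep on the lock */
} BackendLWLockStats;

/* one entry per backend, allocated from shared memory at startup */
static BackendLWLockStats *lwlock_stats_array;		/* [MaxBackends] */

Since each backend only writes its own slot, no locking is needed on the
fast path; the big buffer-lock tranche would still need some other scheme.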

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Promise index tuples for UPSERT

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 3:54 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Simon's approach would actually pass that test case just fine. It inserts
 the (promise) index tuple first, and heap tuple only after that. It will
 fail the test case with more than one unique index, however.

Oh, I see. Still, I don't think you need to UPDATE a
uniquely-constrained attribute - even if updating constrained
attributes is rare (dubious), non-HOT updates will have the same
effect, no? I still think that's unacceptable.

In any case, I still don't see what this buys us over the other two
designs. What's the pay-off for giving up on the general avoidance of
unprincipled deadlocks?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last Commitfest patches waiting review

2014-10-03 Thread Andres Freund
On 2014-10-03 19:14:14 +0300, Heikki Linnakangas wrote:
 Thanks to everyone who's reviewed a patch so far.  One last crunch, and
 we'll be through.
 
 We have 7 patches left in Needs Review state:
 
 pg_receivexlog: addition of --create/--drop to create/drop repslots
 
   Waiting for Magnus. Amit promised to take a look if Magnus continues
   to be busy.

I've committed half of it, and will commit the second half soon. I guess
you meant Andres instead of Amit?

 Escaping from blocked send() by pg_terminate_backend().
 
   Andres posted a patch series using a completely different approach. I
   guess this should be just marked as returned with feedback, if we're
   going to pursue the different approach instead.

Sounds fair to me. I'll comment with details on the corresponding thread.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-03 Thread Andres Freund
On 2014-10-03 14:02:09 -0300, Alvaro Herrera wrote:
 Since the patch has had good feedback and no further comments arise, I
 can just implement support for those two missing object types and push,
 and everybody will be happy.  Right?

I'd like to see a new version before that out here... I don't think
there's fundamental issues, but it's complicated enough to warrant
that. And I don't think it has gotten enough detailed review. I hope to
be able to give a bit more of that...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Andres Freund
On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote:
 On Fri, Oct  3, 2014 at 04:11:30PM +0200, Andres Freund wrote:
  On 2014-10-03 10:07:39 -0400, Gregory Smith wrote:
   On 10/3/14, 8:26 AM, Andres Freund wrote:
   #define NUM_XLOGINSERT_LOCKS  1
   tps = 52.711939 (including connections establishing)
   #define NUM_XLOGINSERT_LOCKS  8
   tps = 286.496054 (including connections establishing)
   #define NUM_XLOGINSERT_LOCKS  16
   tps = 346.113313 (including connections establishing)
   #define NUM_XLOGINSERT_LOCKS  24
   tps = 363.242111 (including connections establishing)
   
   Just to clarify:  that 10% number I threw out was meant as a rough 
   estimate
   for a system with the default configuration, which is all that I tested.  
   It
   seemed like people would likely need to tune all the usual things like
   checkpoint_segments, shared_buffers, etc. as well before seeing much 
   better.
   You did all that, and sure enough the gain went up; thanks for confirming 
   my
   guess.
   
   I still don't think that means this needs a GUC for 9.4.  Look at that 
   jump
   from 1 to 8.  The low-hanging fruit here hasn't just been knocked off.  
   It's
   been blended into a frozen drink, poured into a glass, and had a little
   paper umbrella put on top.  I think that's enough for 9.4.  But, yes, 
   let's
   see if we can add delivery to the side of the pool in the next version 
   too.
  
  So 25% performance improvements on a relatively small machine aren't
  worth a GUC? Improvements that are likely to be larger on a bigger machine?
  
  I utterly fail to see why that's a service to our users.
 
 Well, I think the issue is that having a GUC that can't reasonably be
 tuned by 95% of our users is nearly useless.  Few users are going to run
 benchmarks to see what the optimal value is.

Sure. And the default looks to be a good one. So it's not bad that they
don't tune it further.
But. How are we ever going to be able to tune it further without
feedback from actual production usage?

It's possible to convince customers to play with a performance
influencing parameter and see how the results are. Even in
production. It's near impossible to do so if that requires to download
source packages, change a define in some .c file, rebuild the packages,
distribute them to the respective servers. And then continue to do so
for every release thereafter.

Not only is that a *significant* amount of work. It often involves a
different set of people (sysadmin, not dba-ish people). And often
complicated procedures to allow patching the source code at all.

 I remember Informix had a setting that had no description except try
 different values to see if it helps performance --- we don't want to do
 that.

I'm pretty sure we can come up with a better description than that.

 What if we emit a server message if the setting is too low?  That's how
 we handle checkpoint_segments.

I don't think it's realistically possible in this case. The
instrumentation we'd need to add would be too expensive for something
running as frequently as this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner kgri...@ymail.com wrote:
 I'm having trouble parsing what this statement means:

 ... the SQL standard does not require that MERGE be atomic in
 the sense of atomically providing either an INSERT or UPDATE, ...

 ... always getting an insert or update, never an error ...

I've never seen "atomic" used to mean "you never get an error"
before.  Perhaps it would be clearer to replace "atomic" with
"error-free" or some such.  It certainly would be less confusing
for me.  "Atomic" in most cases is a property that can be enforced
by generating an error where it would otherwise be violated.

For example: http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29

  Although implementations vary depending on factors such as 
  concurrency issues, the principle of atomicity — i.e. complete 
  success or complete failure — remain.

When you are saying "atomic" you mean something quite different.

 My complaint is quite straightforward, really. I don't want to
 have to tell users to do this: http://stackoverflow.com/a/2249

I think you are confusing syntax with semantics.  I grant that a
reasonable person could have concerns about using the MERGE syntax
to implement the semantics you want in the special case that an
appropriate unique index exists, but pretending that the semantics
can't be achieved with that syntax is just silly.
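
To be concrete about the syntax under discussion, a standard MERGE (which
PostgreSQL does not implement today) used purely as an upsert would look
something like this, assuming a table account with a unique index on id
(the names here are purely illustrative):

    MERGE INTO account a
    USING (VALUES (42, 100.00)) AS v(id, amount)
    ON a.id = v.id
    WHEN MATCHED THEN
        UPDATE SET balance = a.balance + v.amount
    WHEN NOT MATCHED THEN
        INSERT (id, balance) VALUES (v.id, v.amount);

The question is only what concurrency guarantees that statement makes,
not whether the syntax can express the operation.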

 At the same time, I also don't want to have to live with the
 consequences of implementing a MERGE that does not exhibit that
 behavior. Which is to say, the consequences of doing something
 like selectively using different types of snapshots (MVCC or
 dirty - the two different ideas of each row that are in
 tension) based on the exact clauses used. That seems like asking
 for trouble, TBH.

Now *that* is getting more to a real issue.  We routinely pick very
different plans based on the presence or absence of an index, and
we use special snapshots in the course of executing many DML
statements (if FK triggers are fired), but this would be the first
place I can think of that the primary DML statement behaves that
way.  You and a couple others have expressed concern about that,
but it seems pretty vague and hand-wavey.  If a different execution
node is used for the special behavior, and that node is generated
based on the unique index being used, I'm really having problems
seeing where the problem lies.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)

2014-10-03 Thread Andres Freund
On 2014-10-03 14:38:10 -0400, Robert Haas wrote:
 On Fri, Oct 3, 2014 at 1:09 PM, Robert Haas robertmh...@gmail.com wrote:
  Further debugging reveals that sigusr1_handler() gets called
  repeatedly, to start autovacuum workers, and it keeps waking up and
  starting them.  But that doesn't cause the background workers to get
  started either, because although sigusr1_handler() contains a call to
  maybe_start_bgworker, it only does that if start_bgworker = true,
  which only happens if
  CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE) is true,
  which on these calls it isn't.
  And I think this might also be the missing ingredient answering
  Andres's question from before: why doesn't the 60-second
  select()-timeout cause the background worker to eventually start even
  if the SELECT doesn't get interrupted?  There seems to be a SIGUSR1
  arriving about every 3 seconds, and I bet that's resetting the timer
  every time.

Ick. There are a couple of reports about HPUX behaving that way for
select(), so that seems like a reasonable theory. And a good explanation
of why only anole was seeing this.

 For now I propose to address this by committing the attached patch,
 which gets rid of the separate start_bgworker flag inside
 sigusr1_handler() and instead uses the same StartWorkerNeeded ||
 HaveCrashedWorker test that we use inside ServerLoop() to decide
 whether to call maybe_start_bgworker().

That seems sane.

 Either more signals will
 arrive (in which case the signal handler will launch an additional
 background worker every time a signal arrives) or they won't (in which
 case the 60-second timeout will eventually expire, and ServerLoop()
 will kick into high gear and satisfy all outstanding requests).  This
 isn't really right, because there might still be a quite noticeable
 delay waiting for workers to get launched, but at least the delay
 would be bounded to at most 60 seconds rather than, as at present,
 potentially infinite.

Right, it will somewhat fix the current issue. But aren't the
consequences of a potentially never/very seldom run ServerLoop() more
severe? There are a couple more things done in that loop :(

 A totally correct fix will require a bit more thought.  A search of
 the git history reveals that the problem of a signal restarting the
 timeout is not new: Tom fixed a similar problem back in 2007 by making
 the autovacuum launcher sleep for at most a second at a time.  Such a
 fix isn't ideal here, because we really don't want an up-to-1-second
 delay launching a newly-registered background worker if there's a way
 to avoid that -- it's probably OK for launching daemons, but it's not
 so hot for parallel query.  However, we could:
 
 (1) Use the self-pipe trick. We could not directly use a latch, at
 least not without a new API, because we might be waiting on more than
 one socket.

I think this is the way to go. And I think we should make the latch API
support that case, instead of reinventing it. That'll also have the
advantage of automatically using poll() (which is more efficient),
rather than select(), when supported by the platform.

 (2) Have the postmaster not set SA_RESTART for the sigusr1 handler.  I
 don't know how platform-independent this approach would be.

This makes me wary. The postmaster doesn't do too much stuff, but it'd
still be annoying to have to make sure everything always loops around
EINTR.

Greetings,

Andres Freund




Re: [HACKERS] pg_receivexlog and replication slots

2014-10-03 Thread Andres Freund
On 2014-10-03 14:02:08 -0400, Robert Haas wrote:
 On Fri, Oct 3, 2014 at 7:57 AM, Andres Freund and...@2ndquadrant.com wrote:
  I do wonder whether --create/--drop aren't somewhat weird for
  pg_receivexlog. It's not that clear what they mean. It'd be ugly, but we
  could rename them --create-slot/drop-slot.
 
 +1 on doing it, -1 on it being ugly.

The reason I'm calling it ugly is that it's different from
pg_recvlogical. We could change it there, too? A bit late, but probably
better than having a discrepancy forever?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Andres Freund
On 2014-10-03 17:55:19 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote:
  Well, I think the issue is that having a GUC that can't reasonably be
  tuned by 95% of our users is nearly useless.  Few users are going to run
  benchmarks to see what the optimal value is.
 
  It's possible to convince customers to play with a performance
  influencing parameter and see how the results are. Even in
  production.
 
 I'm a bit dubious that people will be willing to experiment in production
 with a GUC that requires a database restart to change.

I've convinced customers to restart databases with several different
shared_buffers settings... So I'm pretty sure it's possible, in *some*
cases, for xloginsert_slots.

And even if it's just test/pre-production machines - they're not going
to benchmark settings they can't reasonably set in production.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote:
 Well, I think the issue is that having a GUC that can't reasonably be
 tuned by 95% of our users is nearly useless.  Few users are going to run
 benchmarks to see what the optimal value is.

 It's possible to convince customers to play with a performance
 influencing parameter and see how the results are. Even in
 production.

I'm a bit dubious that people will be willing to experiment in production
with a GUC that requires a database restart to change.

regards, tom lane




Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 11:15:13PM +0200, Andres Freund wrote:
  As far as gathering data, I don't think we are going to do any better in
  terms of performance/simplicity/reliability than to have a single PGPROC
  entry to record when we enter/exit a lock, and having a secondary
  process scan the PGPROC array periodically.
 
 I don't think that'll give meaningful results given the very short times
 most lwlocks are held. And it'll give not very interesting results for

I figured you could get more data the more often you sampled.

 multiple lwlocks held at the same time - most of the time the 'earlier'
 held ones are more interesting than the last acquired one...

Well, I thought lock _waiting_ would be the most interesting measure,
not locks held.  Doesn't pg_locks show locks held?
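
(For reference, what pg_locks shows is heavyweight locks currently held
or awaited, along the lines of

    SELECT locktype, relation::regclass, mode, granted
    FROM pg_locks;

LWLocks never appear there at all, which is why this would need new
instrumentation in the first place.)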

  I am assuming almost no one cares about the number of locks, but rather
  they care about cumulative lock durations.
 
 I actually don't think that's true. Every lock acquisition implies a
 number of atomic operations. Those are expensive. And if you see
 individual locks acquired a high number of times in multiple processes,
 that's something important. It causes significant bus traffic between
 sockets, while not necessarily being visible in the lock held times.

True, but I don't think users are going to get much value from those
numbers, and they are hard to get.  Server developers might want to know
lock counts, but in those cases performance might not be as important.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Bruce Momjian
On Fri, Oct  3, 2014 at 11:58:14PM +0200, Andres Freund wrote:
 On 2014-10-03 17:55:19 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote:
   Well, I think the issue is that having a GUC that can't reasonably be
   tuned by 95% of our users is nearly useless.  Few users are going to run
   benchmarks to see what the optimal value is.
  
   It's possible to convince customers to play with a performance
   influencing parameter and see how the results are. Even in
   production.
  
  I'm a bit dubious that people will be willing to experiment in production
  with a GUC that requires a database restart to change.
 
 I've convinced customers to restart databases with several different
 shared_buffers settings... So I'm pretty sure it's possible, in *some*
 cases, for xloginsert_slots.
 
 And even if it's just test/pre-production machines - they're not going
 to benchmark settings they can't reasonably set in production.

Do we really want to expose a setting a few of us _might_ ask customers
to change?  I don't think so --- create a custom build if you want that.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Andres Freund
On 2014-10-03 18:08:56 -0400, Bruce Momjian wrote:
 On Fri, Oct  3, 2014 at 11:58:14PM +0200, Andres Freund wrote:
  On 2014-10-03 17:55:19 -0400, Tom Lane wrote:
   Andres Freund and...@2ndquadrant.com writes:
    On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote:
     Well, I think the issue is that having a GUC that can't reasonably be
     tuned by 95% of our users is nearly useless.  Few users are going to
     run benchmarks to see what the optimal value is.
    
    It's possible to convince customers to play with a performance
    influencing parameter and see how the results are. Even in
    production.
   
   I'm a bit dubious that people will be willing to experiment in production
   with a GUC that requires a database restart to change.
  
  I've convinced customers to restart databases with several different
  shared_buffers settings... So I'm pretty sure it's possible, in *some*
  cases, for xloginsert_slots.
  
  And even if it's just test/pre-production machines - they're not going
  to benchmark settings they can't reasonably set in production.
 
 Do we really want to expose a setting a few of us _might_ ask customers
 to change?

They also will try that themselves. Our customers aren't a horde of dumb
people. Some of them are willing to try things if they hit scalability
problems. And *lots* of people hit scalability problems with
postgres. In fact I've seen big users migrate away from postgres because
of them.

And it's not like this only affects absurd cases. Even a parallel
restore will benefit.

 I don't think so --- create a custom build if you want that.

Won't work in the majority of real world cases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Bruce Momjian
On Sat, Oct  4, 2014 at 12:13:00AM +0200, Andres Freund wrote:
  Do we really want to expose a setting a few of us _might_ ask customers
  to change?
 
 They also will try that themselves. Our customers aren't a horde of dumb
 people. Some of them are willing to try things if they hit scalability
 problems. And *lots* of people hit scalability problems with
 postgres. In fact I've seen big users migrate away from postgres because
 of them.
 
 And it's not like this only affects absurd cases. Even a parallel
 restore will benefit.

I disagree.  I just don't see the value in having such undefined
variables.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Fixed xloginsert_locks for 9.4

2014-10-03 Thread Andres Freund
On 2014-10-03 18:16:28 -0400, Bruce Momjian wrote:
 On Sat, Oct  4, 2014 at 12:13:00AM +0200, Andres Freund wrote:
   Do we really want to expose a setting a few of us _might_ ask customers
   to change?
  
  They also will try that themselves. Our customers aren't a horde of dumb
  people. Some of them are willing to try things if they hit scalability
  problems. And *lots* of people hit scalability problems with
  postgres. In fact I've seen big users migrate away from postgres because
  of them.
  
  And it's not like this only affects absurd cases. Even a parallel
  restore will benefit.
 
 I disagree.  I just don't see the value in having such undefined
 variables.

undefined variables? I'm not arguing that we don't need documentation
for it. Obviously we'd need that. I'm arguing against taking away
significant scalability possibilities from our users. My bet is that
it's more than 50% on a bigger machine.

I don't think we can offer absolutely accurate tuning advice, but I'm
sure we can give some guidance. Let me try.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 2:44 PM, Kevin Grittner kgri...@ymail.com wrote:
 I've never seen atomic used to mean you never get an error
 before.

 When you are saying atomic you mean something quite different.

Perhaps I should have been more careful on that point. Just to be
crystal clear: I don't intend that you'll never get an error (some
cases clearly *should* error). But I want confidence that reasonable,
simple UPSERTs will never error due to a duplicate violation that they
thought the statement was supposed to avoid. That's the point of
UPSERT. Guarantees matter. I would like to make guarantees that are at
least as good as the upsert looping subxact pattern.
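
For anyone who hasn't had to write it, the looping subxact pattern is
essentially the retry loop from the documentation's UPDATE/INSERT
exception example, sketched here against an illustrative table
db(a int primary key, b text):

    CREATE FUNCTION upsert_db(k integer, v text) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE db SET b = v WHERE a = k;
            IF found THEN
                RETURN;
            END IF;
            -- no row: try to insert; the BEGIN block is the subtransaction
            BEGIN
                INSERT INTO db (a, b) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- someone inserted concurrently; loop and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Every application gets to reinvent that, retry loop and subtransaction
overhead included, which is exactly what a built-in guarantee would make
unnecessary.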

Teradata refers to an atomic UPSERT. It also has a MERGE command.

 My complaint is quite straightforward, really. I don't want to
 have to tell users to do this: http://stackoverflow.com/a/2249

 I think you are confusing syntax with semantics.  I grant that a
 reasonable person could have concerns about using the MERGE syntax
 to implement the semantics you want in the special case that an
 appropriate unique index exists, but pretending that the semantics
 can't be achieved with that syntax is just silly.

I am not pretending that it can't be done - just, as I said, that to
do so would have bad consequences when the time comes to generalize the
syntax to support more advanced cases.

 At the same time, I also don't want to have to live with the
 consequences of implementing a MERGE that does not exhibit that
 behavior. Which is to say, the consequences of doing something
 like selectively using different types of snapshots (MVCC or
 dirty - the two different ideas of each row that are in
 tension) based on the exact clauses used. That seems like asking
 for trouble, TBH.

 Now *that* is getting more to a real issue.

Yes, it is. I am opposed to using the MERGE syntax for this *because*
MERGE is useful (independently useful, when done properly), not
because it is not useful.

 We routinely pick very
 different plans based on the presence or absence of an index, and
 we use special snapshots in the course of executing many DML
 statements (if FK triggers are fired), but this would be the first
 place I can think of that the primary DML statement behaves that
 way.  You and a couple others have expressed concern about that,
 but it seems pretty vague and hand-wavey.  If a different execution
 node is used for the special behavior, and that node is generated
 based on the unique index being used, I'm really having problems
 seeing where the problem lies.

We need to avoid duplicate violations. The authoritative source of
truth here is a unique index, because the presence or absence of a
conclusively-visible conflict tuple controls whether our insert fails
or succeeds respectively. We need a dirty snapshot to see tuples not
in our MVCC snapshot, since they need to be handled too. Otherwise,
when we fail to handle them, the MERGE's INSERT has a dup violation
(which is not what I want, for practical reasons).

If we're seq scanning a table, even with a dirty snapshot, there is no
authoritative source of truth. Heap tuples will be inserted
concurrently, and we'll have no idea what to do about it without
reference to a unique index as a choke point/single source of final
truth about what is and isn't going to be duplicated.
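
A concrete way to picture it: with no unique index on t(id) (table and
column purely illustrative), two sessions can each run

    INSERT INTO t (id, val)
    SELECT 1, 'mine'
    WHERE NOT EXISTS (SELECT 1 FROM t WHERE id = 1);

concurrently, each scan can finish before the other session's row exists,
and both INSERTs go ahead. You end up with two rows for id = 1 instead of
an update, and no error to tell you so. Only a unique index gives the
second inserter something to detect and wait on.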

As implemented, my UPSERT patch will have UPSERTs not really use their
MVCC snapshot for simple cases, just like INSERTs in general. UPDATEs
always use their MVCC snapshot, as the ModifyTable node pulls up
tuples from an ordinary relation scan. UPSERT needs a DirtySnapshot
scan of the unique index to find duplicates, because that's always how
we find duplicates. At the same time, MERGE is not at liberty to not
work just because of the absence of a unique index (plus even row
locking must be prepared for conflicts). And since the MERGE is driven
by an outer join - *not* an INSERT-like search for duplicates - we'll
find ourselves providing one behavior when detecting one case, and
another when detecting the other, with subtle and confusing
differences between each case. If we fake an outer join by doing an
INSERT-like search for duplicates in a unique index (to make sure we
see duplicates), then we get into trouble elsewhere. Like, we cannot
let users not ask for an INSERT in their MERGE, because we'd be
conclusively deciding to *not* (say) UPDATE on the basis of the
absence of a value in a unique index, as opposed to the presence.

Suddenly, we're in the business of proving a negative... or are we?
My head hurts.
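
To spell out the "driven by an outer join" point: the decision between
WHEN MATCHED and WHEN NOT MATCHED in a MERGE comes from something like
the following (target and source names illustrative), evaluated under the
statement's ordinary MVCC snapshot:

    SELECT s.id, s.val, (t.id IS NOT NULL) AS matched
    FROM (VALUES (1, 'new')) AS s(id, val)
    LEFT JOIN target t ON t.id = s.id;

which is exactly where it parts ways with an INSERT-style probe of a
unique index under a dirty snapshot.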

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan p...@heroku.com wrote:
 Yes, it is. I am opposed to using the MERGE syntax for this *because*
 MERGE is useful (independently useful, when done properly), not
 because it is not useful.

As I've mentioned, there is also the practical argument:  No one else
does this. That suggests to me that it's hard.


-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan p...@heroku.com wrote:
 We routinely pick very
 different plans based on the presence or absence of an index, and
 we use special snapshots in the course of executing many DML
 statements (if FK triggers are fired)

Apart from FK snapshots, we also use dirty snapshots for unique index
enforcement, obviously. That doesn't mean that it's the command/xact
snapshot, though. It's just a special constraint enforcement
mechanism.


-- 
Peter Geoghegan



