Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Fri, Jun 16, 2017 at 5:31 AM, Marina Polyakova wrote:
> And thank you very much for your explanation how and why transactions with
> failures should be retried! I'll try to implement all of it.

To be clear, part of "retrying from the beginning" means that if a
result from one statement is used to determine the content of (or
whether to run) a subsequent statement, that first statement must be
run in the new transaction and the results evaluated again to
determine what to use for the later statement. You can't simply
replay the statements that were run during the first try. For
examples that help give a feel for why that is, see:

https://wiki.postgresql.org/wiki/SSI

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
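The retry discipline described above -- the whole transaction is re-run
from the beginning, so a statement whose result drives a later statement
is executed and re-evaluated again on every attempt -- can be sketched as
a driver-side helper. Everything here (the exception class standing in
for SQLSTATE 40001 and the toy transaction function) is invented for
illustration, not code from the thread:

```python
class SerializationFailure(Exception):
    """Stands in for a serialization failure (SQLSTATE 40001) raised
    by the database driver."""

def run_transaction(txn_fn, max_retries=None):
    """Re-run txn_fn from the beginning until it commits without a
    serialization failure.  max_retries=None mimics the unlimited-retry
    frameworks described in the thread; retries are invisible to the
    caller, who just sees the final result."""
    attempt = 0
    while True:
        attempt += 1
        try:
            # txn_fn issues BEGIN ... COMMIT itself; on each attempt it
            # re-runs EVERY statement, including the ones whose results
            # feed later statements -- nothing is replayed from a log.
            return txn_fn()
        except SerializationFailure:
            if max_retries is not None and attempt > max_retries:
                raise

# Toy demonstration: fails twice, then succeeds.  The "dependent" value
# is recomputed on every attempt rather than replayed.
attempts = []
def txn():
    attempts.append(1)
    first_result = len(attempts)   # imagine: a SELECT that drives an UPDATE
    if first_result < 3:
        raise SerializationFailure
    return first_result

assert run_transaction(txn) == 3
```

With `max_retries` set, the helper gives up after that many retries
beyond the first attempt; with the default, it retries forever, which is
safe only because (as noted above) a serialization failure implies some
*other* transaction committed, so each retry contends with a new set of
transactions.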
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Thu, Jun 15, 2017 at 4:18 PM, Alvaro Herrera wrote:
> Kevin Grittner wrote:
> As far as I understand her proposal, it is exactly the opposite -- if a
> transaction fails, it is discarded. And this P.S. note is asking
> whether this is a good idea, or would we prefer that failing
> transactions are retried.
>
> I think it's pretty obvious that transactions that failed with
> some serializability problem should be retried.

Agreed all around.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On Thu, Jun 15, 2017 at 2:16 PM, Andres Freund wrote:
> On 2017-06-14 11:48:25 +0300, Marina Polyakova wrote:
>> I suggest a patch where pgbench client sessions are not disconnected because
>> of serialization or deadlock failures and these failures are mentioned in
>> reports.
>
> I think that's a good idea and sorely needed.

+1

>> P.S. Does this use case (do not retry transaction with serialization or
>> deadlock failure) is most interesting or failed transactions should be
>> retried (and how much times if there seems to be no hope of success...)?
>
> I can't quite parse that sentence, could you restate?

The way I read it was that the most interesting solution would retry
a transaction from the beginning on a serialization failure or
deadlock failure. Most people who use serializable transactions (at
least in my experience) run through a framework that does that
automatically, regardless of what client code initiated the
transaction. These retries are generally hidden from the client code
-- it just looks like the transaction took a bit longer.

Sometimes people will have a limit on the number of retries. I never
used such a limit and never had a problem, because our implementation
of serializable transactions will not throw a serialization failure
error until one of the transactions involved in causing it has
successfully committed -- meaning that the retry can only hit this
again on a *new* set of transactions.

Essentially, the transaction should only count toward the TPS rate
when it eventually completes without a serialization failure.

Marina, did I understand you correctly?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] GSoC 2017 weekly progress reports (week 2)
On Tue, Jun 13, 2017 at 1:02 PM, Andrew Borodin wrote:
> 2017-06-13 18:00 GMT+05:00 Shubham Barai :
> Good job!

+1! :-)

> So, in current HEAD test predicate_gist_2.spec generate false
> positives, but with your patch, it does not?

Keep in mind that false positives do not break *correctness* of
serializable transactions as long as it involves another transaction.
(It *would* be a bug if a transaction running alone could cause a
serialization failure.) A false *negative* is always a bug. That
said, false positives hurt performance, so we should keep the rate as
low as practicable.

> I'd suggest keeping spec tests with your code in the same branch, it's
> easier.

+1

> Also it worth to clean up specs style and add some words to
> documentation.

+1

> Kevin, all, how do you think, is it necessary to expand these tests
> not only on Index Only Scan, but also on Bitmap Index Scan? And may be
> KNN version of scan too?
> I couldn't find such tests for B-tree, do we have them?

Off-hand, I don't know. It would be interesting to run regression
tests with code coverage and look at the index AMs.

https://www.postgresql.org/docs/9.6/static/regress-coverage.html

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
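For anyone wanting to try the coverage run suggested above, the
documented build targets look roughly like this (a sketch; any extra
configure options for your environment are assumed):

```shell
# Build with coverage instrumentation, run the regression tests, then
# generate an HTML report; drill down into src/backend/access/ to see
# which index-AM paths the tests actually exercise.
./configure --enable-coverage
make
make check
make coverage-html     # report is written under ./coverage/
```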
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan wrote:
> My assumption about how transition tables ought to behave here is
> based on the simple fact that we already fire both AFTER
> statement-level triggers, plus my sense of aesthetics, or bias. I
> admit that I might be missing the point, but if I am it would be
> useful to know how.

Well, either will work. My inclination is that a single statement
should cause one execution of the FOR EACH STATEMENT trigger, but if
a good case can be made that we should have a FOR EACH STATEMENT
trigger fire for each clause within a statement -- well, it won't be
a problem for matview maintenance. The biggest hurt there would be
to *my* sense of aesthetics. ;-)

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Wed, Jun 7, 2017 at 4:48 PM, Thomas Munro wrote:
> Is there anything about that semantics that is incompatible with the
> incremental matview use case?

Nothing incompatible at all. If we had separate "new" tables for
UPDATE and DELETE we would logically need to do a "counting"-style
UNION of them just like we will want to do with the OLD (with a count
of -1) and NEW (with a count of 1) to get to a delta now. So keeping
them separate would be marginally more work for incremental matview,
but not a big deal.

> For example, are the "counting" and
> "DRed" algorithms you've proposed (for non-recursive and recursive
> views) driven entirely by deletions and insertions, so that updates
> look like deletes followed by inserts anyway?

Counting is best done from a "delta" relation which has old and new
combined with opposite counts. I'm sure that will be fine with DRed,
too.

> If so, I think our
> current treatment of ON CONFLICT DO UPDATE should be fine: you can't
> tell whether the tuples in the new table result from insert or update
> without also consulting the old table, but if the algorithm treats all
> tuples in the old table as deletes (even though in this case they
> result from updates) and all tuples in the new table as inserts (even
> though in this case *some* of them result from updates) anyway then I
> don't think it matters.

I don't think so, either.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
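A minimal sketch of the "delta" relation described above -- old rows
with count -1 combined with new rows at count +1. The transition-table
names and the key column are invented for illustration:

```sql
-- Assuming a statement-level trigger declared with
--   REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
-- the delta for counting-style matview maintenance combines both
-- transition tables with opposite counts:
SELECT k, -1 AS cnt FROM old_rows
UNION ALL
SELECT k, +1 AS cnt FROM new_rows;
```

A row updated in place appears once in each branch, so it nets to zero
for any aggregate that it did not actually change.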
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Tue, Jun 6, 2017 at 4:42 PM, Robert Haas wrote:
> So, are you willing and able to put any effort into this, like say
> reviewing the patch Thomas posted, and if so when and how much? If
> you're just done and you aren't going to put any more work into
> maintaining it (for whatever reasons), then I think you should say so
> straight out. People shouldn't have to guess whether you're going to
> maintain your patch or not.

It has become clear that the scope of the problems being found now
exceeds what I can be sure of being able to fix in time to make for a
stable release, in spite of the heroic efforts Thomas has been
putting in. I had hoped to get this into the first or second CF of
this release, same with the release before, and same with the release
before that. At least landing it in the final CF drew the level of
review and testing needed to polish it, but it's far from ideal
timing (or procedure). I can revert from v10 and deal with all of
this for the first CF of some future release, but if someone feels
they can deal with it in v10 I'll stand back and offer what help I
can.

You mentioned blame earlier. That seems pointless to me. I'm
looking to see what works and what doesn't. When I went to develop
SSI it was because my employer needed it, and they asked what it
would take to get that done; I said one year of my time without being
drawn off for other tasks to get it to a point where they would be
better off using it than not, and then two years of half-time work to
address community concerns, get it committed, and follow up on
problems. They gave me that and it worked, and worked well. In this
case, resources to carry it through were not reserved when I started,
and when I became full-up with other things, problems surfaced. That
doesn't work. I don't want to start something big again until I have
resources set up and reserved, as a priority, to see it through.
It's a matter of what works for both me and the community and what
doesn't.

In the meantime I'll try to keep to small enough issues that the
resources required to support what I do are not beyond what I can
deliver.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro wrote:
> On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro wrote:
>> On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan wrote:
>>> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
>>> case -- one for updated tuples, and the other for inserted tuples.
>>
>> Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER
>> INSERT and AFTER UPDATE statement-level triggers to be fired, but then
>> both kinds see all the tuples -- those that were inserted and those
>> that were updated. That's not right.
>
> Or maybe it is right.

The idea of transition tables is that you see all changes to the
target of a single statement in the form of delta relations -- with
an "old" table for any rows affected by a delete or update and a
"new" table for any rows affected by an insert or update. If we have
a single statement that combines INSERT and UPDATE behaviors, it
might make sense to have an "old" table for updates and a single
"new" table for both.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
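The case under discussion can be illustrated with a sketch; the table,
trigger, and function names here are invented, and the REFERENCING
clause follows the transition-table syntax from the patch:

```sql
-- Both statement-level triggers fire for a single
-- INSERT ... ON CONFLICT DO UPDATE statement.
CREATE TRIGGER t_ins AFTER INSERT ON t
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE log_inserts();

CREATE TRIGGER t_upd AFTER UPDATE ON t
  REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE PROCEDURE log_updates();

-- One statement, two behaviors: rows with no conflict are inserted,
-- conflicting rows are updated -- so which tuples should each
-- trigger's transition tables contain?
INSERT INTO t (id, v) VALUES (1, 'x')
  ON CONFLICT (id) DO UPDATE SET v = excluded.v;
```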
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sun, Jun 4, 2017 at 11:27 AM, Mengxing Liu wrote:
> "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),
> cpu user time should be about 12*4 = 48.
> There seems to be no process blocked by IO.
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu-----
>  r  b   swpd   free      buff   cache     si so bi bo    in     cs  us sy id wa st
> 28  0   0 981177024 315036 7084376000  0   900     1      0  99  0  0
> 21  1   0 981178176 315036 7084378400  0  0 25482 329020 12  3 85  0  0
> 18  1   0 981179200 315036 7084379200  0  0 26569 323596 12  3 85  0  0
> 17  0   0 981175424 315036 7084380800  0  0 25374 322992 12  4 85  0  0
> 12  0   0 981174208 315036 7084382400  0  0 24775 321577 12  3 85  0  0
>  8  0   0 981179328 315036 7084533600  0  0 13115 199020  6  2 92  0  0
> 13  0   0 981179200 315036 7084579200  0  0 22893 301373 11  3 87  0  0
> 11  0   0 981179712 315036 7084580800  0  0 26933 325728 12  4 84  0  0
> 30  0   0 981178304 315036 7084582400  0  0 23691 315821 11  4 85  0  0
> 12  1   0 981177600 315036 7084583200  0  0 29485 320166 12  4 84  0  0
> 32  0   0 981180032 315036 7084584800  0  0 25946 316724 12  4 84  0  0
> 21  0   0 981176384 315036 7084586400  0  0 24227 321938 12  4 84  0  0
> 21  0   0 981178880 315036 7084588000  0  0 25174 326943 13  4 83  0  0

This machine has 120 cores? Is hyperthreading enabled? If so, what
you are showing might represent a total saturation of the 30 cores.
Context switches of about 300,000 per second are pretty high. I
can't think of when I've seen that except when there is high spinlock
contention.

Just to put the above in context, how did you limit the test to 30
cores? How many connections were open during the test?

> The flame graph is attached. I use 'perf' to generate the flame graph. Only
> the CPUs running PG server are profiled.
> I'm not familiar with other part of PG. Can you find anything unusual in the
> graph?

Two SSI functions stand out:

  10.86% PredicateLockTuple
   3.51% CheckForSerializableConflictIn

In both cases, most of that seems to go to lightweight locking.
Since you said this is a CPU graph, that again suggests spinlock
contention issues.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] GSoC 2017 : Proposal for predicate locking in gist index
On Thu, Jun 1, 2017 at 12:49 AM, Andrew Borodin wrote:
> First, I just do not know, can VACUUM erase page with predicate lock?

For handling in btree, see:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtpage.c;h=f815fd40b20e98a88cb2fb5c71005ea125a459c9;hb=refs/heads/master#l1406

Note also this discussion:

https://www.postgresql.org/message-id/4d669122.80...@enterprisedb.com

It doesn't look like we ever got to the optimization Heikki suggested
in that post, so on rare occasions we could see a false positive from
this. Perhaps we should give this another look while we're in the
AMs.

> Right now, GiST never deletes pages, as far as I know, so this
> question is only matter of future compatibility.

OK.

> Second, when we are doing GiST inserts, we can encounter unfinished
> split. That's not a frequent situation, but still. Should we skip
> finishing split or should we add it to collision check too?

When a page is split, I think you need to call
PredicateLockPageSplit() before the split gets to the point that an
insert can reach the new page.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
[HACKERS] Re: GSoC 2017 weekly progress reports ("Explicitly support predicate locks in index access methods besides b-tree")
On Mon, Jun 5, 2017 at 12:40 PM, Shubham Barai wrote:
> GSoC (week 1)
> 4. went through source code of gist index to understand its implementation
>
> 5. found appropriate places in gist index AM to insert calls to existing
> functions (PredicateLockPage(), CheckForSerializableConflictIn() ...etc)

When you have a patch for any AM, please post it to the list. Each
AM will be reviewed and considered for commit separately.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Tue, Jun 6, 2017 at 3:41 PM, Marko Tiikkaja wrote:
> On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner wrote:
>> It took years to get an in-depth review, then I was asked
>> not to commit it because others were working on patches that would
>> conflict. That just doesn't leave enough time to address these
>> issues before release. Fundamentally, I'm not sure that there is a
>> level of interest sufficient to support the effort.
>
> I can only say that the lack of this feature comes up on a weekly basis on
> IRC, and a lot of people would be disappointed to see it reverted.

Well, at PGCon I talked with someone who worked on the implementation
in Oracle 20-some years ago. He said they had a team of 20 people
working full time on the feature for years to get it working. Now, I
think the PostgreSQL community is a little lighter on its feet, but
without more active participation from others than there has been so
far, I don't intend to take another run at it. I'll be happy to
participate at such point that it's not treated as a second-class
feature set. Barring that, anyone else who wants to take the patch
and run with it is welcome to do so.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
Nice as it would be to add a SQL standard feature and advance the
effort to get to incremental maintenance of materialized views, and
much as I really appreciate the efforts Thomas has put into trying to
solve these problems, I agree that it is best to revert the feature.

It took years to get an in-depth review, then I was asked not to
commit it because others were working on patches that would conflict.
That just doesn't leave enough time to address these issues before
release. Fundamentally, I'm not sure that there is a level of
interest sufficient to support the effort.

I'll give it a few days for objections before reverting.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table
On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro wrote:
> So, afterTriggers.query_stack is used to handle the reentrancy that
> results from triggers running further statements that might fire
> triggers. It isn't used for dealing with extra ModifyTable nodes that
> can appear in a plan because of wCTEs. Could it also be used for that
> purpose? I think that would only work if it is the case that each
> ModifyTable node begins and then runs to completion (ie no interleaving
> of wCTE execution) and then its AS trigger fires, which I'm not sure
> about.

I don't think we want to commit to anything that depends on a CTE
creating an optimization fence, although *maybe* that would be OK in
the case of DML as a CTE. That's a pretty special case; I'm not sure
whether the standard discusses it.

> If that is the case, perhaps AfterTriggerBeginQuery and
> AfterTriggerEndQuery could become the responsibility of
> nodeModifyTable.c rather than execMain.c. I haven't tried this yet
> and it may well be too ambitious at this stage.

In a world where one statement can contain multiple DML statements
within CTEs, that may make sense regardless of transition table
issues; but I agree we're past the time to be considering making such
a change for v10.

> Other ideas: (1) ban wCTEs that target relations with transition
> tables in PG10, because we're out of time;

Given that we haven't even discussed what to do about an UPDATE
statement with a CTE that updates the same table when there are
BEFORE EACH ROW UPDATE triggers on that table (which perhaps return
NULL for some but not all rows?), I'm not sure we've yet plumbed the
depths of this morass. For example, for this:

drop table if exists t1 cascade;
create table t1 (c1 int not null, c2 text not null default '');
insert into t1 (c1) values (1), (2), (3), (4), (5);
drop function if exists t1_upd_func() cascade;
create function t1_upd_func()
  returns trigger
  language plpgsql
as $$
begin
  if old.c1 between 2 and 4 then
    new.c2 = old.c2 || ';' || old.c1::text || ',' || new.c1::text;
  end if;
  if old.c1 > 3 then
    return null;
  end if;
  return new;
end;
$$;
create trigger t1_upd_update
  before update on t1
  for each row execute procedure t1_upd_func();
with x as (update t1 set c1 = c1 - 1 returning c1)
  update t1 set c1 = t1.c1 + 1 from x where x.c1 = t1.c1;
select * from t1;

... what is the correct result? I get this:

 c1 |  c2
----+------
  4 |
  5 |
  0 |
  1 | ;2,1
  2 | ;3,2
(5 rows)

It is definitely not what I expected, and seems wrong in multiple
ways from a logical perspective, looking at those as set-based
operations. (Tautologies about the procedural mechanism that creates
the result are not defenses of the result itself.)

Can we fix things exclusive of transition tables in this release? If
not, the only reasonable thing to do is to try to avoid further
complicating things with CTE/transition table usage until we sort out
the basics of triggers firing from CTE DML in the first place.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sat, Jun 3, 2017 at 1:51 AM, Mengxing Liu wrote:
> I tried 30 cores. But the CPU utilization is about 45%~70%.
> How can we distinguish where the problem is? Is disk I/O or Lock?

A simple way is to run `vmstat 1` for a bit during the test. Can you
post a portion of the output of that here? If you can configure the
WAL directory to a separate mount point (e.g., use the --waldir
option of initdb), a snippet of `iostat 1` output would be even
better.

I think the best thing may be if you can generate a CPU flame graph
of the worst case you can make for these lists:

http://www.brendangregg.com/flamegraphs.html

IMO, such a graph highlights the nature of the problem better than
anything else.

> Does "traversing these list" means the function "RWConflictExists"?
> And "10% waiting on the corresponding kernel mutexes" means the
> lock in the function CheckForSerializableIn/out ?

Since they seemed to be talking specifically about the conflict list,
I had read that as SerializableXactHashLock, although the wording is
a bit vague -- they may mean something more inclusive.

> If I used 30 cores for server, and 90 clients, RWConflictExists
> consumes 1.0% CPU cycles, and CheckForSerializableIn/out takes 5%
> in all.
> But the total CPU utilization of PG is about 50%. So the result
> seems to be matched.
> If we can solve this problem, we can use this benchmark for the
> future work.

If you can get a flame graph of CPU usage on this load, that would be
ideal. At that point, we can discuss what is best to attack.
Reducing something that is 10% of the total PostgreSQL CPU load in a
particular workload sounds like it could still have significant
value, although if you see a way to attack the other 90% (or some
portion of it larger than 10%) instead, I think we could adjust the
scope based on those results.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
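A hypothetical recipe for the flame graph suggested above, using perf
and Brendan Gregg's FlameGraph scripts (the sampling rate, duration,
and paths are placeholders to adapt to the test setup):

```shell
# Sample all CPUs system-wide while the benchmark runs, then fold the
# stacks and render an SVG flame graph.
perf record -F 99 -a -g -- sleep 60
perf script > out.perf
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > pg_ssi.svg
```

Searching the resulting SVG for RWConflictExists,
CheckForSerializableConflictIn/Out, and the LWLock/spinlock frames
beneath them should show where the SSI time actually goes.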
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
> Mengxing Liu wrote:
>> The CPU utilization of CheckForSerializableConflictOut/In is
>> 0.71%/0.69%.

How many cores were on the system used for this test? The paper
specifically said that they didn't see performance degradation on the
PostgreSQL implementation until 32 concurrent connections with 24 or
more cores. The goal here is to fix *scaling* problems. Be sure you
are testing at an appropriate scale. (The more sockets, cores, and
clients, the better, I think.)

On Fri, Jun 2, 2017 at 10:08 AM, Alvaro Herrera wrote:
> Kevin mentioned during PGCon that there's a paper by some group in
> Sydney that developed a benchmark on which this scalability
> problem showed up very prominently.

https://pdfs.semanticscholar.org/6c4a/e427e6f392b7dec782b7a60516f0283af1f5.pdf

"[...] we see a much better scalability of pgSSI than the
corresponding implementations on InnoDB. Still, the overhead of
pgSSI versus standard SI increases significantly with higher MPL than
one would normally expect, reaching around 50% with MPL 128."

"Our profiling showed that PostgreSQL spend 2.3% of the overall
runtime in traversing these list, plus 10% of its runtime waiting on
the corresponding kernel mutexes."

If you cannot duplicate their results, you might want to contact the
authors for more details on their testing methodology. Note that the
locking around access to the list is likely to be a bigger problem
than the actual walking and manipulation of the list itself.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] GSoC 2017 : Proposal for predicate locking in gist index
On Wed, May 31, 2017 at 3:02 PM, Shubham Barai wrote:
> I have been accepted as GSoC student for the project "Explicitly support
> predicate locks in index access methods besides b-tree". I want to share my
> approach for implementation of page level predicate locking in gist index.

For the benefit of all following the discussion, implementing support
in an index AM conceptually consists of two things:

(1) Any scan with user-visible results must create SIRead predicate
locks on "gaps" scanned. (For example, a scan just to find an
insertion spot for an index entry does not generally count, whereas a
scan to satisfy a user "EXISTS" test does.)

(2) Any insert into the index must CheckForSerializableConflictIn()
at enough points that at least one of them will detect an overlap
with a predicate lock from a preceding scan if the inserted index
entry might have changed the results of that preceding scan.

Detecting such a conflict does not automatically result in a
serialization failure, but is only part of tracking the information
necessary to make such a determination. All that is handled by the
existing machinery if the AM does the above.

With a btree index, locking gaps at the leaf level is normally
sufficient, because both scan and insertion of an index entry must
descend that far.

> The main difference between b-tree and gist index while searching for a
> target tuple is that in gist index, we can determine if there is a match or
> not at any level of the index.

Agreed. A gist scan can fail at any level, but for that scan to have
produced a different result due to insertion of an index entry,
*some* page that the scan looked at must be modified.

> The simplest way to do that will be by inserting a call for
> PredicateLockPage() in gistscanpage().

Yup.

> Insertion algorithm also needs to check for conflicting predicate locks at
> each level of the index.

Yup.

> We can insert a call for CheckForSerializableConflictIn() at two places in
> gistdoinsert().
>
> 1. after acquiring an exclusive lock on internal page (in case we are trying
> to replace an old search key)
>
> 2. after acquiring an exclusive lock on leaf page
>
> If there is not enough space for insertion, we have to copy predicate lock
> from an old page to all new pages generated after a successful split
> operation. For that, we can insert a call for PredicateLockPageSplit() in
> gistplacetopage().

That all sounds good. When you code a patch, don't forget to add
tests to src/test/isolation.

Do you need any help getting a development/build/test environment set
up?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
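A sketch of what such an src/test/isolation spec might look like -- two
serializable transactions whose gist-index range scans and inserts
overlap so that, with correct predicate locking, one of them must fail.
The table, steps, and values here are invented for illustration:

```
# Hypothetical isolation spec: rw-conflict through a gist index.
setup
{
  CREATE TABLE gist_point_tbl (id int, p point);
  CREATE INDEX gist_pointidx ON gist_point_tbl USING gist (p);
}

teardown
{
  DROP TABLE gist_point_tbl;
}

session "s1"
step "rxy1" { BEGIN ISOLATION LEVEL SERIALIZABLE;
              SELECT count(*) FROM gist_point_tbl
                WHERE p << point(2500, 2500); }
step "wy1"  { INSERT INTO gist_point_tbl VALUES (1, point(8005, 8005)); }
step "c1"   { COMMIT; }

session "s2"
step "rxy2" { BEGIN ISOLATION LEVEL SERIALIZABLE;
              SELECT count(*) FROM gist_point_tbl
                WHERE p >> point(7500, 7500); }
step "wx2"  { INSERT INTO gist_point_tbl VALUES (2, point(2005, 2005)); }
step "c2"   { COMMIT; }
```

Each session reads a region the other writes into; interleaved
permutations should produce a serialization failure on one side, and a
missing failure would indicate a gap in the predicate-lock coverage.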
[HACKERS] Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 31, 2017 at 1:44 AM, Noah Misch wrote:
> IMMEDIATE ATTENTION REQUIRED.

I should be able to complete review and testing by Friday. If there
are problems I might not take action until Monday; otherwise I should
be able to do so on Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Thu, May 18, 2017 at 5:16 AM, Amit Langote wrote:
> Do we need to update documentation? Perhaps, some clarification on the
> inheritance/partitioning behavior somewhere.

Yeah, I think so.

> -Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
> +Assert((enrmd->reliddesc == InvalidOid) !=
> +       (enrmd->tupdesc == NULL));
>
> Perhaps, unintentional change?

Agreed; the line is not long enough to need to wrap.

> I'm not sure if it's significant for transition tables, but what if a
> partition's BR trigger modified the tuple? Would we want to include the
> modified version of the tuple in the transition table or the original as
> the patch does? Same for the code in CopyFrom().

Good spot! If the BR trigger on the child table modifies or
suppresses the action, I strongly feel that must be reflected in the
transition table. This needs to be fixed.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
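The suppression case above can be made concrete with a sketch; the
table, function, and trigger names are invented for illustration:

```sql
-- A BEFORE ROW trigger on a partition/child that suppresses some
-- updates by returning NULL.  Per the position above, rows suppressed
-- here should NOT appear in the statement's transition tables, and a
-- tuple modified here should appear in its modified form.
CREATE FUNCTION skip_even_rows() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF new.c1 % 2 = 0 THEN
    RETURN NULL;   -- suppress the update for this row
  END IF;
  new.c2 = new.c2 || ' (touched)';   -- modify the row that goes through
  RETURN new;
END;
$$;

CREATE TRIGGER child_br BEFORE UPDATE ON child_tbl
  FOR EACH ROW EXECUTE PROCEDURE skip_even_rows();
```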
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Tue, May 16, 2017 at 4:50 PM, Thomas Munro wrote:
> On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner wrote:
>> On Wed, May 10, 2017 at 7:02 AM, Thomas Munro wrote:
>>> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro wrote:
>>>> 2. If you attach a row-level trigger with transition tables to any
>>>> inheritance child, it will see transition tuples from all tables in
>>>> the inheritance hierarchy at or below the directly named table that
>>>> were modified by the same statement, sliced so that they appear as
>>>> tuples from the directly named table.
>>>
>>> Of course that's a bit crazy, not only for trigger authors to
>>> understand and deal with, but also for plan caching: it just doesn't
>>> really make sense to have a database object, even an ephemeral one,
>>> whose type changes depending on how the trigger was invoked, because
>>> the plans stick around.
>>
>> The patch to add transition tables changed caching of a trigger
>> function to key on the combination of the function and the target
>> relation, rather than having one cache entry regardless of the
>> target table.
>
> Right. That works as long as triggers always see tuples in the format
> of the relation that they're attached to, and I think that's the only
> sensible choice. The problem I was thinking about was this: We have
> only one pair of tuplestores and in the current design it holds tuples
> of a uniform format, yet it may (eventually) need to be scanned by a
> statement trigger attached to the named relation AND any number of
> row triggers attached to children with potentially different formats.
> That implies some extra conversions are required at scan time. I had
> a more ambitious patch that would deal with some of that by tracking
> storage format and scan format separately (next time your row trigger
> is invoked the scan format will be the same but the storage format may
> be different depending on which relation you named in a query), but
> I'm putting that to one side for this release. That was a bit of a
> rabbit hole, and there are some tricky design questions about tuple
> conversions (to behave like DB2 with subtables may even require a
> tuplestore with per-tuple type information) and also the subset of
> rows that each row trigger should see (which may require extra tuple
> origin metadata or separate tuplestores).

Yeah, I wish this had surfaced far earlier, but I missed it and none
of the reviews prior to commit caught it, either. :-( It seems too
big to squeeze into v10 now. I just want to have that general
direction in mind and not paint ourselves into any corner that makes
it hard to get there in the next release.

> I'm about to post a much simpler patch that collects uniform tuples
> from children, addressing the reported bug, and simply rejects
> transition tables on row triggers on tables that are in either kind of
> inheritance hierarchy. More soon...

I agree that we can safely go that far, but not farther. Thanks!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Wed, May 10, 2017 at 7:02 AM, Thomas Munro wrote: > On Wed, May 10, 2017 at 11:10 PM, Thomas Munro > wrote: >> 2. If you attach a row-level trigger with transition tables to any >> inheritance child, it will see transition tuples from all tables in >> the inheritance hierarchy at or below the directly named table that >> were modified by the same statement, sliced so that they appear as >> tuples from the directly named table. > > Of course that's a bit crazy, not only for trigger authors to > understand and deal with, but also for plan caching: it just doesn't > really make sense to have a database object, even an ephemeral one, > whose type changes depending on how the trigger was invoked, because > the plans stick around. The patch to add transition tables changed caching of a trigger function to key on the combination of the function and the target relation, rather than having one cache entry regardless of the target table. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
[Apologies to all for my recent absence from community lists, and
special thanks to Thomas and Robert for picking up the slack.]

On Tue, May 9, 2017 at 4:51 PM, Thomas Munro wrote:
> On Tue, May 9, 2017 at 10:29 PM, Thomas Munro wrote:
> Recall that transition tables can be specified for statement-level
> triggers AND row-level triggers.  If you specify them for row-level
> triggers, then they can see all rows changed so far each time they
> fire.

No, they see all rows from the statement, each time.

test=# create table t (c int not null);
CREATE TABLE
test=# create function t_func()
test-#   returns trigger
test-#   language plpgsql
test-# as $$
test$# begin
test$#   raise notice '% / % = %',
test$#     new.c,
test$#     (select sum(c) from n),
test$#     (select new.c::float / sum(n.c) from n);
test$#   return null;
test$# end;
test$# $$;
CREATE FUNCTION
test=# create trigger t_trig
test-#   after insert or update on t
test-#   referencing new table as n
test-#   for each row
test-#   execute procedure t_func();
CREATE TRIGGER
test=# insert into t select generate_series(1,5);
NOTICE:  1 / 15 = 0.0667
NOTICE:  2 / 15 = 0.133
NOTICE:  3 / 15 = 0.2
NOTICE:  4 / 15 = 0.267
NOTICE:  5 / 15 = 0.333
INSERT 0 5

This behavior is required for this feature by the SQL standard.

> Now our policy of firing the statement level triggers only for
> the named relation but firing row-level triggers for all modified
> relations leads to a tricky problem for the inheritance case: what
> type of transition tuples should the child table's row-level triggers
> see?

The record format for the object on which the trigger was declared, IMO.

> Suppose you have an inheritance hierarchy like this:
>
>   animal
>     -> mammal
>       -> cat
>
> You define a statement-level trigger on "animal" and another
> statement-level trigger on "mammal".  You define a row-level trigger
> on "cat".  When you update either "animal" or "mammal", the row
> triggers on "cat" might run.  Row-level triggers on "cat" see OLD and
> NEW as "cat" tuples, of course, but if they are configured to see
> transition tables, should they see "cat", "mammal" or "animal" tuples
> in the transition tables?  With my patch as it is, that depends on
> which level of the hierarchy you explicitly updated!

I think that the ideal behavior would be that if you define a trigger
on "cat", you see rows in the "cat" format; if you define a trigger on
rows for "mammal", you see rows in the "mammal" format; if you define
a trigger on rows for "animal", you see rows in the "animal" format.

Also, the ideal would be that we support an ONLY option for trigger
declaration.  If your statement is ONLY on a given level in the
hierarchy, the row triggers for only that level would fire.  If you
don't use ONLY, a row trigger at that level would fire for operations
at that level or any child level, but with a record format matching
the level of the trigger.

Now, that may be too ambitious for this release.  If so, I suggest we
not implement anything that would be broken by the above, and throw a
"not implemented" error when necessary.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
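[The hierarchy under discussion, sketched as DDL.  Columns and the
trigger body are invented for illustration only, and note that the
simpler patch Thomas proposed would reject transition tables on
row-level triggers on inheritance children entirely, so this shows the
*proposed* ideal behavior, not necessarily what any committed code
accepts.]

create table animal (name text);
create table mammal (fur_color text) inherits (animal);
create table cat (lives int) inherits (mammal);

-- A row-level trigger on the leaf table.  OLD and NEW are always "cat"
-- tuples here; under the proposed rule the transition tables o and n
-- would likewise always use "cat"'s row type, regardless of whether
-- the triggering statement named animal, mammal, or cat.
create function cat_row_func() returns trigger language plpgsql as $$
begin
  raise notice 'cat % now has % lives', new.name, new.lives;
  return null;
end $$;

create trigger cat_row
  after update on cat
  referencing old table as o new table as n
  for each row
  execute procedure cat_row_func();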
Re: [HACKERS] How huge does mvtest_huge need to be?
On Wed, May 3, 2017 at 4:37 PM, Tom Lane wrote: > At this point I'm inclined to just delete the whole test. ok -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How huge does mvtest_huge need to be?
On Wed, May 3, 2017 at 12:08 PM, Tom Lane wrote:
> So ... is there a good reason to be using a large table here, and
> if so what is it, and how big does the table really need to be
> to provide useful test coverage?

Hm.  This seems like a particularly useless size.  It would test a
possibly useful corner case if it were over 10MB, so that it was over
vacuum's truncation threshold, but that would obviously be even
slower.  It doesn't seem justified.  How about 500 rows, so it at
least goes to a second page, which is then truncated to one page?
The "huge" in the object names then seems odd, of course.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
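[The "second page" suggestion can be sanity-checked with a scratch
table.  Table and column names here are invented, not the real
mvtest_huge schema, and page counts depend on build settings:]

-- With a single int column, on the order of a couple hundred heap
-- tuples fit per default 8kB page, so 500 rows spill past page one.
create table mvtest_small (amt int);
insert into mvtest_small select generate_series(1, 500);
select pg_relation_size('mvtest_small') / 8192 as pages;

-- Deleting the tail and vacuuming lets vacuum truncate the relation
-- back toward a single page.
delete from mvtest_small where amt > 10;
vacuum mvtest_small;
select pg_relation_size('mvtest_small') / 8192 as pages;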
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 10:56 PM, Tom Lane wrote:
> They will fire if you have an INSTEAD OF row-level trigger; the existence
> of that trigger is what determines whether we implement DML on a view
> through the view's own triggers or through translation to an action on
> the underlying table.
>
> I do not think it'd be reasonable to throw an error for creation of
> a statement-level view trigger when there's no row-level trigger,
> because that just imposes a hard-to-deal-with DDL ordering dependency.
>
> You could make a case for having the updatable-view translation code
> print a WARNING if it notices that there are statement-level triggers
> that cannot be fired due to the translation.

Oh, I see -- you can add all the AFTER ... FOR EACH STATEMENT triggers
you want for an updatable view and they will quietly sit there without
firing no matter how many statements perform the supposedly triggering
action, but as soon as you add an INSTEAD OF ... FOR EACH ROW trigger
they spring to life.  On the face of it that seems to me to violate
the POLA, but I kinda see how it evolved.

I need to look at this and the rather similar odd behavior under
inheritance.  I hope to post something Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 11:53 AM, Robert Haas wrote: > On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner wrote: >> On Mon, May 1, 2017 at 10:01 AM, Robert Haas wrote: >>> It seems pretty clear to me that this is busted. >> >> I don't think you actually tested anything that is dependent on any >> of my patches there. > > I was testing which rows show up in a transition table, so I assumed > that was related to the transition tables patch. Note that this is > not about which triggers are fired, just about how inheritance > interacts with transition tables. Yeah, I got confused a bit there, comparing to the updateable views case. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)
On Mon, May 1, 2017 at 10:01 AM, Robert Haas wrote: > It seems pretty clear to me that this is busted. I don't think you actually tested anything that is dependent on any of my patches there. > Adding this as an open item. Kevin? It will take some time to establish what legacy behavior is and how the new transition tables are impacted. My first reaction is that a trigger on the parent should fire for any related action on a child (unless maybe the trigger is defined with an ONLY keyword???) using the TupleDesc of the parent. Note that the SQL spec mandates that even in a AFTER EACH ROW trigger the transition tables must represent all rows affected by the STATEMENT. I think that this should be independent of triggers fired at the row level. I think the rules should be similar for updateable views. This will take some time to investigate, discuss and produce a patch. I think best case is Friday. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 3:54 PM, Kevin Grittner wrote: > Not firing a table's DML because it was fired off a view would be crazy, should read: Not firing a table's trigger because the trigger is on DML run from a view's INSTEAD OF trigger would be crazy, -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 2:42 PM, Tom Lane wrote:
> Kevin Grittner writes:
>> On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro wrote:
>>> For views, aside from the question of transition tables, I noticed
>>> that statement triggers don't seem to fire at all with updatable
>>> views.  Surely they should -- isn't that a separate bug?
>
>> I checked out 25dc142a (before any of my commits for $subject),
>> built it, and tried the above -- with no warning generated.  I then
>> used an UPDATE and DELETE against the view, also with no trigger
>> fired (nor any error during trigger creation or DML).  Does anyone
>> know whether such a trigger ever fired at any point in the commit
>> history?
>
> [ experiments... ]  They did, and do, fire if you do it the old-style
> way with an INSTEAD OF row trigger.

Here is the table from near the start of CREATE TRIGGER docs, "folded"
such that I hope it remains intelligible in a fixed-width font after
email systems get done with it:

  When        Event                 Row-level        Statement-level
  ----------  --------------------  ---------------  -------------------
  BEFORE      INSERT/UPDATE/DELETE  Tables and       Tables, views, and
                                    foreign tables   foreign tables
              TRUNCATE              —                Tables
  AFTER       INSERT/UPDATE/DELETE  Tables and       Tables, views, and
                                    foreign tables   foreign tables
              TRUNCATE              —                Tables
  INSTEAD OF  INSERT/UPDATE/DELETE  Views            —
              TRUNCATE              —                —

The claim seems to be that you can create a
  { BEFORE | AFTER } { event [ OR ... ] } ... FOR EACH STATEMENT
trigger, where event is { INSERT | UPDATE | DELETE }, on an updateable
view.  Well, you can *create* it, but it will never fire.

> They don't fire if you're relying
> on an updatable view.  It seems we fire the table's statement triggers
> instead, ie, the update is fully translated into an update on the
> underlying table.

Well, certainly that should *also* happen.  Not firing a table's DML
because it was fired off a view would be crazy, or so it seems to me.

> I'm not sure how intentional that was, but it's not a completely
> unreasonable definition on its face, and given the lack of field
> complaints since 9.3, I think we should probably stick to it.

Are you talking about being able to create INSERT, UPDATE, and DELETE
triggers on the view (which never fire), or about firing triggers on
the table when an INSTEAD OF trigger is fired?

> However, if you didn't understand that from the documentation,
> then we have a documentation problem.

I totally get that there are INSTEAD OF triggers and have no quibble
with how they behave.  I can't even argue that the above chart is
wrong in terms of what CREATE TRIGGER allows.  However, creating
triggers that can never fire seems entirely wrong.

>> If we do get these working, don't they deserve at least
>> one regression test?
>
> Are you sure there isn't one?

Well, I was sort of hoping that the triggers that can now be defined
but can never fire *did* fire at some point.  But if that were true,
and they subsequently were broken, it would mean we lacked regression
tests for that case.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro wrote:
> My colleague Prabhat Sahu reported off-list that transition tables
> don't work for views.  I probably should have thought about that when
> I fixed something similar for partitioned tables, and after some
> experimentation I see that this is also broken for foreign tables.

Good spot.  Don't know why that didn't occur to me to look at.

> For foreign tables using postgres_fdw, I see that transition tables
> capture new rows for INSERT but capture nothing for DELETE and UPDATE.

Will dig into that in a bit, but first...

> For views, aside from the question of transition tables, I noticed
> that statement triggers don't seem to fire at all with updatable
> views.  Surely they should -- isn't that a separate bug?
>
> Example:
>
> create table my_table (i int);
> create view my_view as select * from my_table;
> create function my_trigger_function() returns trigger language plpgsql as
>   $$ begin raise warning 'hello world'; return null; end; $$;
> create trigger my_trigger after insert or update or delete on my_view
>   for each statement execute procedure my_trigger_function();
> insert into my_view values (42);
>
> ... and the world remains ungreeted.

I checked out 25dc142a (before any of my commits for $subject), built
it, and tried the above -- with no warning generated.  I then used an
UPDATE and DELETE against the view, also with no trigger fired (nor
any error during trigger creation or DML).  Does anyone know whether
such a trigger ever fired at any point in the commit history?  Should
we remove the documentation claiming that anything except INSTEAD OF
triggers works on a view, and generate errors for attempts to do
otherwise, or is there something to salvage that has worked at some
point?  If we do get these working, don't they deserve at least one
regression test?

Will post again after I have a chance to review the FDW issue.
-- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...
On Tue, Mar 28, 2017 at 10:36 PM, Craig Ringer wrote:
> Personally I have to agree that the learning curve is very steep.  Some
> of the docs and presentations help, but there's a LOT to understand.

Some small patches can be kept to a fairly narrow set of areas, and it
helps if you can find a similar capability you can crib technique from
for handling some of the more mysterious areas it might brush up
against.

When I started working on my first *big* patch that was bound to touch
many areas (around the start of development for 9.1) I counted lines
of code and found over a million lines just in .c and .h files.  We're
now closing in on 1.5 million lines.  That's not counting over 376,000
lines of documentation in .sgml files, over 12,000 lines of text in
README* files, over 26,000 lines of perl code, over 103,000 lines of
.sql code (60% of which is in regression tests), over 38,000 lines of
.y code (for flex/bison parsing), about 9,000 lines of various types
of code just for generating the configure file, and over 439,000 lines
of .po files (for message translations).  I'm sure I missed a lot of
important stuff there, but it gives some idea of the challenge it is
to get your head around it all.

My first advice is to try to identify which areas of the code you will
need to touch, and read those over.  Several times.  Try to infer the
API to areas *that* code needs to reference from looking at other code
(as similar to what you want to work on as you can find), reading code
comments and README files, and asking questions.

Secondly, there is a lot that is considered to be "coding rules" that
is, as far as I've been able to tell, only contained inside the heads
of veteran PostgreSQL coders, with occasional references in the
discussion list archives.  Asking questions, proposing approaches
before coding, and showing work in progress early and often will help
a lot in terms of discovering these issues and allowing you to
rearrange things to fit these conventions.

If someone with the "gift of gab" is able to capture these and put
them into a readily available form, that would be fantastic.

> * SSI (haven't gone there yet myself)

For anyone wanting to approach this area, there is a fair amount to
look at.  There is some overlap, but in rough order of "practical" to
"theoretical foundation", you might want to look at:

https://www.postgresql.org/docs/current/static/transaction-iso.html
https://wiki.postgresql.org/wiki/SSI
The SQL standard
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=refs/heads/master
http://www.vldb.org/pvldb/vol5.html
http://hdl.handle.net/2123/5353
Papers cited in these last two.

I have found papers authored by Alan Fekete or Atul Adya particularly
enlightening.

If any of the other areas that Craig listed have similar work
available, maybe we should start a Wiki page where we list areas of
code (starting with the list Craig included) as section headers, and
put links to useful reading below each?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] recent deadlock regression test failures
On Mon, Apr 10, 2017 at 1:17 PM, Tom Lane wrote:
> Kevin Grittner writes:
>> On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane wrote:
>>> I notice that the safe-snapshot code path is not paying attention to
>>> parallel-query cases, unlike the lock code path.  I'm not sure how
>>> big a deal that is...
>
>> Parallel workers in serializable transactions should be using the
>> transaction number of the "master" process to take any predicate
>> locks, and if parallel workers are doing any DML work against
>> tuples, that should be using the master transaction number for
>> xmin/xmax and serialization failure testing.
>
> Right, but do they record the master's PID rather than their own in
> the SERIALIZABLEXACT data structure?

There had better be just a single SERIALIZABLEXACT data structure for
a serializable transaction, or it just doesn't work.  I can't see how
it would have anything but the master's PID in it.  If parallel
execution is possible in a serializable transaction and things are
done any other way, we have a problem.

> Maybe it's impossible for a parallel worker to acquire its own
> snapshot at all, in which case this is moot.  But I'm nervous.

I have trouble seeing what sane semantics we could possibly have if
each worker for a parallel scan used a different snapshot -- under any
transaction isolation level.  I know that under the read committed
transaction isolation level we can follow xmax pointers to hit tuples
on the target relation which are not in the snapshot used for the
scan, but I don't see how that would negate the need to have the scan
itself run on a single snapshot.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: [HACKERS] recent deadlock regression test failures
On Mon, Apr 10, 2017 at 9:28 AM, Tom Lane wrote: > Thomas Munro writes: >> Here's a pair of draft patches for review: Thanks. > Pushed with cosmetic improvements. Thanks. > I notice that the safe-snapshot code path is not paying attention to > parallel-query cases, unlike the lock code path. I'm not sure how > big a deal that is... Parallel workers in serializable transactions should be using the transaction number of the "master" process to take any predicate locks, and if parallel workers are doing any DML work against tuples, that should be using the master transaction number for xmin/xmax and serialization failure testing. If either of those rules are being violated, parallel processing should probably be disabled within a serializable transaction. I don't think safe-snapshot processing needs to do anything special if the above rules are followed, nor can I see anything special it *could* do. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recent deadlock regression test failures
On Sat, Apr 8, 2017 at 12:56 PM, Tom Lane wrote:
> Thomas Munro writes:
>> On Sat, Apr 8, 2017 at 4:22 PM, Tom Lane wrote:
>> Based on the above, here is a version that introduces a simple boolean
>> function pg_waiting_for_safe_snapshot(pid) and adds that to the
>> query.  This was my "option 1" upthread.
>
> Nah, this is not good either.  Yes, it's a fairly precise conversion
> of what Kevin's patch did, but I think that patch is wrong even
> without considering the performance angle.  If you look back at the
> discussions in Feb 2016 that led to what we had, it turned out to be
> important not just to be able to say that process X is blocked, but
> that it is blocked by one of the other isolationtest sessions, and
> not by some random third party (like autovacuum).  I do not know
> whether there is, right now, any way for autovacuum to be the guilty
> party for a SafeSnapshot wait --- but it does not seem like a good
> plan to assume that there never will be.

It would make no sense to run autovacuum at the serializable
transaction isolation level, and only overlapping read-write
serializable transactions can block the attempt to acquire a safe
snapshot.

> So I think what we need is functionality very much like what you had
> in the prior patch, ie identify the specific PIDs that are causing
> process X to wait for a safe snapshot.  I'm just not happy with how
> you packaged it.
>
> Here's a sketch of what I think we ought to do:
>
> 1. Leave pg_blocking_pids() alone; it's a published API now.

Fair enough.

> 2. Add GetSafeSnapshotBlockingPids() more or less as you had it
> in the previous patch (+ Kevin's recommendations).  There might be
> value in providing a SQL-level way to call that, but I'm not sure,
> and it would be independent of fixing isolationtester anyway.
It seems entirely plausible that someone might want to know what is holding up the start of a backup or large report which uses the READ ONLY DEFERRABLE option, so I think there is a real use case for a documented SQL function similar to pg_blocking_pids() to show the pids of connections currently running transactions which are holding things up. Of course, they may not initially know whether it is being blocked by heavyweight locks or concurrent serializable read-write transactions, but it should not be a big deal to run two separate functions. Given the inability to run isolation tests to cover the deferrable code, we used a variation on DBT-2 that could cause serialization anomalies to generate a high concurrency saturation run using serializable transactions, and started a SERIALIZABLE READ ONLY DEFERRABLE transaction 1200 times competing with this load, timing how long it took to start. To quote the VLDB paper[1], "The median latency was 1.98 seconds, with 90% of transactions able to obtain a safe snapshot within 6 seconds, and all within 20 seconds. Given the intended use (long-running transactions), we believe this delay is reasonable." That said, a single long-running serializable read-write transaction could hold up the attempt indefinitely -- there is no maximum bound. Hence the benefit of a SQL function to find out what's happening. > 3. Invent a SQL function that is dedicated specifically to supporting > isolationtester and need not be documented at all; this gets us out > of the problem of whether it's okay to whack its semantics around > anytime somebody thinks of something else they want to test. > I'm imagining an API like > > isolation_test_is_waiting_for(int, int[]) returns bool > > so that isolationtester's query would reduce to something like > > SELECT pg_catalog.isolation_test_is_waiting_for($1, '{...}') > > which would take even more cycles out of the parse/plan overhead for it > (which is basically what's killing the CCA animals). 
Internally, this > function would call pg_blocking_pids and, if necessary, > GetSafeSnapshotBlockingPids, and would check for matches in its array > argument directly; it could probably do that significantly faster than the > general-purpose array && code. So we'd have to expend a bit of backend C > code, but we'd have something that would be quite speedy and we could > customize in future without fear of breaking behavior that users are > depending on. Good suggestion. Thomas, would you like to produce a patch along these lines, or should I? -- Kevin Grittner [1] Dan R. K. Ports and Kevin Grittner. 2012. Serializable Snapshot Isolation in PostgreSQL. Proceedings of the VLDB Endowment, Vol. 5, No. 12. The 38th International Conference on Very Large Data Bases, August 27th - 31st 2012, Istanbul, Turkey. http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
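[For readers unfamiliar with the feature being timed above, the safe
snapshot wait is requested with standard commands; a long-running
report might do something like this (the table name is invented):]

-- The BEGIN blocks until a "safe" snapshot is available -- that is,
-- until no overlapping read-write serializable transaction could
-- create an anomaly visible to this snapshot -- and the transaction
-- then runs without taking predicate locks or risking a
-- serialization failure.
begin transaction isolation level serializable read only deferrable;
select count(*) from some_big_table;
commit;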
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 9:24 PM, Thomas Munro wrote: > 2. Did I understand correctly that it is safe to scan the list of > SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while > holding only SerializableXactHashLock, Yes. > and that 'inLink' is the correct link to be following? If you're starting from the blocked (read-only) transaction (which you are), inLink is the one to follow. Note: It would be better form to use the SxactIsDeferrableWaiting() macro than repeat the bit-testing code directly in your function. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Mon, Feb 27, 2017 at 7:35 AM, Dagfinn Ilmari Mannsåker wrote:
> Kevin Grittner writes:
>
>> It occurred to me that it would make sense to allow these settings
>> to be attached to a database or table (though *not* a role).  I'll
>> look at that when I get back if you don't get to it first.
>
> Attached is a draft patch (no docs or tests) on top of the previous one,
> adding reloptions for the per-relation and per-page thresholds.  That
> made me realise that the corresponding GUCs don't need to be PGC_SIGHUP,
> but could be PGC_SUSET or PGC_USERSET.  I'll adjust the original patch
> if you agree.  I have not got around to adding per-database versions
> of the setting, but could have a stab at that.

Unfortunately, I was unable to get the follow-on patch to allow
setting by relation into a shape I liked.  Let's see what we can do
for the next release.  The first patch was applied with only very
minor tweaks.

I realize that nothing would break if individual users could set
their granularity thresholds on individual connections, but as
someone who has filled the role of DBA, the thought kinda made my
skin crawl.  I left it as PGC_SIGHUP for now; we can talk about
loosening that up later, but I think we should have one or more
use-cases that outweigh the opportunities for confusion and bad
choices by individual programmers to justify that.

--
Kevin Grittner
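[In postgresql.conf the thresholds discussed here look roughly like the
following.  The names match what was committed for PostgreSQL 10; the
defaults shown are from memory, so verify against the current
documentation before relying on them:]

max_pred_locks_per_transaction = 64  # pre-existing setting: lock pool size
max_pred_locks_per_relation = -2     # a negative value means
                                     # max_pred_locks_per_transaction
                                     # divided by its absolute value
max_pred_locks_per_page = 2          # tuple locks on a page before
                                     # promotion to a page lock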
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 3:47 PM, Thomas Munro wrote: > On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner wrote: >> On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund wrote: >> >>> I'd rather fix the issue, than remove the tests entirely. Seems quite >>> possible to handle blocking on Safesnapshot in a similar manner as >>> pg_blocking_pids? >> >> I'll see what I can figure out. > > Ouch. These are the other ways I thought of to achieve this: > > https://www.postgresql.org/message-id/CAEepm%3D1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN%3D0LY3pJ49Ksg%40mail.gmail.com > > I'd be happy to write one of those, but it may take a day as I have > some other commitments. Please give it a go. I'm dealing with putting out fires with customers while trying to make sure I have tested the predicate locking GUCs patch sufficiently. (I think it's ready to go, and has passed all tests so far, but there are a few more I want to run.) I'm not sure I can come up with a solution faster than you, given that. Since it is an improvement to performance for a test-only environment on a feature that is already committed and not causing problems for production environments, hopefully people will tolerate a fix a day or two out. If not, we'll have to revert and get it into the first CF for v11. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining 2017-03 CF entries
On Fri, Apr 7, 2017 at 1:37 PM, Andres Freund wrote: > I think it makes sense to go through those and see whether it's > realistic to commit any of them. > > Ready for Committer: > > Add GUCs for predicate lock promotion thresholds: > - claimed by Kevin, should be easy enough I was planning on pushing this today. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund wrote: > I'd rather fix the issue, than remove the tests entirely. Seems quite > possible to handle blocking on Safesnapshot in a similar manner as > pg_blocking_pids? I'll see what I can figure out. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recent deadlock regression test failures
On Fri, Apr 7, 2017 at 12:28 PM, Tom Lane wrote: > Andrew Dunstan writes: >> On 04/07/2017 12:57 PM, Andres Freund wrote: >>> I don't think any recent changes are supposed to affect deadlock >>> detector behaviour? > >> Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines >> recent changes have made the isolation tests run much much longer. > > Ouch. I see friarbird's run time for the isolation tests has gone from an > hour and change to over 5 hours in one fell swoop. hyrax not much better. > Oddly, non-CCA animals don't seem to have changed much. > > Eyeing recent patches, it seems like the culprit must be Kevin's > addition to isolationtester's wait query: Ouch. Without this we don't have regression test coverage for the SERIALIZABLE READ ONLY DEFERRABLE code, but it's probably not worth adding 4 hours to any tests, even if it only shows up with CLOBBER_CACHE_ALWAYS. I assume the consensus is that I should revert it? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC] Push-based query executor discussion
Sorry, I didn't notice that this was going to a public list. That URL is only available to people who signed up as mentors for PostgreSQL GSoC participation this year. Does the link to the draft work for you? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC] Push-based query executor discussion
On Thu, Apr 6, 2017 at 8:11 AM, Alexander Korotkov wrote: >> https://docs.google.com/document/d/1dvBETE6IJA9AcXd11XJNPsF_VPcDhSjy7rlsxj262l8/edit?usp=sharing > I'd love to see a comment from Andres Freund who is leading executor > performance improvements. Note that the final proposal is here: https://summerofcode.withgoogle.com/serve/5874530240167936/ Also, I just entered a comment about an important question that I think needs to be answered right up front. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Apr 3, 2017 at 7:16 PM, Thomas Munro wrote: > On Tue, Apr 4, 2017 at 3:41 AM, Kevin Grittner wrote: >> On Mon, Apr 3, 2017 at 8:59 AM, Tom Lane wrote: >>> Thomas Munro writes: >>>> Or perhaps the code to inject trigger data transition tables into SPI >>>> (a near identical code block in these three patches) should be somewhere >>>> common so that each PL would only need to call a function. If so, >>>> where should that go? >>> >>> spi.c? >> >> Until now, trigger.c didn't know about SPI, and spi.c didn't know >> about triggers. The intersection was left to referencing code, like >> PLs. Is there any other common code among the PLs dealing with this >> intersection? If so, maybe a new triggerspi.c file (or >> spitrigger.c?) would make sense. Possibly it could make sense from >> a code structure PoV even for a single function, but it seems kinda >> iffy for just this function. As far as I can see it comes down to >> adding it to spi.c or creating a new file -- or just duplicating >> these 30-some lines of code to every PL. > > Ok, how about SPI_register_trigger_data(TriggerData *)? Or any name > you prefer... I didn't suggest something as specific as > SPI_register_transition_tables because I think it's plausible that > someone might want to implement SQL standard REFERENCING OLD/NEW ROW > AS and make that work in all PLs too one day, and this would be the > place to do that. > > See attached, which adds the call to all four built-in PLs. Thoughts? Worked on the docs some more and then pushed it. Nice job cutting the number of *.[ch] lines by 30 while adding support for the other three core PLs. :-) -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SERIALIZABLE with parallel query
On Fri, Mar 10, 2017 at 8:19 PM, Thomas Munro wrote: > On Wed, Feb 22, 2017 at 2:01 PM, Robert Haas wrote: >> I don't think I know enough about the serializable code to review >> this, or at least not quickly, but it seems very cool if it works. >> Have you checked what effect it has on shared memory consumption? > > I'm not sure how to test that. Kevin, can you provide any pointers to > the test workloads you guys used when developing SSI? During development there was first and foremost the set of tests which wound up implemented in the isolation testing environment developed by Heikki, although for most of the development cycle these tests were run by a python tool written by Markus Wanner (which was not as fast as Heikki's C-based tool, but provided a lot more detail in case of failure -- so it was very nice to have until we got down to polishing things). The final stress test to chase out race conditions and the performance benchmarks were all run by Dan Ports on big test machines at MIT. I'm not sure how much I can say to elaborate on what is in section 8 of this paper: http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf I seem to remember that he had some saturation run versions of the "DBT-2++" tests, modified to monitor for serialization anomalies missed by the implementation, which he sometimes ran for days at a time on MIT testing machines. There were some very narrow race conditions uncovered by this testing, which at high concurrency on a 16 core machine might hit a particular problem less often than once per day. I also remember that both Anssi Kääriäinen and Yamamoto Takahashi did a lot of valuable testing of the patch and found problems that we had missed. Perhaps they can describe their testing or make other suggestions. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
On Sat, Apr 1, 2017 at 9:03 AM, anant khandelwal wrote: > My name is Anant Khandelwal; currently I am pursuing a master's from IIT - Delhi > and previously I was a software engineer. > > I am particularly interested in working on the project "Explicitly support > predicate locks in index access methods besides btree". Anant, Your post was mostly identical (as in copy/paste level identical) to a post by Shubham Barai four days earlier. https://www.postgresql.org/message-id/calxaepvrcjzz0sj2kb_ghatrrdej08rygurftr5nuqxc6ut...@mail.gmail.com https://www.postgresql.org/message-id/CAD=a8sbzwgd4u-tasjp0owtpileffhoz5o5ev0um6digqbv...@mail.gmail.com Unless you can produce convincing proof to the contrary, your proposal will be disqualified because of plagiarism. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Apr 3, 2017 at 8:59 AM, Tom Lane wrote: > Thomas Munro writes: >> Or perhaps the code to inject trigger data transition tables into SPI >> (a near identical code block these three patches) should be somewhere >> common so that each PLs would only need to call a function. If so, >> where should that go? > > spi.c? Until now, trigger.c didn't know about SPI, and spi.c didn't know about triggers. The intersection was left to referencing code, like PLs. Is there any other common code among the PLs dealing with this intersection? If so, maybe a new triggerspi.c file (or spitrigger.c?) would make sense. Possibly it could make sense from a code structure PoV even for a single function, but it seems kinda iffy for just this function. As far as I can see it comes down to adding it to spi.c or creating a new file -- or just duplicating these 30-some lines of code to every PL. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
> 1. _hash_first() > -> _hash_readnext() > -> _hash_readprev() > > 2. CheckForSerializableConflictIn() > > -> _hash_doinsert() > > 3. PredicateLockPageSplit() I have not looked in detail at this point, but that seems plausible and indicates admirable research effort in drafting this proposal. > This is just an idea; also I understand that page level predicate locking > exists in the btree AM, where it required the addition of 17 function calls > to implement, but remains unimplemented in the gin, gist, spgist, brin, and > hash index AMs. So we need to insert function calls at other places also. Exactly. > Also, tell me: can we evaluate the performance of PostgreSQL on the following > workloads? > > SIBENCH microbenchmark > TPC-C++ Those are good, but pgbench (both read-only and read-write loads) is popular to include because any pg hacker can duplicate the tests. https://www.postgresql.org/docs/current/static/pgbench.html -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Fri, Mar 31, 2017 at 12:58 PM, Robert Haas wrote: > On Fri, Mar 31, 2017 at 1:50 PM, Tom Lane wrote: >> I would vote for calling the struct typedef EphemeralNamedRelation and >> using the abbreviation ENR (capitalized that way, not as Enr) in related >> function names, where that seemed sensible. I really do *not* like >> "Enr" as a global name. > > Yeah, I had the same thought about capitalization but wasn't sure if > it was worth suggesting. But since Tom did, +1 from me. Will do. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 30, 2017 at 11:51 AM, Kevin Grittner wrote: > New version attached. It needs some of these problem cases added to > the testing, and a mention in the docs that only C and plpgsql > triggers can use the feature so far. I'll add those tomorrow. Done and attached. Now the question is, should it be pushed? It's been through just about every CF in the last three years with little modification, and finally got a thorough enough review in this CF that I think it can be considered. Here are the numbers: 85 files changed, 2266 insertions(+), 132 deletions(-) Of that, 70 lines are the plpgsql implementation (which I should probably push separately), about 200 lines are docs, and 623 lines are new regression tests. Most of the rest only comes into play if the feature is used. This adds support for a SQL standard sub-feature, although only in triggers written in C and plpgsql. (Other PLs will probably require fewer lines than plpgsql.) It also provides infrastructure needed to get incremental maintenance of materialized views based on just simple declarative DDL. Tom has expressed hope that it could be used to improve performance and memory usage for AFTER triggers, and I believe it can, but that that should be a follow-on patch. It might provide the basis of set-based statement-level enforcement of referential integrity, with the regression tests providing a rough proof of concept. My inclination is to push it late today, but be ready to revert if there are any hard-to-fix surprise problems missed in review and testing; but if the general preference is to hold it for version 11, that's OK with me, too. -- Kevin Grittner transition-v14.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 23, 2017 at 7:14 PM, Thomas Munro wrote: > my only other comment would be a bikeshed issue: > Enr isn't a great name for a struct. I know, but EphemeralNamedRelation starts to get kinda long, especially when making the normal sorts of concatenations. I started making the change and balked when I saw things like EphemeralNamedRelationMetadataData and a function named EphemeralNamedRelationMetadataGetTupDesc() in place of EnrmdGetTupDesc(). A 40-character function name makes for a lot of line-wrapping to stay within pgindent limits. Any suggestions? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 23, 2017 at 11:36 PM, Thomas Munro wrote: > One more thought: should this be allowed? > > postgres=# create table mytab (i int) partition by list (i); > CREATE TABLE > postgres=# create table mytab1 partition of mytab for values in (42); > CREATE TABLE > postgres=# create function my_trigger_function() returns trigger as $$ > begin end; $$ language plpgsql; > CREATE FUNCTION > postgres=# create trigger my_trigger after update on mytab referencing > old table as my_old for each statement execute procedure > my_trigger_function(); > CREATE TRIGGER > Perhaps the moral equivalent should be possible for statement triggers > with transition tables, and that already works with your patch as far > as I know. So I think your patch probably just needs to reject them > on partitioned tables. > [patch provided] Yeah, that looks good. Included in next patch version. On Sun, Mar 26, 2017 at 6:39 AM, Thomas Munro wrote: > BTW I had to make the following change to your v12 because of commit b8d7f053: Yeah, I ran into that, too, and used exactly the same fix. On Sun, Mar 26, 2017 at 6:39 AM, Thomas Munro wrote: > On Fri, Mar 24, 2017 at 1:14 PM, Thomas Munro >> When PlanCacheRelCallback runs, I don't think it understands that >> these named tuplestore RangeTblEntry objects are dependent on the >> subject table. Could that be fixed like this? >> >> @@ -2571,6 +2582,9 @@ extract_query_dependencies_walker(Node *node, >> PlannerInfo *context) >> if (rte->rtekind == RTE_RELATION) >> context->glob->relationOids = >> >> lappend_oid(context->glob->relationOids, rte->relid); >> + else if (rte->rtekind == RTE_NAMEDTUPLESTORE) >> + context->glob->relationOids = >> + >> lappend_oid(context->glob->relationOids, [subject table's OID]); > > I'm not sure if this is the right approach and it may have style > issues, but it does fix the crashing in the ALTER TABLE case I > reported: see attached patch which applies on top of your v12. 
I had been working along similar lines, but had not gotten it working. Merged your version and mine, taking the best of both. :-) Thanks for the reviews and the fixes! New version attached. It needs some of these problem cases added to the testing, and a mention in the docs that only C and plpgsql triggers can use the feature so far. I'll add those tomorrow. -- Kevin Grittner transition-v13.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [pgsql-students] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai wrote: > * Hash Index > currently, I am trying to understand how the splitting of bucket works. > should we acquire predicate lock on every page from an old and new bucket in > case there is a predicate lock on a page of an old bucket? For page level locking in hash indexes, I think it might be fine to lock the first page of a bucket. Also, in case you missed it, here are some guidelines I suggested. There weren't any comments, which means they probably didn't offend anyone else too badly. They're just my opinion, but you might want to consider them: Each GSoC student proposal should be a PDF file of 6 to 8 pages. In the end, Google will publish these documents on a web page, so the student should make each proposal something which they will be happy to have future potential employers review. Some ideas for desirable content: - A resume or CV of the student, including any prior GSoC work - Their reasons for wanting to participate - What else they have planned for the summer, and what their time commitment to the GSoC work will be - A clear statement that there will be no intellectual property problems with the work they will be doing -- that the PostgreSQL community will be able to use their work without encumbrances (e.g., there should be no agreements related to prior or ongoing work which might assign the rights to the work they do to someone else) - A description of what they will do, and how - Milestones with dates - What they consider to be the test that they have successfully completed the project -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [pgsql-students] GSoC 2017 Proposal for "Explicitly support predicate locks in index access methods besides btree"
On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai wrote: > My name is Shubham Barai and I am a final year student at Maharashtra > Institute of Technology, Pune, India. I am very interested in contributing to > PostgreSQL this year through a GSoC project. Welcome! Sorry I didn't spot this post earlier -- I'm behind on reading the community lists and this didn't trigger any of the phrases that pop things out to my attention right away. > I am particularly interested in working on the project "Explicitly support > predicate locks in index access methods besides btree". I have gone through > some research papers which were recommended on > https://wiki.postgresql.org/wiki/GSoC_2017 to understand the concept of > Serializable Snapshot Isolation used in PostgreSQL. I have also browsed > through the codebase to get some idea of different access methods for gin, > gist, and hash indexes. I want to discuss my proposal to get some feedback > before I write my final proposal. Sorry, I am discussing my proposal a little > late. I was really busy with my academics. Understandable, but please be careful to get your final proposal in by the deadline. Deadlines in GSoC are not flexible. > Currently, only B+-trees support page level predicate locking. For other > indexes, it acquires a relation level lock, which can lead to unnecessary > serialization failure due to a rw dependency caused by any insert into the > index. So, the main task of this project is to support page level predicate > locking for the remaining indexes. Right. > [calls out several places that specific calls to predicate locking functions > are needed] > There may be a lot of other places where we need to insert function calls > for predicate locking that I haven't included yet. I didn't go into details > of every index AM. That will be about half the work of the project.
It is fine to identify examples for your proposal, to show that you know what to look for, but tracking down every last location can be completed after the proposal is accepted. The other half of the work will be testing and responding to issues others might raise. > can anyone help me find existing tests for b-tree? I think this should be it: kgrittn@kevin-desktop:~/pg/master$ find src/backend/access/nbtree -type f -name '*.c' | grep -v '/tmp_check/' | grep -v '/Debug/' | xargs egrep -n 'PredicateLock|SerializableConflict' src/backend/access/nbtree/nbtinsert.c:201: CheckForSerializableConflictIn(rel, NULL, buf); src/backend/access/nbtree/nbtinsert.c:402: CheckForSerializableConflictIn(rel, NULL, buf); src/backend/access/nbtree/nbtinsert.c:784: PredicateLockPageSplit(rel, src/backend/access/nbtree/nbtsearch.c:1040: PredicateLockRelation(rel, scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1052: PredicateLockPage(rel, BufferGetBlockNumber(buf), src/backend/access/nbtree/nbtsearch.c:1483: PredicateLockPage(rel, blkno, scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1578: PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf), scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1869: PredicateLockRelation(rel, scan->xs_snapshot); src/backend/access/nbtree/nbtsearch.c:1874: PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot); src/backend/access/nbtree/nbtpage.c:1410: PredicateLockPageCombine(rel, leafblkno, leafrightsib); -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 29, 2017 at 11:17 PM, Mengxing Liu wrote: > Thanks, I've updated the proposal. Just one issue: > I agree that we can make the skip list a general data structure. But > can we use the fixed-level skip list as a Plan B? Or a quick attempt > before the general data structure? > Because I am not familiar with shared memory structures and the tricks > used in them, I cannot estimate how much time it would take. It's not really too bad for fixed allocation shared memory, and I can help with that. If I thought it would save much I could see doing a prototype without generalization, but you would still have most of the same shared memory issues, since the structure *must* live in shared memory. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
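For illustration only, here is what the fixed-level variant Mengxing mentions looks like as a data structure -- plain Python with invented names, not the fixed-allocation shared-memory C code the project actually needs:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level      # one forward link per level

class FixedLevelSkipList:
    """Skip list with a fixed maximum number of levels."""
    MAX_LEVEL = 4

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)   # sentinel node

    def _random_level(self):
        # Classic coin-flip level assignment, capped at the fixed maximum.
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        # Record the rightmost node before the insertion point at each level.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            update[lvl] = node
        new = Node(key, self._random_level())
        for lvl in range(len(new.next)):
            new.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = new

    def contains(self, key):
        # Descend from the top level, then check the bottom-level successor.
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
        node = node.next[0]
        return node is not None and node.key == key
```

A shared-memory version keeps the same shape but must draw nodes from a fixed-size allocation and store offsets rather than pointers, and that part is much the same whether or not the structure is generalized.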
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sat, Mar 25, 2017 at 9:24 PM, Mengxing Liu wrote: > I've updated the proposal according to your comments. > But I am still wondering that can you review it for a double-check > to make sure I've made everything clear? Additional comments added. I'm afraid a few new issues came to mind reading it again. (Nothing serious; just some points that could benefit from a little elaboration.) -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu wrote: > I've finished a draft proposal for "Eliminate O(N^2) scaling from > rw-conflict tracking in serializable transactions". I've attached some comments to the document; let me know if they don't show up for you or you need clarification. Overall, if the comments are addressed, I think it is great. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu wrote: > I've finished a draft proposal for "Eliminate O(N^2) scaling from > rw-conflict tracking in serializable transactions". You can find > it from GSOC website or by the link below. > https://docs.google.com/document/d/17TAs3EJIokwPU7UTUmnlVY3ElB-VHViyX1zkQJmrD1A/edit?usp=sharing > > I was wondering if you have time to review the proposal and give me some > comments? Will take a look and give you an initial review in a day or two. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Candidate for local inline function?
On Fri, Mar 17, 2017 at 3:23 PM, Andres Freund wrote: > On 2017-03-17 15:17:33 -0500, Kevin Grittner wrote: >> Why do we warn of a hazard here instead of eliminating said hazard >> with a static inline function declaration in executor.h? > > Presumably because it was written long before we started relying on > inline functions :/ Right. git blame says it was changed in 2004. >> /* >> * ExecEvalExpr was formerly a function containing a switch statement; >> * now it's just a macro invoking the function pointed to by an ExprState >> * node. Beware of double evaluation of the ExprState argument! >> */ >> #define ExecEvalExpr(expr, econtext, isNull) \ >> ((*(expr)->evalfunc) (expr, econtext, isNull)) >> >> Should I change that to a static inline function doing exactly what >> the macro does? In the absence of multiple evaluations of a >> parameter with side effects, modern versions of gcc have generated >> the same code for a macro versus a static inline function, at least >> in the cases I checked. > > I'm absolutely not against changing this to an inline function, but I'd > prefer if that code weren't touched quite right now, there's a large > pending patch of mine in the area. If you don't mind, I'll just include > the change there, rather than have a conflict? Fine with me. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Candidate for local inline function?
Why do we warn of a hazard here instead of eliminating said hazard with a static inline function declaration in executor.h? /* * ExecEvalExpr was formerly a function containing a switch statement; * now it's just a macro invoking the function pointed to by an ExprState * node. Beware of double evaluation of the ExprState argument! */ #define ExecEvalExpr(expr, econtext, isNull) \ ((*(expr)->evalfunc) (expr, econtext, isNull)) Should I change that to a static inline function doing exactly what the macro does? In the absence of multiple evaluations of a parameter with side effects, modern versions of gcc have generated the same code for a macro versus a static inline function, at least in the cases I checked. -- Kevin Grittner
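The double-evaluation hazard the comment warns about can be demonstrated in any language by writing out an argument expression twice, the way the preprocessor would; a small Python analog (not PostgreSQL code):

```python
calls = 0

def next_val():
    """Stand-in for a macro argument that has a side effect."""
    global calls
    calls += 1
    return 3

# Macro-style expansion: the argument expression appears twice in the
# expanded text, so its side effect runs twice.
a = next_val() * next_val()
assert calls == 2 and a == 9

def square(x):
    # Inline-function style: the argument is evaluated exactly once
    # before the body runs.
    return x * x

calls = 0
b = square(next_val())
assert calls == 1 and b == 9
```

With a side-effect-free argument the two behave identically, which is consistent with gcc generating the same code for the macro and the static inline function in the cases checked above.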
[HACKERS] Guidelines for GSoC student proposals
I've found various sources that give hints about what a student proposal should look like, but nothing I could just give as a link, so I pulled together what I could find, tempered by my own ideas and opinions. I suggest that we send the below, or something like it to each student who expresses interest in making a proposal, or who submits a proposal that doesn't meet the below guidelines. Thoughts or suggestions for changes before we do? Remember, time is short, so this cannot be a 200 message bike-shedding debate -- we just need to provide some sort of guidance to students in a timely way, with the timeline being: February 27 - March 20 Potential student participants discuss application ideas with mentoring organizations March 20 16:00 UTC Student application period opens April 3 16:00 UTC Student application deadline Each GSoC student proposal should be a PDF file of 6 to 8 pages. In the end, Google will publish these documents on a web page, so the student should make each proposal something which they will be happy to have future potential employers review. 
Some ideas for desirable content: - A resume or CV of the student, including any prior GSoC work - Their reasons for wanting to participate - What else they have planned for the summer, and what their time commitment to the GSoC work will be - A clear statement that there will be no intellectual property problems with the work they will be doing -- that the PostgreSQL community will be able to use their work without encumbrances (e.g., there should be no agreements related to prior or ongoing work which might assign the rights to the work they do to someone else) - A description of what they will do, and how - Milestones with dates - What they consider to be the test that they have successfully completed the project Note that a student proposal is supposed to be far more detailed than the ideas for projects provided by the organization -- those are intended to be ideas for what the student might write up as a proposal, not ready-to-go proposal documents. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Wed, Mar 15, 2017 at 11:35 AM, Mengxing Liu wrote: >> On a NUMA machine it is not at all unusual to see bifurcated results >> -- with each run coming in very close to one number or a second >> number, often at about a 50/50 rate, with no numbers falling >> anywhere else. This seems to be based on where the processes and >> memory allocations happen to land. >> > > Do you mean that for a NUMA machine, there usually exist two > different results of its performance? > Just two? Neither three nor four? In my personal experience, I have often seen two timings that each run randomly matched; I have not seen nor heard of more, but that doesn't mean it can't happen. ;-) > At first, I will compile and install PostgreSQL by myself and try > the profile tools (perf or oprofile). perf is newer, and generally better if you can use it. Don't try to use either on HP hardware -- the BIOS uses some of the same hardware registers that other manufacturers leave for use of profilers; an HP machine is likely to freeze or reboot if you try to run either of those profilers under load. > Then I will run one or two benchmarks using different config, > where I may need your help to ensure that my tests are close to the > practical situation. Yeah, we should talk about OS and PostgreSQL configuration before you run any benchmarks. Neither tends to come configured as I would run a production system. > PS: Disabling NUMA in the BIOS means that the CPU can use its own memory > controller when accessing local memory to reduce hops. NUMA means that each CPU chip directly controls some of the RAM (possibly with other, non-CPU controllers for some RAM). The question is whether the BIOS or the OS controls the memory allocation. The OS looks at what processes are on what cores and tries to use "nearby" memory for allocations.
This can be pessimal if the amount of RAM that is under contention is less than the size of one memory segment, since all CPU chips need to ask the one managing that RAM for each access. In such a case, you actually get best performance using a cpuset which just uses one CPU package and the memory segments directly managed by that CPU package. Without the cpuset you may actually see better performance for this workload by letting the BIOS interleave allocations, which spreads the RAM allocations around to memory managed by all CPUs, and no one CPU becomes the bottleneck. The access is still not uniform, but you dodge a bottleneck -- albeit less efficiently than using a custom cpuset. > On the contrary, "enable" means UMA. No. Think about this: regardless of this BIOS setting each RAM package is directly connected to one CPU package, which functions as its memory controller. Most of the RAM used by PostgreSQL is for disk buffers -- shared buffers in shared memory and OS cache. Picture processes running on different CPU packages accessing a single particular shared buffer. Also picture processes running on different CPU packages using the same spinlock at the same time. No BIOS setting can avoid the communications and data transfer among the 8 CPU packages, and the locking of the cache lines. > Therefore, I think Tony is right, we should disable this setting. > > I got the information from here. > http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/ Ah, that explains it. There is no such thing as "disabling NUMA" -- you can have the BIOS force interleaving, or you can have the BIOS leave the NUMA memory assignment to the OS. I was assuming that by "disabling NUMA" you meant to have BIOS control it through interleaving. You meant the opposite -- disabling the BIOS override of OS NUMA control. I agree that we should leave NUMA scheduling to the OS. 
There are, however, some non-standard OS configuration options that allow NUMA to behave better with PostgreSQL than the defaults allow. We will need to tune a little. The author of that article seems to be assuming that the usage will be with applications like word processing, spreadsheets, or browsers -- where the OS can place all the related processes on a single CPU package and all (or nearly all) memory allocations can be made from associated memory -- yielding a fairly uniform and fast access when the BIOS override is disabled. On a database product which wants to use all the cores and almost all of the memory, with heavy contention on shared memory, the situation is very different. Shared resource access time is going to be non-uniform no matter what you do. The difference is that the OS can still make *process local* allocations from nearby memory segments, while BIOS cannot. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Tue, Mar 14, 2017 at 3:45 PM, Kevin Grittner wrote: >> On 3/14/17 17:34, Mengxing Liu wrote: >>> There are several alternative benchmarks. Tony suggested that we >>> should use TPC-E and TPC-DS. > > More benchmarks is better, all other things being equal. Keep in > mind that good benchmarking practice with PostgreSQL generally > requires a lot of setup time (so that we're starting from the exact > same conditions for every run), a lot of run time (so that the > effects of vacuuming, bloat, and page splitting all come into play, > like they would in the real world), and a lot of repetitions of each > run (to account for variation). In particular, on a NUMA machine it > is not at all unusual to see bifurcated Sorry I didn't finish that sentence. On a NUMA machine it is not at all unusual to see bifurcated results -- with each run coming in very close to one number or a second number, often at about a 50/50 rate, with no numbers falling anywhere else. This seems to be based on where the processes and memory allocations happen to land. Different people have dealt with this in different ways. If you can get five or more runs of a given test, perhaps the best is to throw away the high and low values and average the rest. Other approaches I've seen are to average all, take the median, or take the best result. The worst result of a set of runs is rarely interesting, as it may be due to some periodic maintenance kicking in (perhaps at the OS level and not related to database activity at all). -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
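The throw-away-high-and-low approach described above can be sketched in a few lines of shell and awk; the TPS figures here are invented to mimic bifurcated runs clustering around two numbers:

```shell
# Hedged sketch of "throw away the high and low values and average the
# rest".  The sample TPS values are made up to show a bifurcated set.
runs="4100 4080 5210 5195 4095"

trimmed_mean=$(printf '%s\n' $runs | sort -n | awk '
    { v[NR] = $1 }
    END {
        sum = 0
        for (i = 2; i < NR; i++) sum += v[i]   # skip min (1) and max (NR)
        printf "%.0f\n", sum / (NR - 2)
    }')
echo "trimmed mean TPS: $trimmed_mean"   # 4463 for the sample above
```

With a bifurcated distribution like this sample, note that the trimmed mean still lands between the two clusters; reporting both cluster centers may be more honest for such runs.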
Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Tue, Mar 14, 2017 at 6:00 AM, DEV_OPS wrote: > On 3/14/17 17:34, Mengxing Liu wrote: >>>>> The worst problems have been >>>>> seen with 32 or more cores on 4 or more sockets with a large number >>>>> of active connections. I don't know whether you have access to a >>>>> machine capable of putting this kind of stress on it (perhaps at >>>>> your university?), but if not, the community has access to various >>>>> resources we should be able to schedule time on. >>>> There is a NUMA machine ( 120 cores, 8 sockets) in my lab. >>> Fantastic! Can you say a bit more about the architecture and OS? >> >> Intel(R) Xeon(R) CPU at 2.3GHz, with 1TB physical DRAM and 1.5 TB >> SSD, running Ubuntu 14.04, Kernel 3.19. >> I guess NUMA is disabled in BIOS, but I will check that. I'm not sure what it would mean to "disable" NUMA -- if the CPU chips are each functioning as memory controller for a subset of the RAM you will have non-uniform memory access speeds from any core to different RAM locations. You can switch it to "interleaved" access, so the speed of access from a core to any logical memory address will be essentially random, rather than letting the OS try to arrange things so that processes do most of their access to memory that is faster for them. It is generally better to tune NUMA in the OS than to randomize things at the BIOS level. >> However, there is only one NIC, so network could be the >> bottleneck if we use too many cores? Well, if we run the pgbench client on the database server box, the NIC won't matter at all. If we move the client side to another box, I still think that when we hit this problem, it will dwarf any impact of the NIC throughput. > The configuration is really cool, for the SSD, is it SATA interface? > NVMe interface, or is PCIe Flash? different SSD interfaces have different > performance characteristics. Yeah, knowing the model of SSD, as well as which particular Xeon we're using, would be good. 
> for the NUMA, because the affinity issue will impact PostgreSQL > performance. so, please disable it if possible I disagree. (see above) >> There are several alternative benchmarks. Tony suggested that we >> should use TPC-E and TPC-DS. More benchmarks is better, all other things being equal. Keep in mind that good benchmarking practice with PostgreSQL generally requires a lot of setup time (so that we're starting from the exact same conditions for every run), a lot of run time (so that the effects of vacuuming, bloat, and page splitting all come into play, like they would in the real world), and a lot of repetitions of each run (to account for variation). In particular, on a NUMA machine it is not at all unusual to see bifurcated >> Personally, I am more familiar with TPC-C. Unfortunately, the TPC-C benchmark does not create any cycles in the transaction dependencies, meaning that it is not a great tool for benchmarking serializable transactions. I know there are variations on TPC-C floating around that add a transaction type to do so, but on a quick search right now, I was unable to find one. >> And pgbench seems to only have a TPC-B built-in benchmark. You can feed it your own custom queries, instead. >> Well, I think we can easily find the implementations of these >> benchmarks for PostgreSQL. Reportedly, some of the implementations of TPC-C are not very good. There is DBT-2, but I've known a couple of people to look at that and find that it needed work before they could use it. Based on the PostgreSQL versions mentioned on the Wiki page, it has been neglected for a while: https://wiki.postgresql.org/wiki/DBT-2 >> The paper you recommended to me used a special benchmark defined >> by themselves. But it is quite simple and easy to implement. It also has the distinct advantage that we *know* they created a scenario where the code we want to tune was using most of the CPU on the machine. >> For me, the challenge is profiling the execution. 
Are there any >> tools in PostgreSQL to analyze where the CPU cycles are consumed? >> Or do I have to instrument and time it myself? Generally oprofile or perf is used if you want to know where the time is going. This creates a slight dilemma -- if you configure your build with --enable-cassert, you get the best stack traces and you can more easily break down execution profiles. That is especially true if you disable optimization and don't omit frame pointers. But all of those things distort the benchmarks -- adding a lot of time, and not necessarily in proportion to where the time goes with an optimized build. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
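As a sketch of the profiling workflow discussed above -- assuming perf and pgbench are available, with option details varying by perf version, and the database name invented:

```shell
# Hedged sketch of a perf-based profiling run.  The script is written
# out rather than executed here, since it assumes a live server, perf,
# and pgbench on the box.
cat > profile-run.sh <<'EOF'
#!/bin/sh
# Sample the whole box while pgbench drives load; -g records call
# graphs, and dwarf unwinding helps with optimized builds that omit
# frame pointers (see the cassert/optimization trade-off above).
perf record -a -g --call-graph dwarf -- sleep 60 &
pgbench -c 32 -j 8 -T 60 bench
wait
# Summarize where the CPU cycles went.
perf report --sort symbol
EOF
chmod +x profile-run.sh
echo "wrote profile-run.sh"
```

Profiling the optimized build this way, and reserving the cassert build for correctness runs, avoids the distortion Kevin describes.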
Re: [HACKERS] delta relations in AFTER triggers
On Sun, Mar 12, 2017 at 4:08 PM, Thomas Munro wrote: > I found a new way to break it: run the trigger function so > that the plan is cached by plpgsql, then ALTER TABLE incompatibly, > then run the trigger function again. See attached. The first part doesn't seem so bad. Using the transition table in a FOR EACH STATEMENT trigger you get: test=# update hoge set name = (name::text || name::text)::integer; ERROR: attribute 2 has wrong type DETAIL: Table has type integer, but query expects text. CONTEXT: SQL statement "SELECT (SELECT string_agg(id || '=' || name, ',') FROM d)" PL/pgSQL function hoge_upd_func() line 3 at RAISE ... while putting each row on its own line from a FOR EACH ROW trigger you get: test=# update hoge set name = (name::text || name::text)::integer; ERROR: type of parameter 15 (integer) does not match that when preparing the plan (text) CONTEXT: SQL statement "SELECT (SELECT string_agg(old.id || '=' || old.name, ','))" PL/pgSQL function hoge_upd_func() line 3 at RAISE Does anyone think the first message needs improvement? If so, to what? Obviously the next part is a problem. With the transition table we get a segfault: test=# -- now drop column 'name' test=# alter table hoge drop column name; ALTER TABLE test=# update hoge set id = id; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. While the row-oriented query manages to dodge it: test=# alter table hoge drop column name; ALTER TABLE test=# update hoge set id = id; ERROR: record "old" has no field "name" CONTEXT: SQL statement "SELECT (SELECT string_agg(old.id || '=' || old.name, ','))" PL/pgSQL function hoge_upd_func() line 3 at RAISE I expected that existing mechanisms would have forced re-planning of a trigger function if the table the function was attached to was altered. Either that was a bit "optimistic", or the old TupleDesc is used for the new plan. Will track down which it is, and fix it. 
>> Do you suppose we should have all PLs that are part of the base >> distro covered? > > I vote for doing that in Postgres 11. My pl/python patch[1] may be a > useful starting point, but I haven't submitted it in this CF and > nobody has shown up with pl/tcl or pl/perl versions. OK, but we'd better add something to the docs saying that only C and plpgsql triggers are supported in v10. >> What is necessary to indicate an additional SQL feature covered? > > I assume you're talking about information_schema.sql_features I had forgotten we had that in a table. I was thinking more of the docs: https://www.postgresql.org/docs/current/static/features.html I guess if we change one, we had better change the other. (Or are they generated off a common source?) > a couple of thoughts occurred to me when looking for > references to transition tables in an old draft standard I have. > These are both cases where properties of the subject table should > probably also affect access to the derived transition tables: > > * What privileges implications are there for transition tables? I'm > wondering about column and row level privileges; for example, if you > can't see a column in the subject table, I'm guessing you shouldn't be > allowed to see it in the transition table either, but I'm not sure. I'll see how that works in FOR EACH ROW triggers. We should match that, I think. Keep in mind that not just anyone can put a trigger on a table. > * In future we could consider teaching it about functional > dependencies as required by the spec; if you can SELECT id, name FROM > GROUP BY id, I believe you should be able to SELECT > id, name FROM GROUP BY id, but currently you can't. Interesting idea. I'll post a new patch once I figure out the dropped column on the cached function plan. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Sat, Mar 11, 2017 at 8:39 PM, Mengxing Liu wrote: >> The worst problems have been >> seen with 32 or more cores on 4 or more sockets with a large number >> of active connections. I don't know whether you have access to a >> machine capable of putting this kind of stress on it (perhaps at >> your university?), but if not, the community has access to various >> resources we should be able to schedule time on. > > There is a NUMA machine ( 120 cores, 8 sockets) in my lab. Fantastic! Can you say a bit more about the architecture and OS? > I think it's enough to put this kind of stress. The researchers who saw this bottleneck reported that performance started to dip at 16 cores and the problem was very noticeable at 32 cores. A stress test with 120 cores on 8 sockets will be great! I think perhaps the first milestone on the project should be to develop a set of benchmarks we want to compare to at the end. That would need to include a stress test that clearly shows the problem we're trying to solve, along with some cases involving one or two connections -- just to make sure we don't harm performance for low-contention situations. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu wrote: > My name is Mengxing Liu. I am interested in the project "Eliminate > O(N^2) scaling from rw-conflict tracking in serializable > transactions”. After discussing with Kevin off-list, I think it's > time to post discussion here. I am afraid that there are two things > that I need your help with. Thank you very much. Welcome to the -hackers list! This is a key part of how development happens in the community. Don't be shy about posting questions and ideas, but also expect fairly blunt discussion to occur at times; don't let that put you off. >>> So the task is clear. We can use a tree-like or hash-like data >>> structure to speed up this function. >> >> Right -- especially with a large number of connections holding a >> large number of conflicts. In one paper with high concurrency, they >> found over 50% of the CPU time for PostgreSQL was going to these >> functions (including functions called by them). This seems to me to >> be due to the O(N^2) (or possibly worse) performance from the number >> of connections. > > Does anyone know the title of this paper? I want to reproduce its > workloads. I seem to remember there being a couple other papers or talks, but this is probably the most informative: http://sydney.edu.au/engineering/it/research/tr/tr693.pdf >> Remember, I think most of the work here is going to be in >> benchmarking. We not only need to show improvements in simple test >> cases using readily available tools like pgbench, but think about >> what types of cases might be worst for the approach taken and show >> that it still does well -- or at least not horribly. It can be OK >> to have some slight regression in an unusual case if the common >> cases improve a lot, but any large regression needs to be addressed >> before the patch can be accepted. 
There are some community members >> who are truly diabolical in their ability to devise "worst case" >> tests, and we don't want to be blind-sided by a bad result from one >> of them late in the process. >> > > Are there any documents or links introducing how to test and > benchmark PostgreSQL? I may need some time to learn about it. There is pgbench: https://www.postgresql.org/docs/devel/static/pgbench.html A read-only load and a read/write mix should both be tested, probably with a few different scales and client counts to force the bottleneck to be in different places. The worst problems have been seen with 32 or more cores on 4 or more sockets with a large number of active connections. I don't know whether you have access to a machine capable of putting this kind of stress on it (perhaps at your university?), but if not, the community has access to various resources we should be able to schedule time on. It may pay for you to search through the archives of the last year or two to look for other benchmarks and see what people have previously done. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
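The read-only/read-write matrix described above might be enumerated like this; the database name "bench", the scales, the run length, and the client counts are illustrative assumptions, not prescriptions from the thread:

```shell
# Hedged sketch: a benchmark matrix of read-only (-S, select-only) and
# read/write (default tpcb-like) pgbench runs over a couple of scales
# and several client counts.  Commands are written to a plan file
# rather than executed, since no server is assumed here.
{
    for scale in 100 1000; do
        # Rebuild so every run starts from the exact same conditions.
        echo "pgbench -i -s $scale bench"
        for clients in 16 32 64; do
            echo "pgbench -S -c $clients -j 4 -T 300 bench  # read-only"
            echo "pgbench -c $clients -j 4 -T 300 bench     # read/write"
        done
    done
} > bench-plan.txt
cat bench-plan.txt
```

Re-initializing before each run group is what lets vacuuming, bloat, and page-splitting effects accumulate comparably across runs, per the advice quoted above.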
Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
> [two people interested in the same GSoC project] It's an interesting problem to have. If neither of you chooses to voluntarily back down, the obvious resolution is for the mentors to vote on the better proposal. If we do that early enough during the student application period, there might still be time for the person whose proposal wasn't chosen to submit a proposal for an alternative project. As I see it, that means: - I would tend to favor a proposal submitted on the first day (beginning March 20 16:00 UTC), if only one is. - I would push for very early voting by the PostgreSQL mentors. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
[including Mengxing Liu in response, for reasons that should become obvious below...] Hi George, On Thu, Mar 9, 2017 at 6:49 PM, George Papadrosou wrote: > my name is George Papadrosou, this is my first semester as a > graduate student at Georgia Tech, and I would like to submit a > proposal to Google Summer of Code, for the project "Eliminate > O(N^2) scaling from rw-conflict tracking in serializable > transactions”. I was recently contacted off-list by Mengxing Liu, who has been looking at the same project, and said he was planning to submit a GSoC proposal. Rather than have one of you sit this out, do either of you feel comfortable taking a different project instead? Since you've both been looking at the serializable code and supporting documents, perhaps one of you could change to the other suggested Serializable project? > I am going to prepare a draft proposal for this project and share > it with you soon. The project’s description is pretty clear; do > you think it should be more strictly defined in the proposal? At a minimum, the proposal should include a list of milestones you expect to reach along the way, and a timeline indicating when you expect to reach them. Some description of the benchmarks you intend to run would also be very good. > Until then, I would like to familiarize myself a bit with the > codebase and fix some bug/todo. I didn’t find many [E] marked > tasks in the todo list so the task I was thinking of is "\s without > arguments (display history) fails with libedit, doesn't use pager > either - psql \s not working on OSX”. However, it works on my OSX > El Capitan laptop with Postgres 9.4.4. Would you suggest some > other starter task? There is a CommitFest in progress; reviewing patches is a good way to become involved and familiar with the community processes. 
https://wiki.postgresql.org/wiki/CommitFest -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SQL Standard Feature T211
[separate thread from transition table patch, since a different audience might be interested] Four things are required to claim support for Feature T211, "Basic trigger capability": - support for the CREATE TRIGGER statement - the ability to declare and reference transition tables for AFTER triggers - support for the DROP TRIGGER statement - support for TRIGGER in the GRANT and REVOKE statements With the addition of transition tables we have all four, although I'm not sure whether the CREATE TRIGGER statement conforms closely enough to claim the feature. The two basic issues I can think of are: - we don't allow the OLD and NEW transition variable names to be specified -- using hard-coded values instead - we don't support the standard formats, instead supporting only our EXECUTE PROCEDURE syntax Do we leave T211, "Basic trigger capability" on our "unsupported" list until those two points are addressed? Does anyone know of anything else that would need to be done? -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
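For reference, the transition-table declaration at the heart of T211 looks like this; the table, trigger, and function names are invented, the REFERENCING ... TABLE AS clause is the standard transition-table piece, and EXECUTE PROCEDURE is the nonstandard PostgreSQL spelling noted above:

```shell
# Hedged sketch: write out an example of the T211 syntax under
# discussion.  Names are invented; nothing here is from the patch.
cat > t211-example.sql <<'EOF'
CREATE TRIGGER orders_audit
    AFTER UPDATE ON orders
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE audit_orders();
EOF
echo "wrote t211-example.sql"
```

Note that old_rows and new_rows illustrate exactly the point above: the trigger author chooses the transition variable names, rather than being limited to hard-coded OLD and NEW.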
Re: [HACKERS] delta relations in AFTER triggers
On Tue, Mar 7, 2017 at 6:28 PM, Kevin Grittner wrote: > New patch attached. And bit-rotted less than 24 hours later by fcec6caa. New patch attached just to fix bit-rot. That conflicting patch might be a candidate to merge into the new Ephemeral Named Relation provided by my patch, for more flexibility and extensibility... -- Kevin Grittner transition-v12.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Feb 20, 2017 at 7:44 PM, Thomas Munro wrote: > Was it intentional that this test doesn't include any statements that > reach the case where the trigger does RAISE EXCEPTION 'RI error'? > From the output generated there doesn't seem to be any evidence that > the triggers run at all, though I know from playing around with this > that they do Tests expanded to cover more. Some bugs found and fixed in the process. :-/ > + * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group > > These copyright messages are missing 3 years' worth of bugfixes. Those were only in files created with the initial patch in 2014. No bug fixes missing. Updated the years to 2017, though. > +SPI_get_caller_relation(const char *name) > > Do we need this function? It's not used by the implementation. Good point. It seemed useful way back when, but since no uses for it emerged, it should be removed until such time (if any) that it would be useful. > +typedef struct NamedTuplestoreScan > +{ > + Scan scan; > + char *enrname; > +} NamedTuplestoreScan; > > Nearly all plan node structs always have a comment for the members below > 'scan'; I think this needs one too because 'enrname' is not > self-explanatory. Done. > /* > + * Capture the NEW and OLD transition TABLE tuplestores (if specified for > + * this trigger). > + */ > + if (trigdata->tg_newtable || trigdata->tg_oldtable) > + { > + estate.queryEnv = create_queryEnv(); > + if (trigdata->tg_newtable) > + { > + Enr enr = palloc(sizeof(EnrData)); > + > + enr->md.name = trigdata->tg_trigger->tgnewtable; > + enr->md.tupdesc = trigdata->tg_relation->rd_att; > + enr->md.enrtuples = tuplestore_tuple_count(trigdata->tg_newtable); > + enr->reldata = trigdata->tg_newtable; > + register_enr(estate.queryEnv, enr); > + SPI_register_relation(enr); > + } > > Why do we have to call register_enr and also SPI_register_relation here? Essentially, because plpgsql does some things through SPI and some things not. Both cases are covered. 
> On Mon, Dec 19, 2016 at 4:35 AM, Thomas Munro > wrote: >> On Sun, Dec 18, 2016 at 3:15 PM, Kevin Grittner wrote: >>> There were a few changes Thomas included in the version he posted, >>> without really delving into an explanation for those changes. Some >>> or all of them are likely to be worthwhile, but I would rather >>> incorporate them based on explicit discussion, so this version >>> doesn't do much other than generalize the interface a little, >>> change some names, and add more regression tests for the new >>> feature. (The examples I worked up for the rough proof of concept >>> of enforcement of RI through set logic rather than row-at-a-time >>> navigation were the basis for the new tests, so the idea won't get >>> totally lost.) Thomas, please discuss each suggested change (e.g., >>> the inclusion of the query environment in the parameter list of a >>> few more functions). >> >> I was looking for omissions that would cause some kind of statements >> to miss out on ENRs arbitrarily. It seemed to me that >> parse_analyze_varparams should take a QueryEnvironment, mirroring >> parse_analyze, so that PrepareQuery could pass it in. Otherwise, >> PREPARE wouldn't see ENRs. Is there any reason why SPI_prepare should >> see them but PREPARE not? > > Any thoughts about that? Do you see any way to test that code, or would it be dead code there "just in case" we later decided to do something that needed it? I'm not a big fan of the latter. I've had to spend too much time maintaining and/or ripping out code that fits that description. On Mon, Feb 20, 2017 at 9:43 PM, Thomas Munro wrote: > On Tue, Feb 21, 2017 at 7:14 AM, Thomas Munro > wrote: >> On Fri, Jan 20, 2017 at 2:49 AM, Kevin Grittner wrote: >>> Attached is v9 which fixes bitrot from v8. No other changes. >>> >>> Still needs review. > > Based on a suggestion from Robert off-list I tried inserting into a > delta relation from a trigger function and discovered that it > segfaults Fixed. 
Such an attempt now generates something like this: ERROR: relation "d" cannot be the target of a modifying statement CONTEXT: SQL statement "INSERT INTO d VALUES (100, 100, 'x')" PL/pgSQL function level2_table_bad_usage_func() line 3 at SQL statement New patch attached. Miscellanea: Do you suppose we should have all PLs that are part of the base distro covered? What is necessary to indicate an additional SQL feature covered? -- Kevin Grittner transition-v11.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WARNING: relcache reference leak: relation "p1" not closed
With e434ad39ae7316bcf35fd578dd34ad7e1ff3c25f I did a `make world`, `make install-world`, a fresh default initdb, a start with default config, `make installcheck`, connected to the regression database with psql as the initial superuser, and ran: regression=# vacuum freeze analyze; WARNING: relcache reference leak: relation "p1" not closed VACUUM -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WARNING: relcache reference leak: relation "p1" not closed
[original message held up for review -- should be along eventually...] On Mon, Mar 6, 2017 at 3:11 PM, Kevin Grittner wrote: > With e434ad39ae7316bcf35fd578dd34ad7e1ff3c25f I did a `make world`, > `make install-world`, a fresh default initdb, a start with default > config, `make installcheck`, connected to the regression database > with psql as the initial superuser, and ran: > > regression=# vacuum freeze analyze; > WARNING: relcache reference leak: relation "p1" not closed > VACUUM Still present in e6477a8134ace06ef3a45a7ce15813cd398e72d8 -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Mar 2, 2017 at 9:04 AM, David Steele wrote: > On 2/20/17 10:43 PM, Thomas Munro wrote: >> Based on a suggestion from Robert off-list I tried inserting into a >> delta relation from a trigger function and discovered that it >> segfaults: >> >> * frame #0: 0x0001057705a6 >> postgres`addRangeTableEntryForRelation(pstate=0x7fa58186a4d0, >> rel=0x, alias=0x, inh='\0', >> inFromCl='\0') + 70 at parse_relation.c:1280 [opt] >> frame #1: 0x00010575bbda >> postgres`setTargetTable(pstate=0x7fa58186a4d0, >> relation=0x7fa58186a098, inh=, alsoSource='\0', >> requiredPerms=1) + 90 at parse_clause.c:199 [opt] >> frame #2: 0x000105738530 postgres`transformStmt [inlined] >> transformInsertStmt(pstate=) + 69 at analyze.c:540 [opt] >> frame #3: 0x0001057384eb >> postgres`transformStmt(pstate=, parseTree=) >> + 2411 at analyze.c:279 [opt] >> frame #4: 0x000105737a42 >> postgres`transformTopLevelStmt(pstate=, >> parseTree=0x7fa58186a438) + 18 at analyze.c:192 [opt] >> frame #5: 0x0001059408d0 >> postgres`pg_analyze_and_rewrite_params(parsetree=0x7fa58186a438, >> query_string="insert into d values (100, 100, 'x')", >> parserSetup=(plpgsql.so`plpgsql_parser_setup at pl_comp.c:1017), >> parserSetupArg=0x7fa58185c2a0, queryEnv=0x7fa581857798) + 128 >> at postgres.c:706 [opt] > > Do you know when you will have a new patch ready? > > This looks like an interesting and important feature but I think it's > going to have to come together quickly if it's going to make it into v10. I hope to post a new version addressing review comments by Monday (6 March). -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Thu, Feb 9, 2017 at 1:44 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Feb 9, 2017 at 2:08 PM, Tom Lane wrote: >>> Although this doesn't really settle whether we ought to do 3a (with >>> backwards-compatibility function aliases in core) or 3b (without 'em). >>> Do people want to re-vote, understanding that those are the remaining >>> choices? > >> I prefer (3c) put them in an extension and let people that need 'em >> install 'em, but not have them available by default. > > As far as the core code is concerned, 3b and 3c are the same thing. I think so, too, if we're talking about an extension in core. My vote is for 3b. If someone wants to write the alias extension and make it available outside of core, fine -- though they don't need anyone's vote to do so. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Sun, Feb 5, 2017 at 10:24 PM, Michael Paquier wrote: >> So... somebody want to tally up the votes here? > > Here is what I have, 6 votes clearly stated: > 1. Rename nothing: Daniel, > 2. Rename directory only: Andres > 3. Rename everything: Stephen, Vladimir, David S, Michael P (with > aliases for functions, I could live without at this point...) I vote for 3 as well, with 1 as the only sane alternative. -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Thu, Jan 26, 2017 at 3:55 PM, Robert Haas wrote: > The substantive issue here is whether we should go forward with this > change, back out the change we already did, or leave things as they > are. Tom, David, and I seem to be in lock step on at least the > following conclusion: halfway in between is bad. I agree. > So I have every > intention of continuing to push very hard for us to go either forward > or backward. +1 Given the number of times I've known of people deleting files from pg_xlog because the name made the contents seem unimportant, I think finishing the job is better. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Mon, Jan 23, 2017 at 9:50 AM, Dagfinn Ilmari Mannsåker wrote: > [new patch addressing issues raised] Thanks! >> In releases prior to this patch, max_pred_locks_per_relation was >> calculated as "max_pred_locks_per_transaction / 2". I know that >> people have sometimes increased max_pred_locks_per_transaction >> specifically to raise the limit on locks within a relation before >> the promotion to relation granularity occurs. It seems kinda >> anti-social not to support a special value for continuing that >> behavior or, if we don't want to do that, at least modifying >> pg_upgrade to set max_pred_locks_per_relation to the value that was >> in effect in the old version. > > This is exactly what we've been doing at my workplace It occurred to me that it would make sense to allow these settings to be attached to a database or table (though *not* a role). I'll look at that when I get back if you don't get to it first. >>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >>>> One thing I don't like about this patch is that if a user has >>>> increased max_pred_locks_per_transaction, they need to set >>>> max_pred_locks_per_relation to half of that to retain the current >>>> behaviour, or they'll suddenly find themselves with a lot more >>>> relation locks. If it's possible to make a GUCs default value >>>> dependent on the value of another, that could be a solution. >>>> Otherwise, the page lock threshold GUC could be changed to be >>>> expressed as a fraction of max_pred_locks_per_transaction, to keep >>>> the current behaviour. >> >>> Another option would be to have a special sentinel (e.g. -1) which is >>> the default, and keeps the current behaviour. > > The attached updated patch combines the two behaviours. Positive values > mean an absolute number of locks, while negative values mean > max_predicate_locks_per_transaction / -max_predicate_locks_per_relation. > Making the default value -2 keeps backwards compatibility. 
> > Alternatively we could have a separate setting for the fraction (which > could then be a floating-point number, for finer control), and use the > absolute value if that is zero. I will need some time to consider that. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Mon, Jan 23, 2017 at 9:50 AM, Dagfinn Ilmari Mannsåker wrote: > Kevin Grittner writes: > >> (1) The new GUCs are named max_pred_locks_per_*, but the >> implementation passes them unmodified from a function specifying at >> what count the lock should be promoted. We either need to change >> the names or (probably better, only because I can't think of a good >> name for the other way) return the GUC value + 1 from the function. >> Or maybe change the function name and how the returned number is >> used, if we care about eliminating the overhead of the increment >> instruction. That actually seems least confusing. > > I've done the latter, and renamed the function MaxPredicateChildLocks. > I also changed the default for max_pred_locks_per_page from 3 to 4 and > the minimum to 0, to keep the effective thresholds the same. > >> (2) The new GUCs are declared and documented to be set only on >> startup. This was probably just copied from >> max_pred_locks_per_transaction, but that sets an allocation size >> while these don't. I think these new GUCs should be PGC_SIGHUP, >> and documented to change on reload. > > Fixed. > >> (3) The documentation for max_pred_locks_per_relation needs a fix. >> Both page locks and tuple locks for the relation are counted toward >> the limit. > > Fixed. > >> In releases prior to this patch, max_pred_locks_per_relation was >> calculated as "max_pred_locks_per_transaction / 2". I know that >> people have sometimes increased max_pred_locks_per_transaction >> specifically to raise the limit on locks within a relation before >> the promotion to relation granularity occurs. It seems kinda >> anti-social not to support a special value for continuing that >> behavior or, if we don't want to do that, at least modifying >> pg_upgrade to set max_pred_locks_per_relation to the value that was >> in effect in the old version. 
> > This is exactly what we've been doing at my workplace, and I mentioned > this in my original email: > >>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >>>> One thing I don't like about this patch is that if a user has >>>> increased max_pred_locks_per_transaction, they need to set >>>> max_pred_locks_per_relation to half of that to retain the current >>>> behaviour, or they'll suddenly find themselves with a lot more >>>> relation locks. If it's possible to make a GUCs default value >>>> dependent on the value of another, that could be a solution. >>>> Otherwise, the page lock threshold GUC could be changed to be >>>> expressed as a fraction of max_pred_locks_per_transaction, to keep >>>> the current behaviour. >> >>> Another option would be to have a special sentinel (e.g. -1) which is >>> the default, and keeps the current behaviour. > > The attached updated patch combines the two behaviours. Positive values > mean an absolute number of locks, while negative values mean > max_predicate_locks_per_transaction / -max_predicate_locks_per_relation. > Making the default value -2 keeps backwards compatibility. > > Alternatively we could have a separate setting for the fraction (which > could then be a floating-point number, for finer control), and use the > absolute value if that is zero. > > Regards, > > Ilmari > > -- > "A disappointingly low fraction of the human race is, > at any given time, on fire." - Stig Sandbeck Mathisen > -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
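The combined semantics Ilmari describes (a positive value is an absolute lock count; a negative value is a divisor applied to max_pred_locks_per_transaction, with -2 as the backward-compatible default) can be modeled in a few lines. This is an illustrative sketch in Python, not the patch's actual C code, and the function name is invented:

```python
def effective_rel_threshold(per_relation: int, per_transaction: int) -> int:
    """Model of the proposed combined GUC semantics (illustrative only).

    A positive per_relation value is an absolute number of locks; a
    negative value means per_transaction / -per_relation.  A default of
    -2 reproduces the historical hard-coded behavior of
    max_pred_locks_per_transaction / 2.  Zero is not meaningful here.
    """
    if per_relation > 0:
        return per_relation
    return per_transaction // -per_relation

# Default (-2) keeps the historical behavior:
assert effective_rel_threshold(-2, 64) == 32
# A positive value is taken as an absolute count, independent of the
# per-transaction setting:
assert effective_rel_threshold(100, 64) == 100
```

A separate floating-point fraction GUC, as suggested at the end of the quoted message, would trade this single-knob encoding for finer control at the cost of one more setting.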
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
A couple things occurred to me after hitting "Send". In addition to the prior 2 points: (3) The documentation for max_pred_locks_per_relation needs a fix. Both page locks and tuple locks for the relation are counted toward the limit. In releases prior to this patch, max_pred_locks_per_relation was calculated as "max_pred_locks_per_transaction / 2". I know that people have sometimes increased max_pred_locks_per_transaction specifically to raise the limit on locks within a relation before the promotion to relation granularity occurs. It seems kinda anti-social not to support a special value for continuing that behavior or, if we don't want to do that, at least modifying pg_upgrade to set max_pred_locks_per_relation to the value that was in effect in the old version. In any event, it seems more like autovacuum_work_mem or autovacuum_vacuum_cost_limit than like effective_cache_size. Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Thu, Jan 5, 2017 at 12:19 PM, Tom Lane wrote: > ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >>> One thing I don't like about this patch is that if a user has increased >>> max_pred_locks_per_transaction, they need to set >>> max_pred_locks_per_relation to half of that to retain the current >>> behaviour, or they'll suddenly find themselves with a lot more relation >>> locks. If it's possible to make a GUCs default value dependent on the >>> value of another, that could be a solution. Otherwise, the page lock >>> threshold GUC could be changed to be expressed as a fraction of >>> max_pred_locks_per_transaction, to keep the current behaviour. > >> Another option would be to have a special sentinel (e.g. -1) which is >> the default, and keeps the current behaviour. > > FWIW, interdependent GUC defaults are generally unworkable. > See commit a16d421ca and the sorry history that led up to it. > I think you could make it work as a fraction, though. I think that, ultimately, both a fraction of actual (the number of tuples on a page or the number of pages in a relation) *and* an absolute number (as this patch implements) could be useful. The former would be more "adaptable" -- providing reasonable behavior for different sized tuples and different sized tables, while the latter would prevent a single table with very small tuples or a lot of pages from starving all other uses. This patch implements the easier part, and is likely to be very useful to many users without the other part, so I think it is worth accepting as a step in the right direction, and consistent with not letting the perfect be the enemy of the good. There are a couple minor formatting issues I can clean up as committer, but there are a couple more substantive things to note. 
(1) The new GUCs are named max_pred_locks_per_*, but the implementation passes them unmodified from a function specifying at what count the lock should be promoted. We either need to change the names or (probably better, only because I can't think of a good name for the other way) return the GUC value + 1 from the function. Or maybe change the function name and how the returned number is used, if we care about eliminating the overhead of the increment instruction. That actually seems least confusing. (2) The new GUCs are declared and documented to be set only on startup. This was probably just copied from max_pred_locks_per_transaction, but that sets an allocation size while these don't. I think these new GUCs should be PGC_SIGHUP, and documented to change on reload. Other than that, I think it is in good shape and I would feel good about committing it. Of course, if there are any objections to it, I will not go ahead without reaching consensus. I am on vacation next week, so I would not do so before then; in fact I may not have a chance to respond to any comments before then. If the author wishes to make these changes, great; otherwise I can do it before commit. I will mark this Ready for Committer. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] delta relations in AFTER triggers
On Fri, Jan 20, 2017 at 2:08 PM, Nico Williams wrote: > On Fri, Jan 20, 2017 at 01:37:33PM -0600, Kevin Grittner wrote: >> There is currently plenty of room for pseudo-MV implementations, >> and may be for a while. It's a good indication of the need for the >> feature in core. An implementation in the guts of core can have >> advantages that nothing else can, of course. For example, for >> eager application of the deltas, nothing will be able to beat >> capturing tuples already in RAM and being looked at for possible >> trigger firing into a RAM-with-spill-to-disk tuplestore. > > BTW, automatic updates of certain types of MVs should be easy: add > constraints based on NEW/OLD rows from synthetic triggers to the > underlying query. Convincing me that this is a good idea for actual MVs, versus pseudo-MVs using tables, would be an uphill battle. I recognize the need to distinguish between MVs which contain recursive CTEs in their definitions and MVs that don't, so that the DRed algorithm can be used for the former and the counting algorithm for the latter; but firing triggers for row-at-a-time maintenance is not going to be efficient for very many cases, and the cost of identifying those cases to handle them differently is probably going to exceed any gains. Comparative benchmarks, once there is an implementation using set-based techniques, could potentially convince me; but there's not much point arguing about it before that exists. :-) > However, there is a bug in the query planner that prevents this > from being very fast. At some point I want to tackle that bug. What bug is that? > Basically, the planner does not notice that a table source in a > join has a lookup key sufficiently well-specified by those additional > constraints that it should be the first table source in the outermost > loop. Is that a description of what you see as the bug? Can you give an example, to clarify the point? I am dubious, though, of the approach in general, as stated above. 
>> I don't have time to review what you've done right now, but will >> save that link to look at later, if you give permission to borrow >> from it (with proper attribution, of course) if there is something >> that can advance what I'm doing. If such permission is not >> forthcoming, I will probably avoid looking at it, to avoid any >> possible copyright issues. > > Our intention is to contribute this. We're willing to sign > reasonable contribution agreements. Posting a patch to these lists constitutes an assertion that you have authority to share the IP, and are doing so. Referencing a URL is a bit iffy, since it doesn't leave an archival copy of the contribution under the community's control. > I'd appreciate a review, for sure. Thanks! Would it be possible to get your approach running using tables and/or (non-materialized) views as an extension? A trigger-based way to maintain pseudo-MVs might make an interesting extension, possibly even included in contrib if it could be shown to have advantages over built-in MVs for some non-trivial applications. > There's a gotcha w.r.t. NULL columns, but it affects the built-in > REFRESH as well, IIRC. The commentary in our implementation > discusses that in more detail. Could you report that on a new thread on the lists? I've seen comments about such a "gotcha", but am not clear on the details. It probably deserves its own thread. Once understood, we can probably fix it. Thanks! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
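For readers unfamiliar with the counting algorithm mentioned above, the core idea is that each derived row carries a multiplicity, so a delta can be applied to the materialized result without re-running the view query. This is a toy illustration in Python, not anything from the proposed implementation; the class and method names are invented:

```python
from collections import Counter

class CountingView:
    """Toy counting-algorithm maintenance for a duplicate-preserving view.

    Each derived row carries a count of its derivations; applying a delta
    adjusts counts, and a row leaves the view only when its count hits zero.
    """
    def __init__(self):
        self.counts = Counter()

    def apply_delta(self, inserted=(), deleted=()):
        # Set-based application of a transition delta: no per-row
        # re-execution of the defining query is needed.
        for row in inserted:
            self.counts[row] += 1
        for row in deleted:
            self.counts[row] -= 1
            if self.counts[row] == 0:
                del self.counts[row]

    def rows(self):
        return set(self.counts)

v = CountingView()
v.apply_delta(inserted=[("a", 1), ("a", 1), ("b", 2)])
v.apply_delta(deleted=[("a", 1)])
assert v.rows() == {("a", 1), ("b", 2)}   # one derivation of ("a", 1) remains
v.apply_delta(deleted=[("a", 1)])
assert v.rows() == {("b", 2)}
```

This is also why the algorithm only suits non-recursive view definitions; recursive ones need the DRed (delete-and-rederive) approach mentioned in the message above.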
Re: [HACKERS] delta relations in AFTER triggers
On Thu, Jan 19, 2017 at 4:14 PM, Nico Williams wrote: > I hope what I've done about delta relations will be mostly irrelevant > given your patch (which I've not looked at in detail), Reviews welcome! > but just FYI, > I've built an alternate, all-SQL-coded materialized view system that > captures deltas between refreshes and deltas from direct DMLs of the > materialized view: There is currently plenty of room for pseudo-MV implementations, and may be for a while. It's a good indication of the need for the feature in core. An implementation in the guts of core can have advantages that nothing else can, of course. For example, for eager application of the deltas, nothing will be able to beat capturing tuples already in RAM and being looked at for possible trigger firing into a RAM-with-spill-to-disk tuplestore. > https://github.com/twosigma/postgresql-contrib/blob/master/pseudo_mat_views.sql > > There are some good ideas there, IMO, even if that implementation were > useless because of your patch. I don't have time to review what you've done right now, but will save that link to look at later, if you give permission to borrow from it (with proper attribution, of course) if there is something that can advance what I'm doing. If such permission is not forthcoming, I will probably avoid looking at it, to avoid any possible copyright issues. > Incidentally, it's really nice that PG has some "higher order" SQL > features that make this sort of thing easier. In particular, here, row > values and record types, and being able to refer to a table as a column > of the table's record type. Yeah, I found that quite handy in developing the REFRESH feature, and expect to be using it in incremental maintenance. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] delta relations in AFTER triggers
Attached is v10, which fixes bitrot from v9 caused by commit ea15e186. Still needs review. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [Attachment: transition-v10.diff.gz (GNU Zip compressed data)]
Re: [HACKERS] delta relations in AFTER triggers
Attached is v9, which fixes bitrot from v8. No other changes. Still needs review. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [Attachment: transition-v9.diff.gz (GNU Zip compressed data)]
Re: [HACKERS] Packages: Again
On Fri, Jan 13, 2017 at 12:35 PM, Serge Rielau wrote: > Yes my proposal to nest schemata is “radical” and this community > is not falling into that camp. > But there is nothing holy about database.schema.object.attribute It is mandated by the U.S. and international SQL standard documents. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Fri, Jan 13, 2017 at 7:09 AM, Michael Paquier wrote: > There is no real harm in including current_logfiles in base > backups, but that's really in the same bag as postmaster.opts or > postmaster.pid, particularly if the log file name has a > timestamp. I'm going to dispute that -- if postmaster.opts and postmaster.pid are present when you restore, it takes away a level of insurance against restoring a corrupted image of the database without knowing it. In particular, if the backup_label file is deleted (which happens with alarming frequency), the startup code may think it is dealing with a cluster that crashed rather than with a restore of a backup. This often leads to corruption (anything from "database can't start" to subtle index corruption that isn't noticed for months). The presence of log files from the time of the backup does not present a similar hazard. So while I agree that there is no harm in including current_logfiles in base backups, I object to the comparisons to the more dangerous files. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Thu, Jan 12, 2017 at 12:12 PM, Tom Lane wrote: > Vladimir Rusinov writes: >> On Thu, Jan 12, 2017 at 4:57 PM, Euler Taveira wrote: >>> As Robert suggested in the other email: extension to create old names. > >> I don't follow the reasoning for the extension. It seem to have downsides >> of both alternatives combined: we still break people's code, and we add >> even more maintenance burden than just keeping aliases. > > Yeah, I'm not terribly for the extension idea. Robert cited the precedent > of contrib/tsearch2, but I think the history of that is a good argument > against this: tsearch2 is still there 9 years later and there's no > indication that we'll ever get rid of it. We can let things rot > indefinitely in core too. If we do ever agree to get rid of the aliases, > stripping them out of pg_proc.h will not be any harder than removing > a contrib directory would be. How about just leaving it to someone who cares about the aliases to post such an extension at pgxn.org (or anywhere else they like)? It can be around as long as someone cares enough to maintain it. I guess that makes my vote -1 on aliases by the project. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [DOCS] [HACKERS] Questionable tag usage
On Tue, Jan 10, 2017 at 9:58 AM, Tom Lane wrote: > whether to continue using "see section m.n"-type cross-references For my part, I have a preference for including the section name with the link text, although if it took much work to add it (rather than being the new default) I might question whether the benefit was worth the effort of adding it. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] RustgreSQL
On Tue, Jan 10, 2017 at 7:48 AM, Robert Haas wrote: > I'm not meaning to be funny or sarcastic or disrespectful when I say > that I think C is the best possible language for PostgreSQL. It works > great, and we've got a ton of investment in making it work. I can't > see why we'd want to start converting even a part of the code to > something else. Perhaps it seems like a good idea from 10,000 feet, > but in practice I believe it would be fraught with difficulties - and > if it injected even a few additional instructions into hot code paths, > it would be a performance loser. It strikes me that exactly the set of functions that Joel is suggesting could be written in another language is the set where the declarations in the .h files could be considered for replacement with static inline functions, potentially giving a significant performance boost which would not be available if they were instead converted to another language. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Odd behavior with PG_TRY
On Fri, Jan 6, 2017 at 5:43 AM, Michael Paquier wrote: > If a variable is modified within PG_TRY and then referenced in > PG_CATCH it needs to be marked as volatile to be strictly in > conformance with POSIX. This also ensures that any compiler does not > do any stupid optimizations with those variables in the way they are > referenced and used. That raises the question of why PG_exception_stack is not marked as volatile, since the macros themselves modify it within the PG_TRY block and reference it within the PG_CATCH block. Is there some reason this variable is immune to the problem? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
On Thu, Jan 5, 2017 at 7:32 PM, Jonathon Nelson wrote: > On Thu, Jan 5, 2017 at 1:01 PM, Andres Freund wrote: >> On 2017-01-05 12:55:44 -0600, Jonathon Nelson wrote: >>> In our lab environment and with a 16MiB setting, we saw substantially >>> better network utilization (almost 2x!), primarily over high bandwidth >>> delay product links. >> >> That's a bit odd - shouldn't the OS network stack take care of this in >> both cases? I mean either is too big for TCP packets (including jumbo >> frames). What type of OS and network is involved here? > > In our test lab, we make use of multiple flavors of Linux. No jumbo frames. > We simulated anything from 0 to 160ms RTT (with varying degrees of jitter, > packet loss, etc.) using tc. Even with everything fairly clean, at 80ms RTT > there was a 2x improvement in performance. Is there compression and/or encryption being performed by the network layers? My experience with both is that they run faster on bigger chunks of data, and that might happen before the data is broken into packets. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
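One way to see why a compressing layer in the path can favor larger sends: each independently compressed chunk pays fixed stream overhead and loses cross-chunk redundancy. The sketch below uses Python's zlib purely as a stand-in for whatever a network layer might do -- it says nothing about the WAL sender itself, and compression ratio is used here as an observable proxy for the per-chunk cost:

```python
import zlib

data = b"postgres wal record " * 4096  # ~80 kB of compressible data

# Compress once, as a single large chunk:
whole = len(zlib.compress(data))

# Compress the same bytes as independent 1 kB chunks, as a layer working
# on small sends might:
chunk = 1024
pieces = sum(len(zlib.compress(data[i:i + chunk]))
             for i in range(0, len(data), chunk))

# Per-chunk stream overhead and lost cross-chunk redundancy make the
# small chunks strictly worse here:
assert whole < pieces
```

Whether this, the TCP window dynamics, or something else explains the observed 2x at 80 ms RTT would need to be measured on the actual setup.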
Re: [HACKERS] delta relations in AFTER triggers
On Sun, Dec 4, 2016 at 11:35 PM, Haribabu Kommi wrote: > Moved to next CF with "waiting on author" status. Patch v8 attempts to address the issues explicitly raised in Thomas Munro's review. An opaque query environment is created that, for now, only passes through ephemeral named relations, of which the only initial implementation is named tuplestores; but the techniques are intended to support both other types of ephemeral named relations and environmental properties (things affecting parsing, planning, and execution that are not coming from the system catalog) besides ENRs. There is no clue in the access to the QE whether something is, for example, stored in a list or a hash table. That's on purpose, so that the addition of other properties or changes to their implementation doesn't affect the calling code. There were a few changes Thomas included in the version he posted, without really delving into an explanation for those changes. Some or all of them are likely to be worthwhile, but I would rather incorporate them based on explicit discussion, so this version doesn't do much other than generalize the interface a little, change some names, and add more regression tests for the new feature. (The examples I worked up for the rough proof of concept of enforcement of RI through set logic rather than row-at-a-time navigation were the basis for the new tests, so the idea won't get totally lost.) Thomas, please discuss each suggested change (e.g., the inclusion of the query environment in the parameter list of a few more functions). Changed to "Needs review" status. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [Attachment: transition-v8.diff.gz (GNU Zip compressed data)]
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
I guess the preceding posts leave us with these guarantees about read-only transactions which we might want to make explicit in the documentation: (1) A serialization failure cannot be initially thrown on a COMMIT attempt for a read-only transaction; however, if a subtransaction catches a serialization failure exception on a statement within the transaction, and doesn't re-throw it or throw any other error which terminates the transaction, the serialization failure can show up on the COMMIT attempt. (NOTE: We may want to check whether the "doomed" flag is set on a write conflict for a serializable transaction. It seems to me that it should be, but that might have been missed. If so, that should be treated as a bug and fixed.) (2) A read-only transaction cannot show results inconsistent with already-committed transactions, so there is no chance of needing to discard results from a read-only transaction due to failure of the transaction to commit. Both of these should hold for both explicit read-only transactions (which are set to READ ONLY after a BEGIN or START, or due to default_transaction_read_only being set to true and not overridden), and implicit read-only transactions. It is still worthwhile to explicitly set serializable transactions to read-only whenever possible, for performance reasons. The idea that a serialization failure is not possible on the first (or only) statement of a read-only transaction was in error, and should be disregarded. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
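The retry discipline discussed in this thread -- rerun the whole transaction from the beginning, so that any result feeding a later statement is re-evaluated on each attempt -- can be simulated without a server. This Python sketch uses an invented SerializationFailure class as a stand-in for SQLSTATE 40001/40P01 errors from a real driver; it is not a real database client:

```python
class SerializationFailure(Exception):
    """Stand-in for a serialization/deadlock failure from a real driver."""

def run_transaction(work, max_attempts=10):
    """Re-run `work` from the beginning on serialization failure.

    The entire transaction is re-executed, so statements whose results
    drive later statements are evaluated afresh on every attempt; the
    retries are invisible to the caller apart from added latency.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except SerializationFailure:
            if attempt == max_attempts:
                raise

# Simulated workload that conflicts twice before succeeding:
attempts = {"n": 0}
def workload():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure
    return "committed"

assert run_transaction(workload) == "committed"
assert attempts["n"] == 3
```

Only the attempt that completes without a serialization failure would count toward a TPS rate, matching the accounting proposed upthread.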
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Fri, Dec 16, 2016 at 8:24 AM, Robert Haas wrote: > On Thu, Dec 15, 2016 at 9:01 AM, Kevin Grittner wrote: >> I also realized some other properties of read-only transactions >> that might interest you (and that I should probably document). >> Since the only way for a read-only transaction to be the one >> experiencing a serialization failure is if Tout commits before the >> read-only transaction (which is always Tin) acquires its snapshot, >> Tpivot is still running when Tin acquires its snapshot, Tpivot >> commits before a serialization failure involving Tin is detected, >> and *then* Tin reads a data set affected by the writes of Tpivot. >> Since a snapshot is only acquired when a statement is run which >> requires a snapshot, that means that a query run in an implicit >> transaction (i.e., there is no START or BEGIN statement to >> explicitly start it; the SELECT or other data-accessing statement >> automatically starts the transaction so it has a valid context in >> which to run) that does not write data can never return bad results >> nor receive a serialization failure. Nor can those things happen >> on the *first* or *only* non-writing statement in an explicit >> transaction. > > I don't understand this argument. Every statement in a read-only, > serializable transaction runs with the same snapshot, so I don't see > how it can make a difference whether we're in the middle of running > the first statement or the tenth. Tpivot might commit in the middle > of executing the first statement of the transaction, which might then > -- later on during the execution of that same statement -- do > something that causes it to acquire a bunch more SIREAD locks. Good point. For the read-only transaction to be the one to receive a serialization failure, it must acquire a snapshot while Tpivot is still running, and read a data set which was affected by Tpivot, and must do so after Tpivot has successfully committed. 
However, if the commit of Tpivot comes after Tin has parsed the statement, determined that it is one that requires a snapshot, and acquired its snapshot, but before it reads the modified data set, Tin could get the serialization failure. Muddled thinking on my part to think of acquiring the snapshot as atomic with running the statement which caused the snapshot to be acquired. Good catch! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Thu, Dec 15, 2016 at 9:53 AM, Ian Jackson wrote: > I don't think "set max_pred_locks_per_transaction generously" is a > practical thing to write in the documentation, because the application > programmer, or admin, has no sensible way to calculate what a > sufficiently generous value is. OK. > You seem to be implying that code relying on the summarised data might > make over-optimistic decisions. That seems dangerous to me, but (with > my very dim view of database innards) I can't immediately see how to > demonstrate that it must in any case be excluded. No, with any of these conditions, the information on which a decision to generate a serialization failure is based is summarized into something less granular, and in all cases we turn any "in doubt" situations into serialization failures; that is, you can get false positives (a serialization failure exception where complete information could have avoided it) but not false negatives (a serialization anomaly appearing in the database or query results from a transaction which commits). Based on that alone, I think it is fair to say that it does not weaken either guarantee: that a serialization failure on commit is not possible for a read-only transaction unless the initial exception is suppressed in a subtransaction, and that anomalous results are not possible in a read-only transaction. The granularity promotion of predicate locks could not affect the guarantees about never seeing a serialization failure on the first statement that reads data in a read-only transaction, but I would need to take a very close look at how the SLRU summarization of committed transactions might affect that one -- we lose some of the detail about the relative order of the commits and snapshot acquisitions, and that might be enough to allow a false positive on that first statement. That would require more study than I can give it this month. 
I do remember that Dan ran some saturation workloads to stress this feature for days and weeks at a time pushing things to the point of using the SLRU summarization, and I remember thinking it odd that certain tests generated zero errors on the read-only transactions, which I'm pretty sure were single-statement transactions. It was only during this week's discussion that I had the epiphany about that only being possible if the read-only transaction had multiple statements; but the fact that such long saturation runs with SLRU summarization showed no errors on read-only transactions supports the idea that such summarization doesn't compromise that guarantee. Unfortunately, it falls short of proof. :-( -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
transactions -- those that, in spite of not being *declared* as READ ONLY, never wrote any data. Although these have higher overhead than transactions explicitly declared READ ONLY up front, many of the properties of the explicitly declared read-only transaction hold -- including, it seems to me, the property of never returning badly serialized data and of never being chosen to receive the serialization failure error unless both Tout and Tpivot have already committed (in that order). > Supposing I understand your `doomed' flag correctly, I think it is > then probably possible to construct an argument that proves that > allowing the application to trap and suppress serialisation failures > does not make it harder to provide coherency guarantees. That is the intention. > Or to put it another way: does pgsql already detect serialisation > problems (in transactions which only read) at the point where it would > otherwise return data not consistent with any serialisation order ? > (As it does in the `Rollover' example.) Yes, as explained above. > If so presumably it always throws a serialisation failure at that > point. I think that is then sufficient. There is no need to tell the > application programmer they have to commit even transactions which > only read. Well, if they don't explicitly start a transaction there is no need to explicitly commit it, period. An implicit transaction is created if a statement needing execution context (such as a SELECT) is started outside of any explicit transaction, but such a transaction is always automatically committed or rolled back upon completion of the statement. There is always a transaction, but there is not always a need to explicitly manage it. > If my supposition is right then I will try to develop this argument > more formally.
> I think that would be worthwhile because the converse > property is very surprising to non-database programmers, and would > require very explicit documentation by pgsql, and careful attention by > application programmers. It would be nice to be able to document a > stronger promise. If you can put together a patch to improve the documentation, that is always welcome! In case you're not already aware of how we do things, patches are developed against the master branch, and then there is a determination about how far back to back-patch it in stable branches. While the rule is that stable branches are only modified to correct serious bugs or security vulnerabilities, in order to make it as safe as possible for people to apply minor releases without fear of breaking something that works, I think we could consider an argument for back-patching a doc change that clarifies or fills omissions that make it difficult to use a feature correctly. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Wed, Dec 14, 2016 at 11:12 AM, Ian Jackson wrote: > Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: > Retry on constraint violation [and 2 more messages] [and 1 more messages]"): >> On Wed, Dec 14, 2016 at 10:20 AM, Ian Jackson >> wrote: >> I would alter that slightly to: >> >> There is a consistent serialization of all serializable >> transactions which successfully commit. > > Here `serializable' means SERIALIZABLE ? I'm not entirely sure what you mean to convey by the capitalization, so I'll just say that 'serializable' there referred to the transaction isolation level. (I *think* that was what you were getting at.) >> For examples, please see this Wiki page. You might be particularly >> interested in the examples in the "Read Only Transactions" section: >> >> https://wiki.postgresql.org/wiki/SSI > > Thanks. I read that part of the wiki page. But in that example, we > are told that T1 will be aborted, not T3. That is true in the first ("Deposit Report") example in that section. The second ("Rollover") example shows the read-only transaction (T3) being the one which is aborted and retried. > Can it happen that a transaction which does not make any update > attempts, will see "impossible" data, and that this is only detected > at COMMIT ? Does that apply even to READ ONLY transactions ? As Robert pointed out, a read-only transaction cannot normally be aborted by a serialization failure on COMMIT. Under exceptional conditions, like an attempt to suppress the serialization failure, you might see the commit aborted, though. Also as pointed out by Robert, the state seen by a read-only transaction doesn't lack internal consistency, but it will be rolled back with a serialization failure exception if it would show a state which is inconsistent with some successfully-committed state of the database. In the "Rollover" example, the first time T3 is attempted, its SELECT would have shown rows containing 100 and 11, were it not canceled.
That could have been consistent with the earlier state of 100 and 10 and the business rules that the first number can only change by having the second number added to it, and the second number can only change by being incremented; but that state and those rules don't fit with the *later* state of 110, 11, because that requires that the second number be added to the first before it was incremented, while allowing the result of the first T3 transaction attempt to be seen would tell us that the increment happened first. Since we've already allowed successful commit of transactions putting things into a state only consistent with adding 10 to 100 before incrementing 10 to 11, we cancel the read-only transaction and start over. This time it will show something consistent with the apparent order of execution of the other transactions. Note that neither the order in which the first two transactions started (T1->T2) nor the order in which they committed (T2->T1) determines the apparent order of execution. It is the rw-dependencies that control it (T1 reads a version of the data from before T2's work is applied, so T1 *appears* to have run before T2 in the apparent order of execution). Since both are successfully committed with that apparent order of execution, a third transaction (T3), which sees the work of T2 (since it had committed when the snapshot for T3 was taken) but not T1 (since it had not committed when the snapshot for T3 was taken), cannot be allowed to proceed. I know an example like that can cause one's head to hurt a bit (been there), but even if you don't fight your way to a full grasp of that case, it will hopefully give you some idea both of why we can have high concurrency with this approach and of why it is necessary to discard results from failed transactions. >> Once a serialization failure occurs the transaction is flagged as >> "doomed" and will not, under any circumstances be allowed to >> commit. If you find any exception to that, please report it as a >> bug. > Right.
> I think this prevents any exception-catching arrangements from > suppressing the serialisation failure. Since AIUI it is not possible > to run the outer COMMIT from within an exception trapping context. Right. > If it /is/ possible to run that outer COMMIT in a way which swallows > the exception then [...] That is not possible, as I understand things. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
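[Editor's note: the "Rollover" reasoning in the message above can be replayed as a small, self-contained simulation. This is an illustrative sketch only, modeling transactions as pure functions on a pair of numbers (x, y); none of these names are PostgreSQL internals. It shows that the snapshot T3 saw matches no serial order that is consistent with the finally committed state.]

```python
# Replay of the wiki "Rollover" example.  T1 adds the second number to the
# first; T2 increments the second number.  Initial state: (100, 10).
def t1(state):
    x, y = state
    return (x + y, y)

def t2(state):
    x, y = state
    return (x, y + 1)

def observable_states(order, state):
    """All states a read-only transaction could see under a serial order."""
    states = [state]
    for txn in order:
        state = txn(state)
        states.append(state)
    return states

initial = (100, 10)
serial_t1_first = observable_states([t1, t2], initial)  # T1 then T2
serial_t2_first = observable_states([t2, t1], initial)  # T2 then T1

# What actually committed: T1 read y=10 before T2's increment, so the
# apparent order of execution is T1 -> T2 and the final state is (110, 11).
committed_final = (110, 11)

# T3's snapshot saw T2's commit but not T1's: (100, 11).  That state occurs
# only in the T2-first serial order, whose final state contradicts what was
# actually committed -- so T3 must be rolled back and retried.
t3_snapshot = (100, 11)
```

Checking membership makes the argument mechanical: `committed_final` is the last state of the T1-first order, `t3_snapshot` appears only in the T2-first order, and the T2-first order ends at (111, 11), which was never committed.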
Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]
On Wed, Dec 14, 2016 at 10:20 AM, Ian Jackson wrote: > Let me try to summarise my understanding of what the developers think > they can and intend to promise, about SERIALIZABLE transactions: > > There is a consistent serialisation of all transactions which > successfully commit; or which do not attempt to make any changes. > > A "consistent serialisation" means that there is an order in which > the same transactions might have been executed giving exactly the same > answers. This includes, if applicable, the same errors. (The > database is free to generate serialisation failure errors 40P01 and > 40001 whenever it chooses.) I would alter that slightly to: There is a consistent serialization of all serializable transactions which successfully commit. > A transaction which attempts to make any changes, and which does not > commit (whether because the application never asks for COMMIT, or > because of reported serialisation failure) might see internally > inconsistent data, or an internally-consistent view which is not > compatible with any serialisation of other transactions. An > application which needs a coherent view should not rely on any of the > information from such a transaction. Even a read-only transaction can see a state that is not consistent with the business rules (as enforced in the software), given that some particular later state is reached. For examples, please see this Wiki page. You might be particularly interested in the examples in the "Read Only Transactions" section: https://wiki.postgresql.org/wiki/SSI > Serialisation failures in subtransactions might cause the parent > transaction to experience a serialisation failure too. There is currently at least one bug which may allow serialization anomalies into the database if a constraint violation error is thrown in a subtransaction and that subtransaction catches and suppresses that exception and rolls back its work without throwing an error.
I expect that any bugs of this type will be fixed in a minor release set soon -- probably the next one that is released. Note that I don't think that an exception from any source other than a declarative constraint can cause this type of problem, and that other conditions must exist in combination with this to create a serialization anomaly. A serialization failure within any subtransaction will ensure that the top level transaction will fail, even if there is an attempt to catch this exception and commit the top level transaction. It would be possible to catch a serialization failure exception and throw some *other* exception to terminate the transaction; however, (to step into very convoluted territory) if that other exception is caught and suppressed, the serialization failure error would occur again at COMMIT. Once a serialization failure occurs the transaction is flagged as "doomed" and will not, under any circumstances, be allowed to commit. If you find any exception to that, please report it as a bug. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
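[Editor's note: the "doomed" flag described above can be illustrated with a toy model. This is a sketch of the documented behavior, not PostgreSQL code; the class, exception, and method names are all hypothetical. The point it demonstrates: once a serialization failure has been raised inside a transaction, catching and swallowing the exception in client code cannot rescue it -- COMMIT fails anyway.]

```python
class SerializationFailure(Exception):
    """Stands in for SQLSTATE 40001 (serialization_failure)."""

class Transaction:
    def __init__(self):
        self.doomed = False

    def execute(self, causes_conflict):
        if causes_conflict:
            # The flag is set before the error ever reaches the client,
            # so suppressing the exception cannot clear it.
            self.doomed = True
            raise SerializationFailure("could not serialize access")

    def commit(self):
        if self.doomed:
            # Even if the earlier exception was caught and suppressed,
            # the transaction is not allowed to commit.
            raise SerializationFailure("doomed transaction cannot commit")
        return "committed"
```

The correct client response, as discussed upthread, is a retry loop around the *whole* transaction: discard everything, start a new transaction, and re-run the statements from the beginning.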
Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds
On Wed, Dec 14, 2016 at 8:16 AM, Dagfinn Ilmari Mannsåker wrote: > Attached is a patch Please add this to the open CommitFest to ensure that it is reviewed. http://commitfest.postgresql.org/action/commitfest_view/open -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company