Re: [HACKERS] One process per session lack of sharing

2016-07-13 Thread Craig Ringer
On 12 July 2016 at 21:57,  wrote:

> Hi
>
> Is there any plan to implement "session per thread" or "shared
> sessions between threads"?
>

As has been noted by others, there isn't any such plan right now.

PostgreSQL isn't threaded. It uses a multi-processing, shared-nothing-by-default
memory model with explicit shared memory, plus copy-on-write forking from the
postmaster for initial backend state (except on Windows, where we emulate
that). It relies on processes being cheap and lightweight.

This is a very poor fit for Java's thread-based, shared-by-default model,
with its expensive, heavyweight process startup and cheap threads. Process
forking doesn't clone all threads and the JVM has lots of worker threads,
so we can't start the JVM once in the postmaster then clone it with each
forked postgres backend. Plus the JVM just isn't designed to cope with that
and would surely get thoroughly confused when its file handles are cloned,
its process ID changes, etc. This is one of the reasons PL/Java has never
really taken off. We can mitigate the JVM startup costs a bit by preloading
the JVM libraries into the postmaster and using the JVM's base class
library preloading, but unless you're running trivial Java code you still
do a lot of work at each JVM start after the postgres backend forks.


> We have analyzed the possibility of contributing a PL/pgSQL-to-JVM-bytecode
> compiler, but with the current thread model this idea is far from optimal.
> (The VM can be different, of course, but currently we use Oracle and the
> JVM is important for us.)
>

Yep, that's a real sticking point for a number of people.

The usual solution at this point is to move most of the work into an
application-server mid-layer. That moves work further away from the DB,
which has its own costs, and isn't something you're likely to be happy with
if you're looking at things like optimising PL/PgSQL with a bytecode
compiler. But it's the best we have right now.


> We have run into some lack of resource sharing.
> So in our test, memory usage per session is:
> Oracle: about 5M
> MS SQL Server: about 4M
> PostgreSQL: about 160M
>

Yep, that sounds about right. Unfortunately.

You may be able to greatly reduce that cost if you can store your cached
compiled data in a shared memory segment created by your extension. This
will get a bit easier with the new dynamic shared memory infrastructure,
but it's going to be no fun at all to make that play with the JVM. You'll
probably need a lot of JNI.
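
As a sketch of that direction (illustrative only: cache_create is a made-up
name, and this assumes roughly the 9.6 dsm.c API):

#include "postgres.h"
#include "storage/dsm.h"

/* Create a shared segment to hold compiled-code blobs, pin it so it
 * survives this backend, and return a handle other backends can pass
 * to dsm_attach(). */
static dsm_handle
cache_create(Size size)
{
    dsm_segment *seg = dsm_create(size, 0);

    dsm_pin_mapping(seg);       /* keep our mapping for the backend's lifetime */
    dsm_pin_segment(seg);       /* keep the segment alive past this backend */
    return dsm_segment_handle(seg);
}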

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] One process per session lack of sharing

2016-07-13 Thread Craig Ringer
On 14 July 2016 at 03:59, Robert Haas  wrote:


> I agree that there's not really a plan to implement this, but I don't
> agree that connection pooling solves the whole problem.  Most people
> can't get by with statement pooling, so in practice you are looking at
> transaction pooling or session pooling.  And that means that you can't
> really keep the pool size as small as you'd like because backends can
> be idle in transaction for long enough to force the pool size to be
> pretty large.  Also, pooling causes the same backends to get reused
> for different sessions which touch different relations and different
> functions so that, for example, the relcache and the PL/pgsql function
> caches grow until every one of those sessions has everything cached
> that any client needs.  That can cause big problems.
>
> So, I actually think it would be a good idea to think about this.


I agree. It's been on my mind for a while, but I've been assuming it's
likely to involve such architectural upheaval as to be impractical.

Right now PostgreSQL conflates "user session state" and "execution engine"
into one process. This means we need an external connection pooler if we
want more user connections than we can efficiently handle in terms of the
number of executors. Current poolers don't do much to keep track of user
state; they just arbitrate access to executors and expect applications to
re-establish any needed state (SET vars, LISTEN, etc) or avoid features
that require persistence across the current pooling level.

This leaves users in the hard position of either using very high,
inefficient max_connections values to keep track of application<->DB state,
or jumping through awkward hoops to use transaction pooling, either at the
application level (Java appserver pools, etc) or through a proxy.

If using the high max_connections approach, the user must also ensure, via
some kind of external coordination, that all those connections aren't
actually doing work at the same time. Otherwise they'll thrash the server
and face out-of-memory issues (especially with our rather simplistic
work_mem management, etc) and poor performance.

The solution, to me, is to separate "user state" and "executor". Sounds
nice, but we use global variables _everywhere_ and it's assumed throughout
the code that we have one user session for the life of a backend, though
with some exceptions for SET SESSION AUTHORIZATION. It's not likely to be
fun.

> The problem, of course, is that as long as we allow arbitrary parts of the
> code - including extension code - to declare global variables and
> store arbitrary stuff in them without any coordination, it's
> impossible to imagine hibernating and resuming a session without a
> risk of things going severely awry.


Yeah. We'd definitely need a session state management mechanism with save
and restore functionality.

There's also stuff like:

* LISTEN
* advisory locking at the session level
* WITH HOLD cursors

that isn't simple to just save and restore. Those are some of the same
things that are painful with transaction pooling right now.
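
To make that concrete, this is the kind of session-scoped state at issue
(channel, lock key and table names invented for illustration):

  LISTEN cache_invalidation;        -- async notification registration
  SELECT pg_advisory_lock(42);      -- lock held by the session, not a transaction
  DECLARE c CURSOR WITH HOLD FOR    -- cursor that outlives its transaction
      SELECT * FROM some_table;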


> This was a major issue for
> parallel query, but we've solved it, mostly, by designating the things
> that rely on global variables as parallel-restricted, and there
> actually aren't a ton of those.  So I think it's imaginable that we
> can get to a point where we can, at least in some circumstances, let a
> backend exit and reconstitute its state at a later time.  It's not an
> easy project, but I think it is one we will eventually need to do.
>

I agree on both points, but I think "not easy" is rather an understatement.

Starting with a narrow scope would help: save/restore GUCs and the other
easy stuff, and disallow saving and restoring sessions that are actively
LISTENing, holding advisory locks, keeping cursors open, etc.

BTW, I think this would also give us a useful path toward allowing
connection poolers to change the active user and re-authenticate on an
existing backend. Right now you have to use SET ROLE or SET SESSION
AUTHORIZATION (ugh) and can't stop the client you hand the connection to
from just RESETing back to the pooler's user and doing whatever it wants.
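
Concretely, the best a pooler can do today is something like:

  SET ROLE app_user;    -- pooler scopes the connection down before handing it out
  RESET ROLE;           -- but nothing stops the client from doing this and
                        -- regaining the pooler's (more privileged) login role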


> Insisting that the current model is working is just sticking our head
> in the sand.  It's mostly working, but there are workloads where it
> fails badly - and competing database products survive a number of
> scenarios where we just fall on our face.
> 
>

Yep, and like parallel query it's a long path, but it's one we've got to
face sooner or later.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Fix typo in postgres_fdw/deparse.c

2016-07-13 Thread Masahiko Sawada
Hi all,

The attached patch fixes a small typo in contrib/postgres_fdw/deparse.c:

s/whenver/whenever/g

Regards,

--
Masahiko Sawada


fix_small_typo_in_deparse_c.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-07-13 Thread Robert Haas
On Jul 13, 2016, at 12:25 PM, Stephen Frost  wrote:
> I disagree.  Adding a column is certainly changing the structure, as is
> removing one.  This certainly hasn't changed my opinion that it's
> worthwhile to consider this change, even at this point in the release
> cycle, given we need to make a change regardless.

Without getting into the substantive question here, I think that as a matter of 
policy, when somebody thinks we're whacking things around too much post-beta, 
we should lean in the direction of whacking them around less.  You don't have 
to agree with Peter on the merits to think that this is optional tinkering. It 
clearly is.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance - tidbitmap

2016-07-13 Thread Andres Freund
Hi,

On 2016-06-24 16:29:53 -0700, Andres Freund wrote:
> 4) Various missing micro optimizations have to be performed, for more
>architectural issues to become visible. E.g. [2] causes such bad
>slowdowns in hash-agg workloads, that other bottlenecks are hidden.

One such issue is the usage of dynahash.c in tidbitmap.c. In many
queries, e.g. tpch q7, the bitmapscan is often the bottleneck. Profiling
shows that this is largely due to dynahash.c being slow. The primary
issues are: a) the two-level structure doubles the number of indirect
lookups; b) indirect function calls; c) separate-chaining-based conflict
resolution; d) being too general.

I've quickly hacked up an alternative linear-probing (open addressing)
hash table implementation, and the improvements are quite remarkable.
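
For readers following along, the general shape of such a table (a sketch,
not the patch's actual code): a single flat array probed linearly, so a
lookup usually stays within one cache line and needs no pointer chasing
or indirect calls.

typedef struct
{
    uint32      key;
    uint32      value;
    bool        used;
} ExampleEntry;                 /* illustrative, not the patch's struct */

static ExampleEntry *
example_insert(ExampleEntry *table, uint32 mask, uint32 key)
{
    /* mask = table_size - 1, with table_size a power of two */
    uint32      pos = (key * 2654435761u) & mask;   /* multiplicative hash */

    while (table[pos].used)
    {
        if (table[pos].key == key)
            return &table[pos];         /* found an existing entry */
        pos = (pos + 1) & mask;         /* linear probe: try the next slot */
    }
    table[pos].key = key;
    table[pos].used = true;
    return &table[pos];
}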

Example Query:
EXPLAIN ANALYZE SELECT SUM(l_extendedprice) FROM lineitem
WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date);

before:
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=942046.72..942046.73 rows=1 width=8) (actual time=4647.908..4647.909 rows=1 loops=1)
   ->  Bitmap Heap Scan on lineitem  (cost=193514.84..919246.17 rows=9120222 width=8) (actual time=2667.821..3885.598 rows=9113219 loops=1)
         Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
         Heap Blocks: exact=585542
         ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191234.78 rows=9120222 width=0) (actual time=2461.622..2461.622 rows=9113219 loops=1)
               Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
 Planning time: 0.170 ms
 Execution time: 4648.921 ms

timing without analyze: 4136.425 4101.873 4177.441

after:
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=942046.72..942046.73 rows=1 width=8) (actual time=3218.422..3218.423 rows=1 loops=1)
   ->  Bitmap Heap Scan on lineitem  (cost=193514.84..919246.17 rows=9120222 width=8) (actual time=1153.707..2430.500 rows=9113219 loops=1)
         Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
         Heap Blocks: exact=585542
         ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191234.78 rows=9120222 width=0) (actual time=952.161..952.161 rows=9113219 loops=1)
               Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
 Planning time: 1.075 ms
 Execution time: 3221.861 ms
(8 rows)

timing without analyze: 2647.364 2674.456 2680.197

As you can see, the time for the bitmap index scan goes from 2461.622 ms
to 952.161 ms.

I'm not proposing to apply the patch as-is, but I think it's a good
starting point to discuss how to evolve our use of hash tables.

I'm wondering whether we can do 'macro based templates' or
something. I.e. have something like the simplehash in the patch in
simplehash.h, 

Re: [HACKERS] Improving executor performance

2016-07-13 Thread Andres Freund
Hi,

On 2016-06-24 16:29:53 -0700, Andres Freund wrote:
> 2) Barring 1), tuple deforming is the biggest bottleneck in nearly all
>queries I looked at. There's various places we trigger deforming,
>most ending in either slot_deform_tuple(), heap_getattr(),
>heap_deform_tuple().
>
>This can be optimized significantly, but even after that it's still a
>major issue.
>
>Part of the problem is that we sometimes don't know how many elements
>to access (especially in index and hashing related code); the other
>part is that we're deforming a lot of columns that we'll never need,
>just because we need a subsequent one.
>
>The other part is that our tuple format requires a number of
>relatively expensive operations to access data (e.g. alignment
>computations, checking the null bitmap).

> 6) The use of linked lists adds noticeable #instructions overhead and
>branch misses. Just replacing {FuncExprState,BoolExprState}->args
>with an array gives a ~10%ish boost for queries with a bunch of
>quals.  Another example that appears to hurts noticeably, but which
>I've not experimentally fixed, is AggState->hash_needed.
>
>To a large degree these seem fairly easily fixable; it's kinda boring
>work, but not really hard.

As previously discussed, many of the "architectural" changes show limited
success until a few other bottlenecks are fixed. Most prominent are slot
deforming and, what I'm planning to primarily discuss in this email,
expression evaluation.

Even in the current state, profiles of queries evaluating a large number
of tuples very commonly show expression evaluation to be one of the
major costs. Due to the number of recursive calls that's not always easy
to pinpoint, though; the easiest thing to spot is usually
MakeFunctionResultNoSet showing up prominently.

While, as in 6) above, removing linked lists from the relevant
structures helps, it's not that much. Staring at this for a long while
made me realize that, somewhat surprisingly to me, one of the major
problems is that we are bottlenecked on stack usage. Constantly
entering and leaving this many functions for trivial expression
evaluations hurts considerably. Some of that is the additional number
of instructions, some of it is managing return jump addresses, and
some of it the actual bus traffic. It's also rather expensive to check
for stack limits at a very high rate.


Attached (in patch 0003) is a proof of concept implementing an
expression evaluation framework that doesn't use recursion. Instead,
ExecInitExpr2 computes the series of 'steps' necessary to compute an
expression. These steps are stored in a linear array, and executed one
after another (save boolean expressions, which can jump to later steps).
E.g. to compute colname = 1, the steps are: 1) fetch var, 2) evaluate
const, 3) call function.

Expression evaluation then is a switch statement over the opcodes of
each of these steps.
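
In cartoon form (all names invented for illustration, not the patch's),
the evaluator becomes a flat dispatch loop instead of a call tree:

    int         i;

    for (i = 0; i < nsteps; i++)
    {
        ExampleStep *step = &steps[i];

        switch (step->opcode)
        {
            case STEP_FETCH_VAR:    /* read a column from the input slot */
                results[step->resno] = slot->tts_values[step->attnum];
                break;
            case STEP_CONST:        /* materialize a constant */
                results[step->resno] = step->constvalue;
                break;
            case STEP_FUNC_CALL:    /* evaluate a function over prior results */
                results[step->resno] = step->evalfunc(step, results);
                break;
            case STEP_JUMP_IF_FALSE:    /* boolean short-circuit */
                if (!DatumGetBool(results[step->resno]))
                    i = step->jumpto - 1;   /* loop increment lands on jumpto */
                break;
        }
    }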

Using the machinery of those 'steps' to compute projections, quals, and
constraint evaluation then allows us to reduce the number of indirect
function calls/jumps further.

My preliminary implementation so far supports only function calls,
boolean expressions, constant evaluation and variable evaluation. For
everything else I'm falling back to the current expression machinery.

By combining expression, qual and target list processing, we can also
always generate the appropriate slot_getsomeattrs() calls.

Note that the patch currently does *NOT* support targetlist SRFs, and
instead just errors out. This is not a fundamental issue; I just didn't
want to invest time in supporting something we want to reimplement
anyway. Similarly, subplans currently don't work because of:
/*
 * Evaluate lefthand expressions and form a projection tuple. First we
 * have to set the econtext to use (hack alert!).
which doesn't work quite like that atm.


I've used (0004) a similar method to reduce the number of branches and
pipeline stalls in slot_deform_tuple considerably. For each attribute a
'step' is generated, which contains exactly the computations required to
deform that individual datum. That, for example, allows separating the
cases for different alignments and null checks.
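
Illustratively (invented helper, not the patch's code), a step specialized
for a NOT NULL int4 at an already-aligned offset reduces to straight-line
code with no null-bitmap test and no alignment computation:

static inline char *
deform_int4_notnull(char *tp, int attnum, Datum *values, bool *isnull)
{
    values[attnum] = Int32GetDatum(*(int32 *) tp);
    isnull[attnum] = false;
    return tp + sizeof(int32);      /* advance to the next attribute */
}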


Having expression evaluation and slot deforming as a series of simple
sequential steps, instead of complex recursive calls, would also make it
fairly straightforward to optionally just-in-time compile those.


For motivation, here are some random performance differences:
SELECT SUM(l_quantity * l_extendedprice) FROM lineitem;
master: 5038.382 4965.310 4983.146
patches: 4194.593 4153.759 4168.986
tpch-q1
master: 21274.896
dev: 17952.678

For queries involving more complex expressions, the difference can be
larger.


Both expression processing and tuple deforming can use considerable
additional improvement. But I think the approaches presented here are
the biggest step that I can see.


The reason that I'm bringing this up before 

Re: [HACKERS] bug in citext's upgrade script for parallel aggregates

2016-07-13 Thread Andreas Karlsson

On 07/09/2016 05:42 AM, David Rowley wrote:

On 30 June 2016 at 03:49, Robert Haas  wrote:

On Sat, Jun 25, 2016 at 3:44 AM, Andreas Karlsson  wrote:

On 06/24/2016 01:31 PM, David Rowley wrote:

Seems there's a small error in the upgrade script for citext for 1.1
to 1.2 which will cause min(citext) not to be parallel enabled.

max(citext)'s combinefunc is first set incorrectly, but then updated
to the correct value. I assume it was meant to set the combine
function for min(citext) instead.

Fix attached. I've assumed that because we're still in beta that we
can get away with this fix rather than making a 1.3 version to fix the
issue.


Yes, this is indeed a bug.


Since we've already released beta2, I think we need to do a whole new
extension version.  We treated beta1 as a sufficiently-significant
event to mandate a version bump, so we should do the same here.


Ok, good point. Patch attached.


Thanks!

I tested the patch and it looks good.

Andreas




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread Tom Lane
Alvaro Herrera  writes:
> Now, I think requesting psql not to split query strings is a good
> feature, but having it depend on using \; instead of ; seems way too
> obscure.  If we want to offer that choice, I propose we do it via some
> properly designed mechanism rather than being a random emergent
> characteristic falling out of a bunch of historical coincidences.

I think it was designed to do that; if you look at the code in psqlscan.l
that causes this to happen, it's clearly intentional, not a "random
emergent characteristic".

Personally, I'm fine with documenting this behavior and having done.
What I don't like is Fabien's suggestion that we alter the behavior.
It's possible that that'll break existing applications, and the argument
that printing rather than discarding the PQresult is better seems pretty
weak anyway.  Discarding a PQresult seems like it would have some uses.

Worth noting by the way is that
select 1 \; select 2;
has the same behavior as
psql -c 'select 1; select 2;'
since in both cases the whole string is sent in one PQexec.  I wonder
whether promoting \; to a recognized and documented behavior would
allow us to get away with converting -c strings to normal parsing
behavior, as was discussed and then rejected on compatibility grounds
not too long ago.  People who need to keep the old behavior could do so
by putting in backslashes.
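
That is, under such a change (a sketch of the idea, not a settled
behavior):

    psql -c 'select 1; select 2;'     -- would be split client-side into two
                                      -- PQexec calls, showing both results
    psql -c 'select 1 \; select 2;'   -- would keep today's behavior: one
                                      -- PQexec, only the last result shown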

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread Alvaro Herrera
David G. Johnston wrote:
> On Wed, Jul 13, 2016 at 5:44 PM, Fabien COELHO  wrote:
> 
> >>> Although "\;" behavior is not documented, I would have expected both
> >>> results to be shown one after the other, or an error, but not a
> >>> quiet discard.
> >>
> >> See the documentation for PQexec(): all but the last query result is
> >> discarded.
> >
> > Sure. That is a developer-level answer to "why", although it does not really
> > say why the developer chose PQexec over PQsendQuery. At the user level the
> > behavior is still pretty surprising.
> 
> Let's try putting it this way...
> 
> As a psql user I want some way to choose whether I send my query via
> "PQexec" or "PQsendQuery".  I'm not sure why the "PQexec" access point is
> undocumented but this "\;" syntax, vis-a-vis ";" provides me that choice.

psql splits the input string on semicolons and submits each resulting
part separately using PQexec.  Since \; defeats the splitting, what
happens is that the whole thing is submitted via PQexec() as a
single unit instead.  PQsendQuery is never used by psql.
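
For reference, the difference at the libpq level looks roughly like this
(error handling omitted):

    PGresult *res = PQexec(conn, "SELECT 1; SELECT 2;");
    /* res holds only the result of "SELECT 2"; the first was discarded */
    PQclear(res);

    /* the asynchronous API, by contrast, yields every result: */
    PQsendQuery(conn, "SELECT 1; SELECT 2;");
    while ((res = PQgetResult(conn)) != NULL)
    {
        /* inspect each PGresult in turn */
        PQclear(res);
    }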

Now PQexec is documented to return only the last resultset if you send
more than one query through it; so that part seems okay since it's been
documented this way forever.  However, psql is not documented to use
PQexec, it just happens to use it.

Now, I think requesting psql not to split query strings is a good
feature, but having it depend on using \; instead of ; seems way too
obscure.  If we want to offer that choice, I propose we do it via some
properly designed mechanism rather than being a random emergent
characteristic falling out of a bunch of historical coincidences.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Upcoming PG release schedule

2016-07-13 Thread Tom Lane
Current plans are to produce a 9.6beta3 release next week (ie, wrap
Monday Jul 18 for announcement Thursday Jul 21).  Keep in mind also
that we expect to make routine back-branch update releases on the
previously-announced schedule (wrap Monday Aug 8 for announcement
Thursday Aug 11).  We'll probably include a 9.6beta4 in that set
of releases.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 5:44 PM, Fabien COELHO  wrote:

>
> Hello Tom,
>
>>> Although "\;" behavior is not documented, I would have expected both
>>> results to be shown one after the other, or an error, but not a
>>> quiet discard.
>>>
>>
>> See the documentation for PQexec(): all but the last query result is
>> discarded.
>>
>
> Sure. That is a developer-level answer to "why", although it does not really
> say why the developer chose PQexec over PQsendQuery. At the user level the
> behavior is still pretty surprising.


Let's try putting it this way...

As a psql user I want some way to choose whether I send my query via
"PQexec" or "PQsendQuery".  I'm not sure why the "PQexec" access point is
undocumented but this "\;" syntax, vis-a-vis ";" provides me that choice.

David J.


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 5:44 PM, Fabien COELHO  wrote:

>
>> I do not think changing this is appropriate.  All you are likely to
>> accomplish is breaking code that does what its author wanted.
>>
>
> Hmmm... My 0.02€: Currently this feature is NOT documented, so somehow it
> is not supported, and relying on it seems risky, as it is really a side
> effect of the current implementation. If it becomes documented, it could be
> made to behave sanely at the same time...


To me it has sane and well-defined behavior - if maybe rarely useful.

Why would you choose to execute "SELECT 1 \; SELECT 2;" instead of "SELECT
1; SELECT 2;" in a setup where the behavior of both strings is identical?
Or, rather, how would they differ?

David J.


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread Fabien COELHO


Hello Tom,


>> Although "\;" behavior is not documented, I would have expected both
>> results to be shown one after the other, or an error, but not a
>> quiet discard.


> See the documentation for PQexec(): all but the last query result is
> discarded.


Sure. That is a developer-level answer to "why", although it does not really
say why the developer chose PQexec over PQsendQuery. At the user level the
behavior is still pretty surprising.



>> I would suggest that:
>>  - the \; psql feature should be documented somewhere
>>  - all results should be shown, not just the last one
>>
>> Any opinion?


> I do not think changing this is appropriate.  All you are likely to
> accomplish is breaking code that does what its author wanted.


Hmmm... My 0.02€: Currently this feature is NOT documented, so somehow it 
is not supported, and relying on it seems risky, as it is really a side 
effect of the current implementation. If it becomes documented, it could 
be made to behave sanely at the same time...


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread Tom Lane
Fabien COELHO  writes:
> Hello devs,

> Although this is probably a "feature", it is a weird one:

>   $ psql
>   psql (9.6beta2)
>   fabien=# SELECT 1 AS one \;
>   fabien-# SELECT 2 AS two ;
>    two
>   -----
>      2
>   (1 row)

> Where is my "1"?

> Although "\;" behavior is not documented, I would have expected both
> results to be shown one after the other, or an error, but not a
> quiet discard.

See the documentation for PQexec(): all but the last query result is
discarded.

> I would suggest that:
>   - the \; psql feature should be documented somewhere
>   - all results should be shown, not just the last one

> Any opinion?

I do not think changing this is appropriate.  All you are likely to
accomplish is breaking code that does what its author wanted.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread Fabien COELHO


Hello David,

At least we agree that having documentation would be an improvement :-)

On the second point:


>>  - all results should be shown, not just the last one
>
> disagree
>
> # select 1 ; select 2 ;
>
> vs
>
> # select 1 \; select 2 ;
>
> Result in identical behavior seems undesirable.


In both cases there are the same two queries, so having the same results
does not strike me as "undesirable", on the contrary.


> At least now if you want to discard all intermediate work and just show
> the last statement you can do so without going to any great lengths. If
> you really want both results don't use "\;".  This makes even more sense
> when the earlier statements are DML instead of SELECT.


Hmmm. I do not buy this "\; executes a statement but does not show the 
results" as a sane and expected behavior.


I think that the underlying and only reason it behaves like this is that
at the protocol level one can send a batch of queries in one go, but the
simple "PQexec" function returns just one result; the last one was
probably chosen as a marker that they were all executed, and that is all.


So I see this as a low-level simplified API detail which has an unforeseen 
user impact.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 4:02 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO 
> wrote:
> >> If someone thinks that "gset" is a good idea for pgbench, which I
> don't, it
> >> could be implemented. I think that an "into" feature, like PL/pgSQL &
> ECPG,
> >> makes more sense for scripting.
>
> > I agree: I like \into.
>
> > But:
>
> >> SELECT 1, 2 \; SELECT 3;
> >> \into one two three
>
> > I think that's pretty weird.
>
> Yeah, that's seriously nasty action-at-a-distance in my view.  I'd be okay
> with
>
> SELECT 1, 2 \into one two
> SELECT 3 \into three
>
> but I do not think that a metacommand on a following line should
> retroactively affect the execution of a prior command, much less commands
> before the last one.


You need a test and a definition for:

SELECT 1, 2;
SELECT 3;
\into x, y, z

It should fail - "too many variables" - right?

David J.


Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread Fabien COELHO


Hello Tom,


>> Sending a batch of requests is a feature of libpq which is accessible
>> through pgbench by using "\;", although the fact is not documented. It
>> makes sense for a client to send independent queries together so as to
>> reduce latency.


> You're putting an awful lot of weight on an unsupported assertion about
> latency.


For support, I would submit that many applications today are web/mobile
apps which are quite sensitive to latency. See for instance the FAST 2016
white paper by people at Google, which discusses in depth "tail latency" as
a key measure of quality for IO systems used for live services, or the new
HTTP/2 protocol (based on Google's SPDY), which aims at reducing latency
through multiple features (compression, server push, pipelining...).



> If a user cares about that, why would she not simply merge the
> commands into "SELECT 1, 2, 3 \into one two three" ?


Because the code would look pretty awful:

  SELECT (SELECT first data FROM ... JOIN ... WHERE ... ),
 (SELECT second data FROM ... JOIN ... WHERE ...),
 (SELECT third data FROM ... JOIN ... WHERE ...);


> And I still say that what you're proposing might be easy right now, but
> it might also be next door to impossible in a refactored implementation.


I do not understand. There is one "multi" SQL command followed by a meta
command; why would a refactored implementation have trouble with that?



> I don't think we should go there on the basis of a weak argument about
> latency.  \into should retrieve data only from the last PGresult.


This looks pretty arbitrary: why not the first one, as I asked for it
first? Anyway, why allow sending several queries if one is not allowed
to extract their results?


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Document that vacuum can't truncate if old_snapshot_threshold >= 0

2016-07-13 Thread Andres Freund
Hi,

Currently, if old_snapshot_threshold is enabled, vacuum is prevented
from truncating tables:
static bool
should_attempt_truncation(LVRelStats *vacrelstats)
{
    BlockNumber possibly_freeable;

    possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
    if (possibly_freeable > 0 &&
        (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
         possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION) &&
        old_snapshot_threshold < 0)
        return true;
    else
        return false;
}

(note the old_snapshot_threshold < 0 condition).

That appears not to be mentioned in a comment, the commit message, or
the docs. I think this definitely needs to be prominently documented.

FWIW, afaics that's required because new pages don't have an LSN, so we
can't necessarily detect that a truncated and re-extended relation
wouldn't be valid. Although I do wonder if there isn't a less invasive
way to do that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Andres Freund
On 2016-07-13 15:57:02 -0500, Kevin Grittner wrote:
> A short answer is that a normal table's heap doesn't go through
> systable_getnext_ordered().  That function is used both for cases
> where the check should not be made, like toast_delete_datum(), and
> cases where it should, like toast_fetch_datum().

It *has* to be made for toast_delete_datum(). Otherwise we could end
up deleting a since-reused toast id. Or am I missing something?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Tom Lane
Andres Freund  writes:
> On 2016-07-13 16:39:58 -0400, Tom Lane wrote:
>> I think there's a lot, but I'm afraid most of them are in contexts
>> (pun intended) where aset.c already works pretty well, ie it's a
>> short-lived context anyway.

> FWIW, hacking up aset/mcxt.c to use a trivial allocator with less
> overhead (i.e. just handing out sections of a contiguous block of
> memory) results in a noticeable speedup in parse-heavy workloads. It's a
> bit ugly though, because of the number of retail pfrees in random
> places.

Yeah, both the parser and planner are already in the "palloc a lot and
don't worry too much about pfrees" camp.  I think they could both benefit
from an allocator that ignores pfree and doesn't round up request sizes
(except for MAXALIGN).  I'm not sure how much of the rest of the system
would like that behavior, though.
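
To illustrate the kind of allocator meant here (purely a sketch, not a
proposal):

typedef struct ExampleBumpBlock
{
    char       *freeptr;        /* next free byte in the block */
    char       *endptr;         /* first byte past the block */
} ExampleBumpBlock;

static void *
bump_alloc(ExampleBumpBlock *block, Size size)
{
    void       *ret;

    size = MAXALIGN(size);      /* the only rounding done; no power-of-2 buckets */
    if (block->freeptr + size > block->endptr)
        return NULL;            /* a real context would grab a fresh block here */
    ret = block->freeptr;
    block->freeptr += size;
    return ret;
}

/* pfree on such a context would simply do nothing; memory comes back
 * wholesale when the whole context is reset or deleted. */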

The design intention behind mcxt.c/aset.c was always that we'd have
multiple allocators (multiple context types).  I'm surprised that we've
gone this long without following through on that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying the interface of UpdateMinRecoveryPoint

2016-07-13 Thread Michael Paquier
On Wed, Jul 13, 2016 at 10:31 PM, Simon Riggs  wrote:
> On 12 July 2016 at 23:49, Michael Paquier  wrote:
>> Hence why not simplifying its interface and remove the force flag?
>
> Is this change needed as part of a feature? If not, I see no reason for
> change.
>
> If we all work towards meaningful features the code can be cleaned up as we
> go.

That's just refactoring. The interaction between the two arguments of
this routine is plain.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Header and comments describing routines in incorrect shape in visibilitymap.c

2016-07-13 Thread Michael Paquier
On Thu, Jul 14, 2016 at 4:29 AM, Robert Haas  wrote:
> On Fri, Jul 8, 2016 at 7:14 AM, Michael Paquier
>  wrote:
>>> OK, that removes comment duplication. Also, what about replacing
>>> "bit(s)" by "one or more bits" in the comment terms where adapted?
>>> That's bikeshedding, but that's what this patch is about.
>>
>> Translating my thoughts into code, I get the attached.
>
> Seems reasonable, but is the last hunk a whitespace-only change?

Yes, sorry for that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 4:47 PM, Fabien COELHO  wrote:

>
> I would suggest that:
>  - the \; psql feature should be documented somewhere
>

agreed


>  - all results should be shown, not just the last one
>

disagree

# select 1 ; select 2 ;
 ?column?
----------
        1
(1 row)

 ?column?
----------
        2
(1 row)


Having

# select 1 \; select 2 ;

Result in identical behavior seems undesirable.  At least now if you want
to discard all intermediate work and just show the last statement you can
do so without going to any great lengths.  If you really want both results
don't use "\;".  This makes even more sense when the earlier statements are
DML instead of SELECT.

David J.




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Kevin Grittner
On Wed, Jul 13, 2016 at 12:48 PM, Andres Freund  wrote:
> On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote:
>> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila  wrote:
>>> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
 On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:

> I'm a bit confused, why aren't we simply adding LSN interlock
> checks for toast? Doesn't look that hard? Seems like a much more
> natural course of fixing this issue?

 I took some time trying to see what you have in mind, and I'm
 really not "getting it".
>>>
>>> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
>>> when old_snapshot_threshold > 0 and add a check for
>>> HeapTupleSatisfiesToast in TestForOldSnapshot()?
>>
>> With that approach, how will we know *not* to generate an error
>> when reading the chain of tuples for a value we are deleting.  Or
>> positioning to modify an index on toast data.  Etc., etc. etc.
>
> I'm not following. How is that different in the toast case than in the
> heap case?

A short answer is that a normal table's heap doesn't go through
systable_getnext_ordered().  That function is used both for cases
where the check should not be made, like toast_delete_datum(), and
cases where it should, like toast_fetch_datum().

Since this keeps coming up, I'll produce a patch this way.  I'm
skeptical, but maybe it will look better than I think it will.  I
should be able to post that by Friday.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Andres Freund
On 2016-07-13 16:39:58 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Wed, Jul 13, 2016 at 1:10 PM, Tomas Vondra
> >  wrote:
> > What's not clear to me is to what extent slowing down pfree is an
> > acceptable price for improving the behavior in other ways.  I wonder
> > how many of the pfree calls in our current codebase are useless or
> > even counterproductive, or could be made so.
> 
> I think there's a lot, but I'm afraid most of them are in contexts
> (pun intended) where aset.c already works pretty well, ie it's a
> short-lived context anyway.

FWIW, hacking up aset/mcxt.c to use a trivial allocator with less
overhead (i.e. just handing out sections of a contiguous block of
memory) results in a noticeable speedup in parse-heavy workloads. It's a
bit ugly though, because of the number of retail pfrees in random
places.


> The areas where we're having pain are
> where there are fairly long-lived contexts with lots of pfree traffic;
> certainly that seems to be the case in reorderbuffer.c.  Because they're
> long-lived, you can't just write off the pfrees as ignorable.

That's a problem too.


> I wonder whether we could compromise by reducing the minimum "standard
> chunk header" to be just a pointer to owning context, with the other
> fields becoming specific to particular mcxt implementations.  That would
> be enough to allow contexts to decide that pfree was a no-op, say, or that
> they wouldn't support GetMemoryChunkSpace(), without having to decree that
> misuse can lead to crashes.  But that's still more than zero overhead
> per-chunk.

I think that's a sensible compromise for some use-cases (e.g. parsing,
parse analysis, potentially expression contexts).

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A Modest Upgrade Proposal

2016-07-13 Thread Petr Jelinek

On 13/07/16 21:06, Robert Haas wrote:



>> We have much to discuss in terms of security, the way it should work and
>> what options to support, and a sidetrack into syntax isn't warranted at
>> this early stage. Please let's discuss those important things first, then
>> return to whether DDL makes sense or not; it may do, or may not, or more
>> likely which parts of it need DDL and which do not.
>
> We've sort of hijacked this whole thread which was originally about
> something different, so maybe it would be better to start a new thread
> specifically to talk about the design of logical replication.  For my
> money, though, I don't find the designs I've seen so far to be
> particularly compelling - and I think that the problem is that we tend
> to think about this from the point of view of the capabilities that
> must be available within a single instance.
> ...

> Similarly, for logical replication, users will want to do things like
> (1) spin up a new logical replication slave out of thin air,
> replicating an entire database or several databases or selected
> replication sets within selected databases; or (2) subscribe an
> existing database to another server, replicating an entire database or
> several databases; or (3) repoint an existing subscription at a new
> server after a master change or dump/reload, resynchronizing table
> contents if necessary; or (4) stop replication, either with or without
> dropping the local copies of the replicated tables.  (This is not an
> exhaustive list, I'm sure.)



Well, this can all be done using pglogical, so I don't really understand
what you mean when you say that you don't like the design, or what the
actual problem here is (although I don't plan to implement everything in
the first patch submission).



> I don't mean to imply that the existing designs are bad as far as they
> go.  In each case, the functionality that has been built is good.  But
> it's all focused, as it seems to me, on providing capabilities rather
> than on providing a way for users to manage a group of database
> servers using high-level primitives.  That higher-level stuff largely
> gets left to add-on tools, which I don't think is serving us
> particularly well.  Those add-on tools often find that the core
> support doesn't quite do everything they'd like it to do: that's why
> WAL-E and repmgr, for example, end up having to do some creative
> things to deliver certain features.  We need to start thinking of
> groups of servers rather than individual servers as the unit of
> deployment.



You can't build the high-level management parts without first having the
per-node lower-level parts done. It would be nice to have the high-level
parts as well, but nobody has written them so far. I hope you don't expect
the logical replication patch to do all that, because if you do, you'll be
disappointed.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] unexpected psql "feature"

2016-07-13 Thread Fabien COELHO


Hello devs,

Although this is probably a "feature", it is a weird one:

 $ psql
 psql (9.6beta2)
 fabien=# SELECT 1 AS one \;
 fabien-# SELECT 2 AS two ;
  two
 -----
    2
 (1 row)

Where is my "1"?

Although "\;" behavior is not documented, I would have expected both
results to be shown one after the other, or an error, but not a
quiet discard.


My guess is that psql uses PQexec, which just returns the last result.
Using PQsendQuery/PQgetResult would result in much better behavior.


 fabien=# CREATE TABLE foo(id TEXT);
 CREATE TABLE
 fabien=# INSERT INTO foo VALUES('calvin') \;
 fabien-# INSERT INTO foo VALUES('hobbes') ;
 INSERT 0 1
 fabien=# SELECT * FROM foo;
    id
 --------
  calvin
  hobbes
 (2 rows)

I would suggest that:
 - the \; psql feature should be documented somewhere
 - all results should be shown, not just the last one

Any opinion?

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A Modest Upgrade Proposal

2016-07-13 Thread Christopher Browne
On 13 July 2016 at 15:06, Robert Haas  wrote:

> On Thu, Jul 7, 2016 at 9:25 PM, Simon Riggs  wrote:
> > I note also that replication slots aren't backed up by pg_dump; I see
> > analogy here and think that at least some parts of logical replication
> will
> > be similar and not require DDL at all, just as slots do not.
>
> I agree with that.  Of course, it's *impossible* to usefully back up a
> slot because the key ingredient in a slot is the LSN after which WAL
> should be preserved - and it's meaningless to preserve that across a
> dump and restore cycle.  But, for example, replication set definitions
> can be preserved across a dump and restore and I am quite sure users
> will find it very unfortunate if they aren't.
>

There should be some way of dumping and restoring these sorts of structures,
and if I were thinking of the name of a tool to dump them, it seems to me
that pg_dump is a pretty good name for it...  (Look for slonikconfdump.sh
for the latest iteration of the Slony variation, if interested...)

I have implemented "slony_dump" a couple of times; if that had become a
built-in, I sure hope a pg_dump flag could have been the thing to request
such.

The same seems likely true of FDW configuration; it sure would be nice to
be able to dump that in a consistent, reusable way.  Again, nice to have
that be an extension of pg_dump.

Replication configuration should be able to be dumped out in a form that
can be readily loaded somewhere else.  It might not be something to have
pg_dump do by default, but it should sure be somewhere; if it isn't, then
that's a reasonably serious shortcoming.  Slony didn't have such until
2009; a serious implementation of Logical Replication shouldn't wait
that long.

If what gets spit out is a series of
  select replicate_these_relations('['public']');
requests, well, I can actually live with that.

In the long run, it's preferable to have
  ALTER SCHEMA PUBLIC ENABLE REPLICATION;

but if the desired syntax isn't clear at the start, we can surely live with
having functions, initially, as long as:
a) We know that's not intended as being the ultimate solution;
b) There's *some* sort of upgrade path that helps indicate the
syntax that falls out;
c) There's tooling to dump out cluster information, whatever the syntax
form.

I'm getting quoted on being OK with not having syntax, initially...
I'm still fine with that, but take the above caveats to see my intent.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jul 13, 2016 at 1:10 PM, Tomas Vondra
>  wrote:
> What's not clear to me is to what extent slowing down pfree is an
> acceptable price for improving the behavior in other ways.  I wonder
> how many of the pfree calls in our current codebase are useless or
> even counterproductive, or could be made so.

I think there's a lot, but I'm afraid most of them are in contexts
(pun intended) where aset.c already works pretty well, ie it's a
short-lived context anyway.  The areas where we're having pain are
where there are fairly long-lived contexts with lots of pfree traffic;
certainly that seems to be the case in reorderbuffer.c.  Because they're
long-lived, you can't just write off the pfrees as ignorable.

I wonder whether we could compromise by reducing the minimum "standard
chunk header" to be just a pointer to owning context, with the other
fields becoming specific to particular mcxt implementations.  That would
be enough to allow contexts to decide that pfree was a no-op, say, or that
they wouldn't support GetMemoryChunkSpace(), without having to decree that
misuse can lead to crashes.  But that's still more than zero overhead
per-chunk.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sslmode=require fallback

2016-07-13 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jul 13, 2016 at 3:16 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Suppose we changed the default to "require".  How crazy would that be?

>> You mean, aside from the fact that it breaks every single installation
>> that hasn't configured with SSL?

> No, including that.

I cannot imagine such a plan surviving the first wave of villagers with
torches and pitchforks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread Tom Lane
Fabien COELHO  writes:
> Sending a batch of requests is a feature of libpq which is accessible 
> through pgbench by using "\;", although the fact is not documented. It 
> makes sense for a client to send independent queries together so as to 
> reduce latency.

You're putting an awful lot of weight on an unsupported assertion about
latency.  If a user cares about that, why would she not simply merge the
commands into "SELECT 1, 2, 3 \into one two three" ?

And I still say that what you're proposing might be easy right now, but
it might also be next door to impossible in a refactored implementation.
I don't think we should go there on the basis of a weak argument about
latency.  \into should retrieve data only from the last PGresult.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sslmode=require fallback

2016-07-13 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 16, 2016 at 3:42 AM, Magnus Hagander  wrote:
>> The default mode of "prefer" is ridiculous in a lot of ways. If you are
>> using SSL in any shape or form you should simply not use "prefer". That's
>> really the only answer at this point, unfortunately.

> Suppose we changed the default to "require".  How crazy would that be?

You mean, aside from the fact that it breaks every single installation
that hasn't configured with SSL?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread Fabien COELHO


Hello Robert,


> I agree: I like \into.


Great!


> But:
>
>> SELECT 1, 2 \; SELECT 3;
>> \into one two three
>
> I think that's pretty weird.


I agree that it is weird, but I do not think that it is bad.

Sending a batch of requests is a feature of libpq which is accessible 
through pgbench by using "\;", although the fact is not documented. It 
makes sense for a client to send independent queries together so as to 
reduce latency.


From pgbench's perspective, I would find it pretty weird as well that one
can send several queries together but could only read the answer from...
say the first one, and the others would be lost.


From an implementation perspective, doing it is straightforward, and
rejecting it would require some more logic.

An obviously nicer feature would be to allow intermixing \into and \;, but
ISTM that it would require deeply rethinking pgbench's lexing/parsing, which
has just been merged with psql's by Tom and others...


If I had not pointed out the fact that it works, maybe no one would have
noticed... so a compromise could be not to advertise the fact that it
works (as the \; feature is not advertised anyway), but to let the
implementation do it because it is simple and may be useful, and to
rephrase the documentation so that it is just about the previous select
and not previous select*S*?


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sslmode=require fallback

2016-07-13 Thread Robert Haas
On Wed, Jul 13, 2016 at 3:16 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Jun 16, 2016 at 3:42 AM, Magnus Hagander  wrote:
>>> The default mode of "prefer" is ridiculous in a lot of ways. If you are
>>> using SSL in any shape or form you should simply not use "prefer". That's
>>> really the only answer at this point, unfortunately.
>
>> Suppose we changed the default to "require".  How crazy would that be?
>
> You mean, aside from the fact that it breaks every single installation
> that hasn't configured with SSL?

No, including that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Robert Haas
On Wed, Jul 13, 2016 at 1:10 PM, Tomas Vondra
 wrote:
> I think the MemoryContext API may not be right abstraction for this.

+1. The MemoryContext API is little short of an absolute bar to
implementing an allocator that works significantly differently from
aset.c, and that's a shame because aset.c is great for little tiny
contexts that don't live very long and sucks for everything else.  In
particular, it sucks for anything that does a large number of
allocations.  It's also hostile to any kind of context stored inside a
DSM segment, since those don't have fixed addresses across all
backends that share them.

The alternative that I played with a while back when I was working on
the sb_alloc allocator was to set things up so that we could identify
the memory context based on the pointer address rather than the chunk
header.  That's a whole lot better for memory efficiency since you can
squeeze out the chunk headers, but it inevitably slows down pfree.
What's not clear to me is to what extent slowing down pfree is an
acceptable price for improving the behavior in other ways.  I wonder
how many of the pfree calls in our current codebase are useless or
even counterproductive, or could be made so.
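
To sketch the address-based lookup (illustrative names, not sb_alloc's
actual code): carve allocations out of blocks aligned on a power-of-two
boundary and store the owning context once per block rather than once
per chunk:

#define EXAMPLE_BLOCK_SIZE  ((uintptr_t) 65536)     /* power of two */

typedef struct ExampleBlock
{
    MemoryContext context;      /* owner, paid for once per block */
    /* chunks follow, with no per-chunk header at all */
} ExampleBlock;

static MemoryContext
context_of(void *ptr)
{
    /* masking down to the block start adds arithmetic and a likely cache
     * miss on every call, which is exactly what slows pfree down */
    ExampleBlock *block =
        (ExampleBlock *) ((uintptr_t) ptr & ~(EXAMPLE_BLOCK_SIZE - 1));

    return block->context;
}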

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jul 13, 2016 at 3:02 PM, Tom Lane  wrote:
>> I note also that we were talking a couple months ago
>> about trying to align psql and pgbench backslash commands more closely.
>> This would not be a good step in that direction.

> True, but I'd still argue that \into is a lot more readable than
> \gset.  Maybe both programs should support both commands.

Meh, personal preference there no doubt.  I'd be okay with both programs
supporting both commands, but the \into command as described here would
be quite unreasonable to implement in psql.  It needs to act more like
a semicolon-substitute, as \g and the rest of its family do.  (I think
I'd also lobby for spelling it \ginto.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sslmode=require fallback

2016-07-13 Thread Robert Haas
On Thu, Jun 16, 2016 at 3:42 AM, Magnus Hagander  wrote:
> You would think so.
>
> The default mode of "prefer" is ridiculous in a lot of ways. If you are
> using SSL in any shape or form you should simply not use "prefer". That's
> really the only answer at this point, unfortunately.

Suppose we changed the default to "require".  How crazy would that be?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread Robert Haas
On Wed, Jul 13, 2016 at 3:02 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO  wrote:
>>> If someone thinks that "gset" is a good idea for pgbench, which I don't, it
>>> could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG,
>>> makes more sense for scripting.
>
>> I agree: I like \into.
>
>> But:
>
>>> SELECT 1, 2 \; SELECT 3;
>>> \into one two three
>
>> I think that's pretty weird.
>
> Yeah, that's seriously nasty action-at-a-distance in my view.  I'd be okay
> with
>
> SELECT 1, 2 \into one two
> SELECT 3 \into three
>
> but I do not think that a metacommand on a following line should
> retroactively affect the execution of a prior command, much less commands
> before the last one.  Even if this happens to be easy to do in pgbench's
> existing over-contorted logic, it's tremendously confusing to the user;
> and it might be much less easy if we try to refactor that logic.
>
> And I'm with Pavel on this: it should work exactly like \gset.  Inventing
> \into to do almost the same thing in a randomly different way exhibits a
> bad case of NIH syndrome.  Sure, you can argue about how it's not quite
> the same use-case and so you could micro-optimize by doing it differently,
> but that's ignoring the cognitive load on users who have to remember two
> different commands.  Claiming that plpgsql's SELECT INTO is a closer
> analogy than psql's \gset is quite bogus, too: the environment is
> different (client side vs server side, declared vs undeclared target
> variables), and the syntax is different (backslash or not, commas or not,
> just for starters).  I note also that we were talking a couple months ago
> about trying to align psql and pgbench backslash commands more closely.
> This would not be a good step in that direction.

True, but I'd still argue that \into is a lot more readable than
\gset.  Maybe both programs should support both commands.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] remove checkpoint_warning

2016-07-13 Thread Robert Haas
On Tue, Jul 12, 2016 at 3:19 AM, Magnus Hagander  wrote:
> +1, this is a useful warning.

+1.

I'd like to see more people set log_checkpoints=on, and I often ask
customers to do that when they're having systemic performance
problems.  But this warning regularly alerts me to cases where I've
failed to configure max_wal_size to an adequately large value.  It's
true that the new values are less conservative than the old ones, but
they're still pretty conservative.  Anybody who still thinks that 1GB
is a lot of disk space might be due for a hardware upgrade.  I
regularly need to increase that value substantially on my *laptop*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO  wrote:
>> If someone thinks that "gset" is a good idea for pgbench, which I don't, it
>> could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG,
>> makes more sense for scripting.

> I agree: I like \into.

> But:

>> SELECT 1, 2 \; SELECT 3;
>> \into one two three

> I think that's pretty weird.

Yeah, that's seriously nasty action-at-a-distance in my view.  I'd be okay
with

SELECT 1, 2 \into one two
SELECT 3 \into three

but I do not think that a metacommand on a following line should
retroactively affect the execution of a prior command, much less commands
before the last one.  Even if this happens to be easy to do in pgbench's
existing over-contorted logic, it's tremendously confusing to the user;
and it might be much less easy if we try to refactor that logic.

And I'm with Pavel on this: it should work exactly like \gset.  Inventing
\into to do almost the same thing in a randomly different way exhibits a
bad case of NIH syndrome.  Sure, you can argue about how it's not quite
the same use-case and so you could micro-optimize by doing it differently,
but that's ignoring the cognitive load on users who have to remember two
different commands.  Claiming that plpgsql's SELECT INTO is a closer
analogy than psql's \gset is quite bogus, too: the environment is
different (client side vs server side, declared vs undeclared target
variables), and the syntax is different (backslash or not, commas or not,
just for starters).  I note also that we were talking a couple months ago
about trying to align psql and pgbench backslash commands more closely.
This would not be a good step in that direction.

regards, tom lane




Re: [HACKERS] One process per session lack of sharing

2016-07-13 Thread Robert Haas
On Tue, Jul 12, 2016 at 9:18 AM, Tom Lane  wrote:
> amatv...@bitec.ru writes:
>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>> sessions between thread"?
>
> No, not really.  The amount of overhead that would add --- eg, the need
> for locking on what used to be single-use caches --- makes the benefit
> highly questionable.  Also, most people who need this find that sticking
> a connection pooler in front of the database solves their problem, so
> there's not that much motivation to do a ton of work inside the database
> to solve it there.

I agree that there's not really a plan to implement this, but I don't
agree that connection pooling solves the whole problem.  Most people
can't get by with statement pooling, so in practice you are looking at
transaction pooling or session pooling.  And that means that you can't
really keep the pool size as small as you'd like because backends can
be idle in transaction for long enough to force the pool size to be
pretty large.  Also, pooling causes the same backends to get reused
for different sessions which touch different relations and different
functions so that, for example, the relcache and the PL/pgsql function
caches grow until every one of those sessions has everything cached
that any client needs.  That can cause big problems.

So, I actually think it would be a good idea to think about this.  The
problem, of course, is that as long as we allow arbitrary parts of the
code - including extension code - to declare global variables and
store arbitrary stuff in them without any coordination, it's
impossible to imagine hibernating and resuming a session without a
risk of things going severely awry.  This was a major issue for
parallel query, but we've solved it, mostly, by designating the things
that rely on global variables as parallel-restricted, and there
actually aren't a ton of those.  So I think it's imaginable that we
can get to a point where we can, at least in some circumstances, let a
backend exit and reconstitute its state at a later time.  It's not an
easy project, but I think it is one we will eventually need to do.
Insisting that the current model is working is just sticking our head
in the sand.  It's mostly working, but there are workloads where it
fails badly - and competing database products survive a number of
scenarios where we just fall on our face.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread Robert Haas
On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO  wrote:
> If someone thinks that "gset" is a good idea for pgbench, which I don't, it
> could be implemented. I think that an "into" feature, like PL/pgSQL & ECPG,
> makes more sense for scripting.

I agree: I like \into.

But:

> SELECT 1, 2 \; SELECT 3;
>  \into one two three

I think that's pretty weird.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Header and comments describing routines in incorrect shape in visibilitymap.c

2016-07-13 Thread Robert Haas
On Fri, Jul 8, 2016 at 7:14 AM, Michael Paquier
 wrote:
>> OK, that removes comment duplication. Also, what about replacing
>> "bit(s)" by "one or more bits" in the comment terms where adapted?
>> That's bikeshedding, but that's what this patch is about.
>
> Translating my thoughts into code, I get the attached.

Seems reasonable, but is the last hunk a whitespace-only change?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Bug in batch tuplesort memory CLUSTER case (9.6 only)

2016-07-13 Thread Peter Geoghegan
On Wed, Jul 13, 2016 at 12:24 PM, Tom Lane  wrote:
>> I am happy with the adjustment. Please commit the adjusted patch.
>
> Done with minor adjustments.

Thanks. I'm pleased that we found a way forward that addressed every concern.

-- 
Peter Geoghegan




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-13 Thread Robert Haas
On Fri, Jul 8, 2016 at 5:47 AM, Craig Ringer  wrote:
>> DDL is our standard way of getting things into the system catalogs.
>> We have no system catalog metadata that is intended to be populated by
>> any means other than DDL.
>
> Replication slots? (Arguably not catalogs, I guess)
>
> Replication origins?

Those things aren't catalogs, are they?  I mean, as I said in the
other email I just sent in reply to Simon, if you did a pg_dump and a
pg_restore, I don't think it would be useful to preserve replication
slot LSNs afterwards.  If I'm wrong, and that is a useful thing to do,
then we should have a pg_dump flag to do it.  Either way, I think we
do have some work to do figuring out how you can dump, restore, and
then resume logical replication, probably by establishing a new slot
and then incrementally resynchronizing without having to copy
unchanged rows.

That having been said, I think the choice not to use DDL for slots was
somewhat unfortunate.  We now have CREATE_REPLICATION_SLOT that can be
used via the replication protocol but there is no corresponding CREATE
REPLICATION SLOT for the regular protocol; I think that's kinda
strange.

>> If you want to add a column to a table, you
>> say ALTER TABLE .. ADD COLUMN.  If you want to add a column to an
>> extension, you say ALTER EXTENSION .. ADD TABLE.   If you want to add
>> an option to a foreign table, you say ALTER FOREIGN TABLE .. OPTIONS
>> (ADD ..).  Therefore, I think it is entirely reasonable and obviously
>> consistent with existing practice that if you want to add a table to a
>> replication set, you should write ALTER REPLICATION SET .. ADD TABLE.
>> I don't understand why logical replication should be the one feature
>> that departs from the way that all of our other features work.
>
> Because unlike all the other features, it can work usefully *across
> versions*.

So what?

> We have no extension points for DDL.
>
> For function interfaces, we do.
>
> That, alone, makes a function based interface overwhelmingly compelling
> unless there are specific things we *cannot reasonably do* without DDL.

I don't understand this.  We add new DDL in new releases, and we avoid
changing the meaning of existing DDL.  Using function interfaces won't
make it possible to change the meaning of existing syntax, and it
won't make it any more possible to add new syntax.  It will just make
replication commands be spelled differently from everything else.

> In many cases it's actively undesirable to dump and restore logical
> replication state. Most, I'd say. There probably are cases where it's
> desirable to retain logical replication state such that restoring a dump
> resumes replication, but I challenge you to come up with any sensible and
> sane way that can actually be implemented. Especially since you must
> obviously consider the possibility of both upstream and downstream being
> restored from dumps.

Yes, these issues need lots of thought, but I think that replication
set definitions, at least, are sensible to dump and reload.

> IMO the problem mostly devolves to making sure dumps taken of different DBs
> are consistent so new replication sessions can be established safely. And
> really, I think it's a separate feature to logical replication its self.

I think what is needed has more to do with coping with the situation
when the snapshots aren't consistent.  Having a way to make sure they
are consistent is a great idea, but there WILL be situations when
replication between two 10TB databases gets broken and it will not be
good if the only way to recover is to reclone.

> To what extent are you approaching this from the PoV of wanting to use this
> in FDW sharding? It's unclear what vision for users you have behind the
> things you say must be done, and I'd like to try to move to more concrete
> ground. You want DDL? OK, what should it look like? What does it add over a
> function based interface? What's cluster-wide and what's DB-local? etc.

I've thought about that question, a little bit, but it's not really
what underlies my concerns here.  I'm concerned about dump-and-restore
preserving as much state as is usefully possible, because I think
that's critical for the user experience, and I'm concerned with having
the commands we use to manage replication not be spelled totally
differently than our other commands.

However, as far as sharding is concerned, no matter how it gets
implemented, I think logical replication is a key feature.
Postgres-XC/XL has the idea of "replicated" tables which are present
on every data node, and that's very important for efficient
implementation of joins.  If you do a join between a little table and
a big sharded table, you want to be able to push that down to the
shards, and you can only do that if the entirety of the little table
is present on every shard or by creating a temporary copy on every
shard.  In many cases, the former will be preferable.  So, I think it's
important for sharding that 

Re: [HACKERS] Bug in batch tuplesort memory CLUSTER case (9.6 only)

2016-07-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jul 13, 2016 at 11:44 AM, Tom Lane  wrote:
>> If you're good with that adjustment, I'm happy to commit this.

> I am happy with the adjustment. Please commit the adjusted patch.

Done with minor adjustments.

regards, tom lane




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-13 Thread Robert Haas
On Thu, Jul 7, 2016 at 9:25 PM, Simon Riggs  wrote:
> I note also that replication slots aren't backed up by pg_dump; I see an
> analogy here and think that at least some parts of logical replication will
> be similar and not require DDL at all, just as slots do not.

I agree with that.  Of course, it's *impossible* to usefully back up a
slot because the key ingredient in a slot is the LSN after which WAL
should be preserved - and it's meaningless to preserve that across a
dump and restore cycle.  But, for example, replication set definitions
can be preserved across a dump and restore and I am quite sure users
will find it very unfortunate if they aren't.

> We have much to discuss in terms of security, the way it should work and
> what options to support and a sidetrack into syntax isn't warranted at this
> early stage. Please lets discuss those important things first, then return
> to whether DDL makes sense or not; it may do, or may not, or more likely
> which parts of it need DDL and which do not.

We've sort of hijacked this whole thread which was originally about
something different, so maybe it would be better to start a new thread
specifically to talk about the design of logical replication.  For my
money, though, I don't find the designs I've seen so far to be
particularly compelling - and I think that the problem is that we tend
to think about this from the point of view of the capabilities that
must be available within a single instance.  Physical replication has
the same issue.  Users don't want to configure archive_command and
wal_keep_segments and max_wal_senders and wal_level and set up an
archive and create recovery.conf on the standby.  They want to spin up
a new standby - and we don't provide any way to just do that.
pg_basebackup's -X stream and -R options represent significant
progress in that direction, but I don't think we've really taken it as
far as it can go yet, which is not to say I know exactly what's
missing.  Similarly, when the master fails, users want to promote a
standby (either one they choose or the one that is determined to be
furthest ahead) and remaster the others and that's not something you
can "just do".

Similarly, for logical replication, users will want to do things like
(1) spin up a new logical replication slave out of thin air,
replicating an entire database or several databases or selected
replication sets within selected databases; or (2) subscribe an
existing database to another server, replicating an entire database or
several databases; or (3) repoint an existing subscription at a new
server after a master change or dump/reload, resynchronizing table
contents if necessary; or (4) stop replication, either with or without
dropping the local copies of the replicated tables.  (This is not an
exhaustive list, I'm sure.)

I don't mean to imply that the existing designs are bad as far as they
go.  In each case, the functionality that has been built is good.  But
it's all focused, as it seems to me, on providing capabilities rather
than on providing a way for users to manage a group of database
servers using high-level primitives.  That higher-level stuff largely
gets left to add-on tools, which I don't think is serving us
particularly well.  Those add-on tools often find that the core
support doesn't quite do everything they'd like it to do: that's why
WAL-E and repmgr, for example, end up having to do some creative
things to deliver certain features.  We need to start thinking of
groups of servers rather than individual servers as the unit of
deployment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-13 Thread Joshua D. Drake

On 07/07/2016 01:01 PM, Robert Haas wrote:

> There was an unconference session on this topic at PGCon and quite a
> number of people there stated that they found DDL to be an ease-of-use
> feature and wanted to have it.

Yeah, I haven't met anyone yet that would like to have:

select replicate_these_relations('{public}');

vs:

ALTER SCHEMA public ENABLE REPLICATION;

(or something like that).


Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Bug in batch tuplesort memory CLUSTER case (9.6 only)

2016-07-13 Thread Peter Geoghegan
On Wed, Jul 13, 2016 at 11:44 AM, Tom Lane  wrote:
> What I don't much like is that it enlarges cluster.out with 200K of
> random-looking, hard-to-manually-verify data.  May I suggest that
> we replace the SELECTs with
>
> select * from
> (select hundred, lag(hundred) over () as lhundred,
> thousand, lag(thousand) over () as lthousand,
> tenthous, lag(tenthous) over () as ltenthous from clstr_4) ss
> where row(hundred, thousand, tenthous) <= row(lhundred, lthousand, ltenthous);
>  hundred | lhundred | thousand | lthousand | tenthous | ltenthous
> ---------+----------+----------+-----------+----------+-----------
> (0 rows)

It independently occurred to me that I should have done something like
this afterwards. I agree.

> If you're good with that adjustment, I'm happy to commit this.

I am happy with the adjustment. Please commit the adjusted patch.

-- 
Peter Geoghegan




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-13 Thread Robert Haas
On Thu, Jul 7, 2016 at 8:53 PM, Simon Riggs  wrote:
>> I thought I sat through, at least, most of it, but you barely gave
>> anyone else a chance to talk, which kind of misses the point of an
>> unconference.  The portion which I attended was not about how to move
>> the development of the feature forward, but just involved describing
>> it.  I thought it was a shame that the time wasn't used better.
>
> I think the problem was that I gave everybody an even shot at commenting,
> rather than focusing on a few key developers.

If that had been what happened, I wouldn't consider it a problem, but
I don't think that's what happened.

>> I really don't think that's accurate.  There might have been 50% of
>> people who thought that not having DDL was acceptable, but I think
>> there were very few people who found it preferable.
> Without being in the room, its kinda hard for you to know, right?

I was in the room for that part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Bug in batch tuplesort memory CLUSTER case (9.6 only)

2016-07-13 Thread Tom Lane
Peter Geoghegan  writes:
> Attached patch adds a CLUSTER external sort test case to the
> regression tests, as discussed.

I took a quick look at this patch.

> This makes a hardly noticeable difference on the run time of the
> CLUSTER tests, at least for me. Consider the following:

I tried the patch on prairiedog's host.  That's not the slowest
machine in the buildfarm, but it's right down there.  The patch
increases the runtime of the "cluster" test from ~1 sec to ~3 sec,
which I agree is pretty negligible (total time for the standard
regression tests is ~5 min on this machine).  That seems a cheap
price to pay for a significant improvement in code coverage.

What I don't much like is that it enlarges cluster.out with 200K of
random-looking, hard-to-manually-verify data.  May I suggest that
we replace the SELECTs with

select * from
(select hundred, lag(hundred) over () as lhundred,
thousand, lag(thousand) over () as lthousand,
tenthous, lag(tenthous) over () as ltenthous from clstr_4) ss
where row(hundred, thousand, tenthous) <= row(lhundred, lthousand, ltenthous);
 hundred | lhundred | thousand | lthousand | tenthous | ltenthous 
---------+----------+----------+-----------+----------+-----------
(0 rows)

If you're good with that adjustment, I'm happy to commit this.

regards, tom lane




Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Tomas Vondra

On 07/13/2016 07:37 PM, Tom Lane wrote:
> Peter Geoghegan  writes:
>> On Wed, Jul 13, 2016 at 7:53 AM, Tomas Vondra
>>  wrote:
>>> In the thread [1] dealing with hashjoin bug introduced in 9.5, Tom voiced
>>> his dislike of dense_alloc. I kinda agree with him that introducing "local
>>> allocators" may not be the best idea, and as dense_alloc was introduced by
>>> me I was playing with the idea to wrap this into a regular memory context,
>>> perhaps with some restrictions (e.g. no pfree). But I'm having trouble with
>>> that approach ...
>
>> I think that the "no pfree()" restriction would be necessary to get
>> the same benefit. But, doesn't that undermine the whole idea of making
>> it a memory context?
>
> The other thing that doesn't seem to square at all with a general-purpose
> memory context is the desire to walk through the stored tuples directly,
> knowing that they are adjacent.  That means nothing else can be allocated
> via the same mechanism.  So I tend to agree that if we accept Tomas' three
> requirements as non-negotiable, then trying to make the allocator match
> the MemoryContext API is probably impractical.
>
> My feeling at this point is that we should leave it alone until/unless
> we see similar requirements elsewhere, and then look to see if we can
> derive a common abstraction.  I always find that it's easier to design
> APIs based on concrete use-cases than on guesses about what will be
> needed.

I agree with both points.

I think the MemoryContext API may not be the right abstraction for this.
Given a hammer big enough it would probably work in the end, but it'd
probably require changes to the public MemoryContext API (e.g. relaxing
the StandardChunkHeader requirement). And that seems a bit too risky.

So we probably need a new independent abstraction for this, but doing
that based on a single use case is a bit silly.

> I wonder though if we don't already have another similar use-case in
> the ad-hoc "slab allocators" in reorderbuffer.c.  We already know that
> that code has performance issues, cf bug #14231, so I suspect there's
> a redesign in its future anyway.

I'm not sure - I'm not familiar with reorderbuffer.c, but it seems to do
a fair number of pfrees and such. Also, pfrees seem to be the root of
the performance issue. I suspect the slab allocator (or rather the
allocation strategy in general) may need rethinking, but let's discuss
that in that thread.
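(For reference, the ad-hoc "slab" caching pattern under discussion looks
roughly like this -- a minimal sketch with illustrative names, and with
malloc()/free() standing in for palloc()/pfree(); it is not code taken
from reorderbuffer.c:)

#include <stdlib.h>

typedef struct FreeNode
{
    struct FreeNode *next;
} FreeNode;

typedef struct SlabCache
{
    FreeNode   *freelist;       /* released objects, ready for reuse */
    size_t      obj_size;       /* every object is the same size; must be
                                 * at least sizeof(FreeNode) */
    int         nfree;          /* objects currently cached */
    int         max_cached;     /* cap on the free list */
} SlabCache;

static void *
slab_get(SlabCache *cache)
{
    FreeNode   *node = cache->freelist;

    if (node != NULL)
    {
        cache->freelist = node->next;
        cache->nfree--;
        return node;
    }
    return malloc(cache->obj_size);
}

static void
slab_put(SlabCache *cache, void *obj)
{
    if (cache->nfree < cache->max_cached)
    {
        /* cache the object instead of freeing it */
        ((FreeNode *) obj)->next = cache->freelist;
        cache->freelist = obj;
        cache->nfree++;
    }
    else
        free(obj);
}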


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] \timing interval

2016-07-13 Thread Corey Huinker
On Mon, Jul 11, 2016 at 8:35 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 7/9/16 4:00 PM, Andrew Gierth wrote:
>
>> How about
>>
>> Time: 1234567.666 ms (20m 34.6s)
>>
>
> That's similar to what I had in mind, so I'd be happy with that.
>
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Ok, here's what I came up with (with time to test it).

If the query took less than a second, timing display is unchanged.
Otherwise, print the ms time, followed by a more human readable form
accurate to 0.001s.

# select 1; select pg_sleep(1); select pg_sleep(71); select pg_sleep(3601); select pg_sleep(24*3601);
 ?column?
--
1
(1 row)

Time: 1.575 ms
 pg_sleep
--

(1 row)

Time: 1002.568 ms (1.003s)
 pg_sleep
--

(1 row)

Time: 71041.022 ms (1m 11.041s)
 pg_sleep
--

(1 row)

Time: 3601083.544 ms (1h 0m 1.084s)
 pg_sleep
--

(1 row)

Time: 86424018.416 ms (1d 0h 0m 24.018s)



As-is, there isn't much that could be done for regression or documentation
changes, so I'll just leave this here.
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 2450b9c..7a87314 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -529,6 +529,58 @@ ClearOrSaveResult(PGresult *result)
}
 }
 
+/*
+ * Print the elapsed time in raw milliseconds, and, if it was at least
+ * 1 second, also in a form normalized to days, hours, minutes, and
+ * seconds.
+ */
+static void
+PrintTiming(double elapsed_msec)
+{
+   double seconds;
+   int minutes;
+   int hours;
+   int days;
+
+   if (elapsed_msec < 1000.0)
+   {
+   printf(_("Time: %.3f ms\n"), elapsed_msec);
+   return;
+   }
+
+   seconds = elapsed_msec / 1000.0;
+   if (seconds < 60.0)
+   {
+   printf(_("Time: %.3f ms (%.3fs)\n"), elapsed_msec, seconds);
+   return;
+   }
+
+   minutes = (int)seconds / 60;
+   seconds = seconds - (60.0 * minutes);
+   if (minutes < 60)
+   {
+   printf(_("Time: %.3f ms (%dm %.3fs)\n"),
+   elapsed_msec, minutes, seconds);
+   return;
+   }
+
+   hours = minutes / 60;
+   minutes = minutes % 60;
+
+   if (hours < 24)
+   {
+   printf(_("Time: %.3f ms (%dh %dm %.3fs)\n"),
+   elapsed_msec, hours, minutes, seconds);
+   return;
+   }
+
+   days = hours / 24;
+   hours = hours % 24;
+
+   printf(_("Time: %.3f ms (%dd %dh %dm %.3fs)\n"),
+   elapsed_msec, days, hours, minutes, seconds);
+}
+
 
 /*
  * PSQLexec
@@ -678,7 +730,7 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt)
 
/* Possible microtiming output */
if (pset.timing)
-   printf(_("Time: %.3f ms\n"), elapsed_msec);
+   PrintTiming(elapsed_msec);
 
return 1;
 }
@@ -1328,7 +1380,7 @@ SendQuery(const char *query)
 
/* Possible microtiming output */
if (pset.timing)
-   printf(_("Time: %.3f ms\n"), elapsed_msec);
+   PrintTiming(elapsed_msec);
 
/* check for events that may occur during query execution */
 



Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Tom Lane
Andres Freund  writes:
> On 2016-07-13 13:37:55 -0400, Tom Lane wrote:
>> We already know that
>> that code has performance issues, cf bug #14231, so I suspect there's
>> a redesign in its future anyway.

> Note that it's not the slab allocators that are having problems, it's
> aset.c, iterating through all allocated blocks.

Well, my point is that that code is using allocation patterns that aset.c
isn't very well suited for.  The slab allocator logic attempts to paper
over one part of that impedance mismatch, but we're seeing there's more.
I haven't looked at it closely enough to have a solution proposal.

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Andres Freund
On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote:
> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila  wrote:
> > On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
> >> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
> >>
> >>> I'm a bit confused, why aren't we simply adding LSN interlock
> >>> checks for toast? Doesn't look that hard? Seems like a much more
> >>> natural course of fixing this issue?
> >>
> >> I took some time trying to see what you have in mind, and I'm
> >> really not "getting it".
> >
> > Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
> > when old_snapshot_threshold > 0 and add a check for
> > HeapTupleSatisfiesToast in TestForOldSnapshot()?
> 
> With that approach, how will we know *not* to generate an error
> when reading the chain of tuples for a value we are deleting?  Or
> when positioning to modify an index on toast data?  Etc., etc., etc.

I'm not following. How is that different in the toast case than in the
heap case?




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Andres Freund
On 2016-07-12 10:04:45 -0500, Kevin Grittner wrote:
> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
> 
> > I'm a bit confused, why aren't we simply adding LSN interlock
> > checks for toast? Doesn't look that hard? Seems like a much more
> > natural course of fixing this issue?
> 
> I took some time trying to see what you have in mind, and I'm
> really not "getting it".  I definitely applaud you for spotting the
> problem, but this suggestion for solving it doesn't seem to be
> useful.

...

> Basically, after turning this suggestion every way I could, I see
> two alternative ways to implement it.

What I was actually getting at was to perform TestForOldSnapshot() in
the HeapTupleSatisfiesToast case as well. That'd require a minor amount
of work to keep the lsn up to date, but otherwise should be fairly easy
to implement.  It seems much more logical to use the same mechanism for
toast that we use for the heap, rather than implementing something separate.
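In rough code form, the suggestion amounts to something like this -- a
hedged sketch modeled on the 9.6 TestForOldSnapshot() in bufmgr.h, not a
worked-out patch; it assumes the usual backend headers, and that
SnapshotToast's lsn and whenTaken fields really do get initialized when
old_snapshot_threshold > 0:

/* Sketch only: the toast arm of the test is the new part. */
static inline void
TestForOldSnapshot_sketch(Snapshot snapshot, Relation relation, Page page)
{
    if (old_snapshot_threshold >= 0 &&
        snapshot != NULL &&
        (snapshot->satisfies == HeapTupleSatisfiesMVCC ||
         snapshot->satisfies == HeapTupleSatisfiesToast) && /* new case */
        !XLogRecPtrIsInvalid(snapshot->lsn) &&
        PageGetLSN(page) > snapshot->lsn)
        TestForOldSnapshot_impl(snapshot, relation);
}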

- Andres




Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Andres Freund
On 2016-07-13 13:37:55 -0400, Tom Lane wrote:
> I wonder though if we don't already have another similar use-case in
> the ad-hoc "slab allocators" in reorderbuffer.c.

That seems to call more for a general slab allocator design than for
something like what we have here. After all, there's plenty of freeing there.

> We already know that
> that code has performance issues, cf bug #14231, so I suspect there's
> a redesign in its future anyway.

Note that it's not the slab allocators that are having problems, it's
aset.c, iterating through all allocated blocks.

Andres




Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jul 13, 2016 at 7:53 AM, Tomas Vondra
>  wrote:
>> In the thread [1] dealing with hashjoin bug introduced in 9.5, Tom voiced
>> his dislike of dense_alloc. I kinda agree with him that introducing "local
>> allocators" may not be the best idea, and as dense_alloc was introduced by
>> me I was playing with the idea to wrap this into a regular memory context,
>> perhaps with some restrictions (e.g. no pfree). But I'm having trouble with
>> that approach ...

> I think that the "no pfree()" restriction would be necessary to get
> the same benefit. But, doesn't that undermine the whole idea of making
> it a memory context?

The other thing that doesn't seem to square at all with a general-purpose
memory context is the desire to walk through the stored tuples directly,
knowing that they are adjacent.  That means nothing else can be allocated
via the same mechanism.  So I tend to agree that if we accept Tomas' three
requirements as non-negotiable, then trying to make the allocator match
the MemoryContext API is probably impractical.

My feeling at this point is that we should leave it alone until/unless
we see similar requirements elsewhere, and then look to see if we can
derive a common abstraction.  I always find that it's easier to design
APIs based on concrete use-cases than on guesses about what will be
needed.

I wonder though if we don't already have another similar use-case in
the ad-hoc "slab allocators" in reorderbuffer.c.  We already know that
that code has performance issues, cf bug #14231, so I suspect there's
a redesign in its future anyway.

regards, tom lane




Re: [HACKERS] Showing parallel status in \df+

2016-07-13 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 7/12/16 7:11 PM, Stephen Frost wrote:
> > I'm curious how it's useful and in what way \sf does not accomplish what
> > you use \df+ for.
> 
> One main use is to see multiple related functions next to each other and
> compare their source code.  But also because one is used to \df and
> wants to see everything there and not in a different format like \sf.

Except that you don't actually get to see related functions next to each
other with \df+; you see them one after the other, which does not make
for a very useful comparison.  I don't see any value in the "because that's
how it's always been" argument.

> So ways to consolidate that would be supporting wildcards and multiple
> results in \sf, and/or the option to show a truncated version of the
> source code in \df+, or perhaps a \df++.

I don't mind adding wildcard support to \sf if there is interest.  I
dislike the "\df++" idea.  I have no idea how a "truncated version"
would ever be anything but noise for non-C/internal functions.

> > We've already had to change the structure of \df+; I'm not convinced
> > that avoiding doing so further now, just to do so again in the next
> > release, is actually a better answer than changing it now.
> 
> We added a new column related to a new feature, which is hardly changing
> the structure.

I disagree.  Adding a column is certainly changing the structure, as is
removing one.  This certainly hasn't changed my opinion that it's
worthwhile to consider this change, even at this point in the release
cycle, given we need to make a change regardless.

Thanks!

Stephen




Re: [HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Peter Geoghegan
On Wed, Jul 13, 2016 at 7:53 AM, Tomas Vondra
 wrote:
> In the thread [1] dealing with hashjoin bug introduced in 9.5, Tom voiced
> his dislike of dense_alloc. I kinda agree with him that introducing "local
> allocators" may not be the best idea, and as dense_alloc was introduced by
> me I was playing with the idea to wrap this into a regular memory context,
> perhaps with some restrictions (e.g. no pfree). But I'm having trouble with
> that approach ...

I think that the "no pfree()" restriction would be necessary to get
the same benefit. But, doesn't that undermine the whole idea of making
it a memory context?

In my view, these "local allocators" are not so bad. They're a bit
ugly, but that seems to be worth it so far, and I don't think that
there is that much incidental complexity that could be encapsulated.
For a few modules, including tuplesort.c, the hash join code,
tuplestore.c, and possibly a couple of others, having precise control
over memory just seems like a good idea to me (i.e. doling it out from
some initial large batch palloc() allocations according to some
considerations about the relevant data structures, leaving a
cache-friendly layout).

I suspect that there are not that many places where it is worth it to
even contemplate batch or dense allocators, so I doubt that we will see
all that many more instances of "local allocators".

-- 
Peter Geoghegan




Re: [HACKERS] Showing parallel status in \df+

2016-07-13 Thread Tom Lane
Peter Eisentraut  writes:
> On 7/12/16 7:11 PM, Stephen Frost wrote:
>> I'm curious how it's useful and in what way \sf does not accomplish what
>> you use \df+ for.

> One main use is to see multiple related functions next to each other and
> compare their source code.  But also because one is used to \df and
> wants to see everything there and not in a different format like \sf.

Well, how about my suggestion of moving source code to a footer?
I had just been experimenting to see how painful that would be, and
it doesn't seem awful --- see attached.

regards, tom lane

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 27be102..f5dfd83 100644
*** a/src/bin/psql/describe.c
--- b/src/bin/psql/describe.c
*** describeFunctions(const char *functypes,
*** 294,308 
  	bool		showNormal = strchr(functypes, 'n') != NULL;
  	bool		showTrigger = strchr(functypes, 't') != NULL;
  	bool		showWindow = strchr(functypes, 'w') != NULL;
  	bool		have_where;
  	PQExpBufferData buf;
  	PGresult   *res;
  	printQueryOpt myopt = pset.popt;
  	static const bool translate_columns[] = {false, false, false, false, true, true, true, false, true, false, false, false, false};
  
- 	/* No "Parallel" column before 9.6 */
- 	static const bool translate_columns_pre_96[] = {false, false, false, false, true, true, false, true, false, false, false, false};
- 
  	if (strlen(functypes) != strspn(functypes, "antwS+"))
  	{
  		psql_error("\\df only takes [antwS+] as options\n");
--- 294,316 
  	bool		showNormal = strchr(functypes, 'n') != NULL;
  	bool		showTrigger = strchr(functypes, 't') != NULL;
  	bool		showWindow = strchr(functypes, 'w') != NULL;
+ 	bool		have_parallel;
  	bool		have_where;
  	PQExpBufferData buf;
  	PGresult   *res;
+ 	printTableContent cont;
  	printQueryOpt myopt = pset.popt;
+ 	int			nfields,
+ r,
+ c;
+ 	const int	schema_col = 0;
+ 	const int	proname_col = 1;
+ 	const int	proargs_col = 3;
+ 	const int	parallel_col = 6;
+ 	const int	lanname_col = 10;
+ 	const int	prosrc_col = 11;
  	static const bool translate_columns[] = {false, false, false, false, true, true, true, false, true, false, false, false, false};
  
  	if (strlen(functypes) != strspn(functypes, "antwS+"))
  	{
  		psql_error("\\df only takes [antwS+] as options\n");
*** describeFunctions(const char *functypes,
*** 323,328 
--- 331,344 
  			showWindow = true;
  	}
  
+ 	/*
+ 	 * proparallel only exists in server versions >= 9.6.  Before that, we
+ 	 * retrieve a null "parallel" column so as to keep column numbering
+ 	 * consistent in the query result, and then skip adding that column to the
+ 	 * printed table.
+ 	 */
+ 	have_parallel = (pset.sversion >= 90600);
+ 
  	initPQExpBuffer(&buf);
  
  	printfPQExpBuffer(&buf,
*** describeFunctions(const char *functypes,
*** 424,430 
  		  gettext_noop("stable"),
  		  gettext_noop("volatile"),
  		  gettext_noop("Volatility"));
! 		if (pset.sversion >= 90600)
  			appendPQExpBuffer(&buf,
  			  ",\n CASE\n"
  			  "  WHEN p.proparallel = 'r' THEN '%s'\n"
--- 440,446 
  		  gettext_noop("stable"),
  		  gettext_noop("volatile"),
  		  gettext_noop("Volatility"));
! 		if (have_parallel)
  			appendPQExpBuffer(&buf,
  			  ",\n CASE\n"
  			  "  WHEN p.proparallel = 'r' THEN '%s'\n"
*** describeFunctions(const char *functypes,
*** 435,440 
--- 451,459 
  			  gettext_noop("safe"),
  			  gettext_noop("unsafe"),
  			  gettext_noop("Parallel"));
+ 		else
+ 			appendPQExpBufferStr(&buf,
+  ",\n NULL as \"Parallel\"");
  		appendPQExpBuffer(&buf,
  	   ",\n pg_catalog.pg_get_userbyid(p.proowner) as \"%s\""
   ",\n CASE WHEN prosecdef THEN '%s' ELSE '%s' END AS \"%s\"",
*** describeFunctions(const char *functypes,
*** 449,455 
  		  ",\n p.prosrc as \"%s\""
  ",\n pg_catalog.obj_description(p.oid, 'pg_proc') as \"%s\"",
  		  gettext_noop("Language"),
! 		  gettext_noop("Source code"),
  		  gettext_noop("Description"));
  	}
  
--- 468,474 
  		  ",\n p.prosrc as \"%s\""
  ",\n pg_catalog.obj_description(p.oid, 'pg_proc') as \"%s\"",
  		  gettext_noop("Language"),
! 		  gettext_noop("Internal name"),
  		  gettext_noop("Description"));
  	}
  
*** describeFunctions(const char *functypes,
*** 543,569 
  	appendPQExpBufferStr(&buf, "ORDER BY 1, 2, 4;");
  
  	res = PSQLexec(buf.data);
- 	termPQExpBuffer(&buf);
  	if (!res)
  		return false;
  
! 	myopt.nullPrint = NULL;
! 	myopt.title = _("List of functions");
! 	myopt.translate_header = true;
! 	if (pset.sversion >= 90600)
  	{
! 		myopt.translate_columns = translate_columns;
! 		myopt.n_translate_columns = lengthof(translate_columns);
  	}
! 	else
  	{
! 		myopt.translate_columns = translate_columns_pre_96;
! 		myopt.n_translate_columns = lengthof(translate_columns_pre_96);
  	}
  
! 	

Re: [HACKERS] UPSERT/RETURNING -> ON CONFLICT SELECT?

2016-07-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jul 13, 2016 at 2:49 AM, Bjørnar Ness  wrote:
>> But with upsert/do nothing, this will not work as "needed".
>> 
>> Would it be possible to introduce a "ON CONFLICT SELECT" argument:
>> 
>> with _foo as (
>> insert into foo(i) values(1)
>> on conflict select returning id
>> ) insert into bar(foo_id,i)
>> select id,2 from _foo;

> I gather that the point of this pseudo SQL is to show how you might be
> able to project and select the values not successfully inserted. Can't
> you just pipeline together some CTEs instead?

What's "needed" seems a little ill-defined here, anyway.  Would the SELECT
be expected to return values from the failed-to-be-inserted row, or from
the existing conflicting row?  (Is there certain to be only one
conflicting row?  With exclusion constraints I'd think not.)

regards, tom lane




Re: [HACKERS] UPSERT/RETURNING -> ON CONFLICT SELECT?

2016-07-13 Thread Peter Geoghegan
On Wed, Jul 13, 2016 at 2:49 AM, Bjørnar Ness  wrote:
> But with upsert/do nothing, this will not work as "needed".
>
> Would it be possible to introduce a "ON CONFLICT SELECT" argument:
>
> with _foo as (
>   insert into foo(i) values(1)
>   on conflict select returning id
> ) insert into bar(foo_id,i)
>   select id,2 from _foo;

I gather that the point of this pseudo SQL is to show how you might be
able to project and select the values not successfully inserted. Can't
you just pipeline together some CTEs instead?


-- 
Peter Geoghegan




Re: [HACKERS] application_name in process name?

2016-07-13 Thread Tom Lane
Mike Blackwell  writes:
> There are times when it would be useful to have the application_name
> connection parameter displayed in the process name - and thus in ps and
> pg_top - in addition to the user and database name.

> Would there be any downside to this?

In a lot of situations ("top" for instance) only a limited number of
characters can be displayed from a process title.  I'm hesitant to add
fields to that string that we don't really need.

regards, tom lane




Re: [HACKERS] Showing parallel status in \df+

2016-07-13 Thread Peter Eisentraut
On 7/12/16 7:11 PM, Stephen Frost wrote:
> I'm curious how it's useful and in what way \sf does not accomplish what
> you use \df+ for.

One main use is to see multiple related functions next to each other and
compare their source code.  But also because one is used to \df and
wants to see everything there and not in a different format like \sf.

So ways to consolidate that would be supporting wildcards and multiple
results in \sf, and/or the option to show a truncated version of the
source code in \df+, or perhaps a \df++.

> We've already had to change the structure of \df+; I'm not convinced
> that avoiding doing so further now, just to do so again in the next
> release, is actually a better answer than changing it now.

We added a new column related to a new feature, which is hardly changing
the structure.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] application_name in process name?

2016-07-13 Thread Mike Blackwell
There are times when it would be useful to have the application_name
connection parameter displayed in the process name - and thus in ps and
pg_top - in addition to the user and database name.

Would there be any downside to this?  If it were done, are there any
suggestions on how it could be added to the process name so as to minimize
impact on anything that might be trying to parse that line?

__
Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Kevin Grittner
On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila  wrote:
> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
>> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
>>
>>> I'm a bit confused, why aren't we simply adding LSN interlock
>>> checks for toast? Doesn't look that hard? Seems like a much more
>>> natural course of fixing this issue?
>>
>> I took some time trying to see what you have in mind, and I'm
>> really not "getting it".
>
> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
> when old_snapshot_threshold > 0 and add a check for
> HeapTupleSatisfiesToast in TestForOldSnapshot()?

With that approach, how will we know *not* to generate an error
when reading the chain of tuples for a value we are deleting?  Or
when positioning to modify an index on toast data?  Etc., etc., etc.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Documentation fix for CREATE FUNCTION

2016-07-13 Thread Tom Lane
Albe Laurenz  writes:
> I just noticed that the documentation for CREATE FUNCTION still mentions
> that the temporary namespace is searched for functions even though that
> has been removed with commit aa27977.

The example you propose to correct was introduced by that same commit,
which should make you think twice about whether it really was invalidated
by that commit.

I believe the reason for forcing pg_temp to the back of the path is to
prevent unqualified table names from being captured by pg_temp entries.
This risk exists despite the rule against searching pg_temp for functions
or operators.  A maliciously named temp table could at least prevent
a security definer function from doing what it was supposed to, and
could probably hijack control entirely via triggers or rules.

Possibly the documentation should be more explicit about why this is
being done, but the example code is good as-is.

regards, tom lane




[HACKERS] rethinking dense_alloc (HashJoin) as a memory context

2016-07-13 Thread Tomas Vondra

Hi,

In the thread [1] dealing with hashjoin bug introduced in 9.5, Tom 
voiced his dislike of dense_alloc. I kinda agree with him that 
introducing "local allocators" may not be the best idea, and as 
dense_alloc was introduced by me I was playing with the idea to wrap 
this into a regular memory context, perhaps with some restrictions (e.g. 
no pfree). But I'm having trouble with that approach ...


Let me quickly explain the idea behind dense_alloc. When building the 
tuple hash table in hash join, we simply allocate large chunk of memory 
using palloc (~32kB), and then store the tuples into the chunk on our 
own without calling palloc for each tuple. Each tuple already has its length
in the header, so we don't need a chunk header. Also, we don't do the 2^k
chunk sizes and instead store the tuples densely.


This means we can't do repalloc or pfree on the tuples, but fine. We 
never did repalloc in hashjoin anyway, and pfree is only needed when 
increasing the number of batches. But with dense_alloc we can simply 
walk through the tuples as stored in the allocated chunks, which has the 
nice benefit that it's sequential, making memory prefetching more 
efficient than with the old code (walking through buckets). Also, no 
freelists and such.
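In code form the scheme is roughly this (a simplified sketch only, with
malloc() standing in for palloc() and error handling elided; the real
implementation lives in the hashjoin code):

#include <stdlib.h>

#define DENSE_CHUNK_DATA (32 * 1024)    /* ~32kB of payload per chunk */

typedef struct DenseChunk
{
    struct DenseChunk *next;            /* previously filled chunks */
    size_t      used;                   /* bytes consumed in data[] */
    char        data[DENSE_CHUNK_DATA];
} DenseChunk;

static void *
dense_alloc_sketch(DenseChunk **chunks, size_t size)
{
    DenseChunk *chunk = *chunks;
    void       *ptr;

    /* start a new chunk when the request doesn't fit the current one */
    if (chunk == NULL || chunk->used + size > DENSE_CHUNK_DATA)
    {
        chunk = malloc(sizeof(DenseChunk));
        chunk->next = *chunks;
        chunk->used = 0;
        *chunks = chunk;
    }

    /* bump-pointer allocation: no per-tuple header, no rounding, no pfree */
    ptr = chunk->data + chunk->used;
    chunk->used += size;
    return ptr;
}

Walking all stored tuples is then just a walk of the chunk list, reading
each tuple's own length header to find the next one.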


So the dense_alloc has several benefits:

(a) memory reduction thanks to eliminating StandardChunkHeader (which is 
16B, and quite noticeable for narrow tuples)


(b) memory reduction thanks to dense packing tuples (not leaving free 
space in each chunk)


(c) improving efficiency by sequential memory accesses (compared to 
random accesses caused by access through buckets)


Per the measurements done in thread [2], (a) and (b) may reduce memory 
requirements by 50% in some cases. I also vaguely remember doing 
benchmarks for (c) and seeing measurable improvements, but I don't see 
the numbers in the thread, so either it was posted somewhere else or not 
at all :-/


Anyway, I'm explaining this because I think it's important the new 
reworked code achieves the same benefits. But when trying to implement 
it as a special memory context, I quickly ran into the requirement that 
each chunk has a chunk header [3] which would prevent (a).


I know it was proposed to only include the chunk header when compiled
with asserts, but I don't like the idea of having reusable code that
depends on that (and fails with a segfault without it).
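(That proposal would look roughly like this -- illustrative only, not an
actual patch:)

/*
 * Illustrative only: a chunk layout whose StandardChunkHeader exists only
 * in assert-enabled builds.  Code that works in a debug build but
 * segfaults in a production build the moment something calls pfree() on
 * such a chunk is exactly the hazard described above.
 */
typedef struct DenseChunkMeta
{
#ifdef USE_ASSERT_CHECKING
    StandardChunkHeader header;     /* lets pfree()/repalloc() be checked */
#endif
    uint32      len;                /* tuple length, always present */
    /* tuple data follows */
} DenseChunkMeta;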


If I have to choose between a memory context that is essentially meant 
to be reused, but likely to fail unexpectedly in non-assert builds, and 
a special local allocator isolated to a single node, I choose the 
latter. Perhaps I'd see this differently had there been other places 
that could use the new memory context, but I can't think of one.


Opinions?


[1] 
https://www.postgresql.org/message-id/flat/7034.1454722453%40sss.pgh.pa.us


[2] https://www.postgresql.org/message-id/flat/53B4A03F.3070409%40fuzzy.cz

[3] 
https://github.com/postgres/postgres/blob/master/src/include/utils/memutils.h#L49


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Simplifying the interface of UpdateMinRecoveryPoint

2016-07-13 Thread Simon Riggs
On 12 July 2016 at 23:49, Michael Paquier  wrote:


> Hence why not simplifying its interface and remove the force flag?


Is this change needed as part of a feature? If not, I see no reason for
change.

If we all work towards meaningful features the code can be cleaned up as we
go.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Simplifying the interface of UpdateMinRecoveryPoint

2016-07-13 Thread Michael Paquier
On Wed, Jul 13, 2016 at 8:27 PM, Amit Kapila  wrote:
> On Wed, Jul 13, 2016 at 9:19 AM, Michael Paquier
>  wrote:
>> Hence why not simplifying its interface and remove the force flag?
>
> One point to note is that the signature and usage of
> UpdateMinRecoveryPoint() is same as it was when it got introduced in
> commit-cdd46c76.  Now the only reasons that come to my mind for
> introducing the force parameter were (a) it looks cleaner that way to
> the committer, (b) they had some use case for it in mind, or (c) it got
> overlooked.  Now, if it got introduced due to (c), then your
> patch does the right thing by removing it.  Personally, I feel
> overloading the parameter for multiple purposes makes code less
> maintainable, so retaining as it is in HEAD has some merits.

There is no way to tell what that is, but perhaps Heikki recalls
something on the matter. I am just adding him in CC.
-- 
Michael




Re: [HACKERS] Disable WAL completely - Performance and Persistency research

2016-07-13 Thread Netanel Katzburg
Your patch is very helpful; I'm still checking it on different file systems.
I really liked the idea of using only the edge checkpoints.
Thanks.

On Mon, Jul 11, 2016 at 9:26 PM, Jeff Janes  wrote:

> On Thu, Jul 7, 2016 at 1:01 AM, Netanel Katzburg 
> wrote:
> > Hi All,
> >
> > As part of my masters at TAU, I'm currently conducting some research
> > regarding new persistent memory technology.
> > I'm using PG for this research and would like to better understand some
> > of the performance bottlenecks.
> > For this reason I'm trying to disable the WAL completely, using some
> > hacks on the source code and compiling my own version.
> >
> > So what I'm actually looking for, is some guidance about a simple way to:
> >
> > 1. Disable the WAL by not writing anything to the xlog directory. I don't
> > care about recovery/fault tolerance or PITR/replication etc. at the
> > moment.
> > I'm aware that the WAL and checkpoints are bound in many ways and are
> > crucial for PG core features.
> > I tried changing the status of all tables to "unlogged" tables by
> changing
> > RelationNeedsWAL MACRO, as well as "needs_wal" parameter at storage.c.
> > But, got no performance benefit, so I guess this was the wrong place to
> > change.
> >
> > 2. Cancel the locking around WAL files  - I don't care about corrupted
> files
> > at the moment, I just want to see what is the maximum performance benefit
> > that I can get without lock contention.
> >
> > Any guidance on how to do so would be appreciated :)
>
> I have a very old patch which introduces a config variable (JJNOWAL)
> that skips all WAL, except for the WAL of certain checkpoints (which
> are needed for initdb and to restart the server after a clean
> shutdown).
>
> I have rebased it up to HEAD.  It seems to work, but I haven't tested
> thoroughly that it still does the correct thing in every corner case.
> (A lot of changes have been made to the xlog code since the last time
> I used this.)
>
> Obviously if the server goes down uncleanly while this setting is
> active, it will not be usable anymore.
>
> Cheers,
>
> Jeff
>
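
For anyone who wants to reproduce the RelationNeedsWAL experiment mentioned
above, a rough sketch follows. It assumes the 9.6-era definition in
src/include/utils/rel.h, and it is an unsafe, throwaway hack for
benchmarking only -- not anything resembling the attached patch:

/*
 * Stock definition: a relation needs WAL only if it is permanent.
 *
 *   #define RelationNeedsWAL(relation) \
 *       ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
 *
 * Brute-force experiment: pretend every relation is unlogged.  Crash
 * recovery, PITR and replication are all broken under this definition.
 */
#define RelationNeedsWAL(relation) false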


[HACKERS] UPSERT/RETURNING -> ON CONFLICT SELECT?

2016-07-13 Thread Bjørnar Ness
The new upsert feature is a great addition, but in some cases it is not as
usable as I, and it seems many others, would like it to be. Take, for
example, circular references:

create table foo (
  id serial primary key,
  i int
);

create table bar (
  foo_id integer unique references foo(id) on delete cascade,
  i int
);

-- the circular half must be added once both tables exist; making it
-- deferrable lets the rows be inserted pairwise
alter table foo add constraint foo_bar_fk
  foreign key (id) references bar(foo_id)
  on delete cascade deferrable initially deferred;

An insert here would be:

with _foo as (
  insert into foo(i) values(1) returning id
) insert into bar(foo_id,i)
  select id,2 from _foo;

But with upsert/do nothing, this will not work as "needed".

Would it be possible to introduce an "ON CONFLICT SELECT" clause:

with _foo as (
  insert into foo(i) values(1)
  on conflict select returning id
) insert into bar(foo_id,i)
  select id,2 from _foo;

-- 
Bjørnar




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Amit Kapila
On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
>
>> I'm a bit confused, why aren't we simply adding LSN interlock
>> checks for toast? Doesn't look that hard? Seems like a much more
>> natural course of fixing this issue?
>
> I took some time trying to see what you have in mind, and I'm
> really not "getting it".
>

Couldn't we handle this by initializing lsn and whenTaken in SnapshotToast
when old_snapshot_threshold > 0, and adding a check for
HeapTupleSatisfiesToast in TestForOldSnapshot()?
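
For concreteness, here is a tiny self-contained sketch of the shape of that
check; every name below is a mock stand-in, not the actual bufmgr.h code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;	/* 0 doubles as "invalid" in this toy */
typedef enum { SATISFIES_MVCC, SATISFIES_TOAST, SATISFIES_OTHER } SnapType;

typedef struct
{
	SnapType	satisfies;
	XLogRecPtr	lsn;		/* would now also be initialized for toast */
} Snapshot;

static int	old_snapshot_threshold = 1;		/* >= 0 means feature enabled */

/* true if a "snapshot too old" error should be raised for this page */
static bool
test_for_old_snapshot(const Snapshot *snap, XLogRecPtr page_lsn)
{
	return old_snapshot_threshold >= 0
		&& snap != NULL
		&& (snap->satisfies == SATISFIES_MVCC
			|| snap->satisfies == SATISFIES_TOAST)	/* proposed addition */
		&& snap->lsn != 0
		&& page_lsn > snap->lsn;
}

int
main(void)
{
	Snapshot	toast = {SATISFIES_TOAST, 100};

	/* with the extra condition, toast reads are checked too: prints 1 */
	printf("%d\n", test_for_old_snapshot(&toast, 200));
	return 0;
}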


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_basebackup wish list

2016-07-13 Thread Amit Kapila
On Tue, Jul 12, 2016 at 10:23 PM, Jeff Janes  wrote:
> I've been having some adventures with pg_basebackup lately, and had
> some suggestions based on those.
>
> The --help message for pg_basebackup says:
>
> -Z, --compress=0-9 compress tar output with given compression level
>
> But -Z0 is then rejected as 'invalid compression level "0"'.  The real
> docs do say 1-9; only the --help message has this bug.  Trivial patch
> attached.
>
> These ones I have not written code for yet:
>
> The progress reporting for pg_basebackup is pretty terse:
>
>  858117/7060099 kB (12%), 0/1 tablespace
>
> I think we should at least add a count-up timer showing the seconds it
> has been running.  I can always use my own stopwatch, but that is not
> very friendly, and it is easy to forget to start.
>

Another possibility is to enhance the -P option to take an argument, as in
-P sec, such that it displays progress every 'sec' seconds, something like
what we have in pgbench.
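
For what it's worth, a minimal standalone sketch of the count-up timer idea
(not pg_basebackup code; all names here are illustrative): remember the
start time once, then append the elapsed seconds to each progress line.

#include <stdio.h>
#include <time.h>

static time_t start_time;

static void
progress_report(long done_kb, long total_kb)
{
	int		pct = (int) (done_kb * 100 / total_kb);
	long	elapsed = (long) (time(NULL) - start_time);

	/* mimics the quoted output line, with "N s elapsed" appended */
	fprintf(stderr, "\r%ld/%ld kB (%d%%), %ld s elapsed",
			done_kb, total_kb, pct, elapsed);
}

int
main(void)
{
	start_time = time(NULL);
	progress_report(858117, 7060099);
	fprintf(stderr, "\n");
	return 0;
}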

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Simplifying the interface of UpdateMinRecoveryPoint

2016-07-13 Thread Amit Kapila
On Wed, Jul 13, 2016 at 9:19 AM, Michael Paquier
 wrote:
> Hi all,
>
> As of now UpdateMinRecoveryPoint() is using two arguments:
> - lsn, to check if the minimum recovery point should be updated to that
> - force, a boolean flag to decide if the update should be enforced or not.
> However those two arguments are correlated. If lsn is
> InvalidXlogRecPtr, the minimum recovery point update will be enforced.

Right.

> Hence why not simplifying its interface and remove the force flag?

One point to note is that the signature and usage of
UpdateMinRecoveryPoint() is the same as when it was introduced in
commit cdd46c76.  Now, the only reasons that come to my mind for
introducing the force parameter are (a) it looks cleaner that way to
the committer, (b) they had some use case for it in mind, or (c) it
simply got overlooked.  If it got introduced due to (c), then your
patch does the right thing by removing it.  Personally, I feel that
overloading a parameter for multiple purposes makes code less
maintainable, so retaining it as it is in HEAD has some merit.
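
For concreteness, the proposed simplification amounts to something like the
sketch below (with stand-in types, not the actual xlog.c code):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for access/xlogdefs.h */
#define InvalidXLogRecPtr	((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r)	((r) == InvalidXLogRecPtr)

/*
 * Proposed single-argument form: the separate "force" flag is gone,
 * and passing InvalidXLogRecPtr means "update unconditionally".
 */
static void
UpdateMinRecoveryPoint(XLogRecPtr lsn)
{
	bool	force = XLogRecPtrIsInvalid(lsn);

	(void) force;
	/* ... existing update logic, using "force" exactly as before ... */
}

int
main(void)
{
	UpdateMinRecoveryPoint(InvalidXLogRecPtr);	/* old: (lsn, true) */
	UpdateMinRecoveryPoint(42);					/* old: (42, false) */
	return 0;
}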

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Constraint merge and not valid status

2016-07-13 Thread Amit Langote

Hi,

Consider a scenario where one adds a *valid* constraint on an inheritance
parent which is then merged with a child table's *not valid* constraint
during inheritance recursion.  If merged, the constraint is not checked
against the child's data even though it may have some.  Is that an oversight?

Thanks,
Amit






Re: [HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-13 Thread Ashutosh Bapat
> Seems odd to me.  Both relations use the same user mapping as before, so
> the join should be pushed down.
>
> Another thing I noticed is that the code in plancache.c would invalidate a
> cached plan with a foreign join due to user mapping changes even in the
> case where user mappings are meaningless to the FDW.
>
> To fix the first, I'd like to propose (1) replacing the existing
> has_foreign_join flag in the CachedPlan data structure with a new flag, say
> uses_user_mapping, that indicates whether a cached plan uses any user
> mapping regardless of whether the cached plan has foreign joins or not
> (likewise, replace the hasForeignJoin flag in PlannedStmt), and (2)
> invalidating the cached plan if the uses_user_mapping flag is true.  I
> considered invalidating the cached plan only if the flag is true and there
> is also a possibility of performing foreign joins, but it seems to me that
> updates on the corresponding catalog are infrequent enough that such
> detailed tracking is not worth the effort.


That way we will have plan cache invalidations even when only simple foreign
table scans (not joins) are involved, which means every plan referencing a
foreign table with a valid user mapping associated with it. That can be a
huge cost compared to the current solution, where a sub-optimal plan is used
only when a user mapping is changed while a statement is prepared. That's a
rare scenario, and somebody can work around it by preparing the statement
again. IIRC, we discussed this situation when implementing the cache
invalidation logic. But with your solution there is no workaround.


> One benefit of using the proposed approach is that we could make the FDW's
> handling of user mappings in BeginForeignScan more efficient; currently,
> there is additional overhead caused by catalog re-lookups to obtain the
> user mapping information for the simple-foreign-table-scan case where user
> mappings mean something to the FDW as in postgres_fdw (probably, updates on
> the catalogs are infrequent, though), but we could improve the efficiency
> by using the validated user mapping information created at planning time
> for that case as well as for the foreign-join case.  For that, I'd like to
> propose storing the validated user mapping information into ForeignScan,
> not fdw_private.


postgres_fdw fetches the user mapping in some cases but never remembers it.
If what you are describing were a better way, it would have been implemented
before join pushdown was. Refetching a user mapping is not that expensive,
given the high chance that it will be found in the syscache because it was
accessed at planning time. The effect of plan cache invalidation is probably
worse than fetching the value from the syscache again.


> There is a discussion about using an ExtensibleNode [1] for that, but the
> proposed approach would make the FDW's job much simpler.
>
> I don't think the above change is sufficient to fix the second.  The root
> reason for that is that since currently, we allow the user mapping OID
> (rel->umid) to be InvalidOid in two cases: (1) user mappings mean something
> to the FDW but it can't get any user mapping at planning time and (2) user
> mappings are meaningless to the FDW, we cannot distinguish these two cases.


The way to differentiate between these two is to look at the serverid: if
the serverid is invalid it's case (1), otherwise case (2). For a simple
table, there will always be a serverid associated with it. A user mapping
will always be associated with a simple table if there is one, i.e. if the
FDW requires it. Admittedly, there may be a case where the FDW requires a
user mapping but it has not been created, in which case the table is useless
for fetching remote data anyway. I don't think we should be programming for
that case.


> So, I'd like to introduce a new callback routine to specify that user
> mappings mean something to the FDW as proposed by Tom [2], and use that to
> reject the former case, which allows us to set the above uses_user_mapping
> flag appropriately, ie, set the flag to true only if user mapping changes
> require forcing a replan.  This would change the postgres_fdw's behavior
> that it allows to prepare statements involving foreign tables without any
> user mappings as long as those aren't required at planning time, but I'm
> not sure that it's a good idea to keep that behavior because ISTM that such
> a behavior would make people sloppy about creating user mappings, which
> could lead to latent security problems [2].
>
> Attached is a proposed patch for that.
>
>
This routine is meaningless unless the core (or the FDW) disallows creating
a user mapping for such FDWs. Without that, the core code would get confused
as to what it should do when it sees a user mapping for an FDW which says
user mappings are meaningless. If we do that, we might not set the
hasForeignJoin flag in create_foreignscan_plan() when the user mapping for
the pushed-down join is invalid.

Re: [HACKERS] pgbench - compute & show latency consistently

2016-07-13 Thread Fabien COELHO


Hello Peter,


 number of transactions per client: 1000
-latency average = 15.844 ms
+latency average: 15.844 ms
 tps = 618.764555 (including connections establishing)


I think what you have here is that colons separate input parameters and
equal signs separate result output.  So I think it's OK the way it is.


Hmmm... Then the other measures displayed are not all consistent with this
theory.


Also there is still the bug under -t which displays a 0 latency.

The attached patch still fixes that and makes things consistent the other
way around, i.e. by using "=" for latency. I switched to ":" for weight,
which is an input parameter. I kept ":" where a long sentence describes the
figure displayed, mostly on aesthetic grounds.
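
For reference, the duration-based figure is simply
1000.0 * run_time_in_seconds * nclients / transaction_count, which is why -t
runs (where the configured duration is zero) showed a 0 latency. A tiny
standalone illustration with made-up numbers:

#include <stdio.h>

int
main(void)
{
	double	run_time = 10.0;	/* seconds of measured run time */
	int		nclients = 10;
	long	cnt = 6312;			/* total transactions completed */

	/* same formula as printResults(): prints "latency average = 15.843 ms" */
	printf("latency average = %.3f ms\n",
		   1000.0 * run_time * nclients / cnt);
	return 0;
}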



Maybe a better improvement would be introducing section headings like
"test parameters" and "test results".


This would add more lines to the report; I am not sure how desirable that is.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 87fb006..af1169a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3291,9 +3291,9 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	if (throttle_delay || progress || latency_limit)
 		printSimpleStats("latency", &total->latency);
 	else
-		/* only an average latency computed from the duration is available */
-		printf("latency average: %.3f ms\n",
-			   1000.0 * duration * nclients / total->cnt);
+		/* no measure, show average latency computed from run time */
+		printf("latency average = %.3f ms\n",
+			   1000.0 * time_include * nclients / total->cnt);
 
 	if (throttle_delay)
 	{
@@ -3319,7 +3319,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 		{
 			if (num_scripts > 1)
 printf("SQL script %d: %s\n"
-	   " - weight = %d (targets %.1f%% of total)\n"
+	   " - weight: %d (targets %.1f%% of total)\n"
 	   " - " INT64_FORMAT " transactions (%.1f%% of total, tps = %f)\n",
 	   i + 1, sql_script[i].desc,
 	   sql_script[i].weight,



Re: [HACKERS] pgbench - minor fix for meta command only scripts

2016-07-13 Thread Fabien COELHO


The attached patch provides a solution which ensures that the return happens
under the right condition and after the stats collection. The code structure
requires another ugly boolean so as to preserve the reconnection between the
decision that the return must be done and the place where it can actually be
done, after reconnecting.


Oops, the attached patch had the right content but the wrong name :-(

Here it is again with a consistent name.

Sorry for the noise.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 87fb006..8c5df14 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1766,7 +1766,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 {
 	PGresult   *res;
 	Command   **commands;
-	bool		trans_needs_throttle = false;
+	bool		trans_needs_throttle = false,
+				return_before_next_trans = false;
 	instr_time	now;
 
 	/*
@@ -1849,6 +1850,8 @@ top:
 
 	if (st->listen)
 	{			/* are we receiver? */
+		bool listened_a_meta = commands[st->state]->type == META_COMMAND;
+
 		if (commands[st->state]->type == SQL_COMMAND)
 		{
 			if (debug)
@@ -1892,6 +1895,7 @@ top:
 			/*
 			 * Read and discard the query result; note this is not included in
 			 * the statement latency numbers.
+			 * Should this be done before recording the statement stats?
 			 */
 			res = PQgetResult(st->con);
 			switch (PQresultStatus(res))
@@ -1913,6 +1917,7 @@ top:
 		{
 			if (is_connect)
 			{
+				/* Should transaction stats recorded above count this time? */
 				PQfinish(st->con);
 				st->con = NULL;
 			}
@@ -1942,12 +1947,17 @@ top:
 			 * listen back to true.
 			 */
 			st->listen = false;
+
+			if (listened_a_meta)
+				return_before_next_trans = true;
+
 			trans_needs_throttle = (throttle_delay > 0);
 		}
 	}
 
 	if (st->con == NULL)
 	{
+		/* Why is connection time left out of the transaction time stats? */
 		instr_time	start,
 					end;
 
@@ -1969,6 +1979,10 @@ top:
 		memset(st->prepared, 0, sizeof(st->prepared));
 	}
 
+	/* ensure that meta-only scripts sometimes return */
+	if (return_before_next_trans)
+		return true;
+
 	/*
 	 * This ensures that a throttling delay is inserted before proceeding with
 	 * sql commands, after the first transaction. The first transaction



Re: [HACKERS] pgbench - minor fix for meta command only scripts

2016-07-13 Thread Fabien COELHO


Hello Tom,


Ok. Here is an updated version, with a better suffix and a simplified
comment.


Doesn't this break the handling of latency calculations, or at least make
the results completely different for the last metacommand than what they
would be for a non-last command?  It looks like it needs to loop back so
that the latency calculation is completed for the metacommand before it
can exit.  Seems to me it would probably make more sense to fall out at
the end of the "transaction finished" if-block, around line 1923 in HEAD.


Indeed, it would disturb the stats computation a little by delaying the
recording of the end of statement & transaction.


However, line 1923 is a shortcut for ending pgbench; at the end of a
transaction more must be done before exiting, e.g. choosing the next script
and reconnecting, so the solution is more contrived.


The attached patch provides a solution which ensures that the return happens
under the right condition and after the stats collection. The code structure
requires another ugly boolean so as to preserve the reconnection between the
decision that the return must be done and the place where it can actually be
done, after reconnecting.



(The code structure in here seems like a complete mess to me, but probably
now is not the time to refactor it.)


I fully agree that the code structure is a total mess :-( Maybe I'll try to
submit a simpler one some day.


Basically the doCustom function is not resilient: you cannot exit from just
anywhere and hope that re-entering will produce consistent behavior.


While reading the code to find a better place for a return, I noted some 
possible inconsistencies in recording stats, which are noted as comments 
in the attached patch.


chooseScript is called both from outside, for initialization, and from
inside doCustom; it could be done once, and more clearly, in doCustom.


The boolean listen is not reset because the script is expected to proceed
directly to the start of the next statement. I convinced myself that it
actually works, but it is not obvious why. I think a simpler pattern would
be welcome. Also, some other things (e.g. prepared) are not reset in all
cases; I am not sure why.


The goto should probably be replaced by a while.

...

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 87fb006..8c5df14 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1766,7 +1766,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 {
 	PGresult   *res;
 	Command   **commands;
-	bool		trans_needs_throttle = false;
+	bool		trans_needs_throttle = false,
+				return_before_next_trans = false;
 	instr_time	now;
 
 	/*
@@ -1849,6 +1850,8 @@ top:
 
 	if (st->listen)
 	{			/* are we receiver? */
+		bool listened_a_meta = commands[st->state]->type == META_COMMAND;
+
 		if (commands[st->state]->type == SQL_COMMAND)
 		{
 			if (debug)
@@ -1892,6 +1895,7 @@ top:
 			/*
 			 * Read and discard the query result; note this is not included in
 			 * the statement latency numbers.
+			 * Should this be done before recording the statement stats?
 			 */
 			res = PQgetResult(st->con);
 			switch (PQresultStatus(res))
@@ -1913,6 +1917,7 @@ top:
 		{
 			if (is_connect)
 			{
+				/* Should transaction stats recorded above count this time? */
 				PQfinish(st->con);
 				st->con = NULL;
 			}
@@ -1942,12 +1947,17 @@ top:
 			 * listen back to true.
 			 */
 			st->listen = false;
+
+			if (listened_a_meta)
+				return_before_next_trans = true;
+
 			trans_needs_throttle = (throttle_delay > 0);
 		}
 	}
 
 	if (st->con == NULL)
 	{
+		/* Why is connection time left out of the transaction time stats? */
 		instr_time	start,
 					end;
 
@@ -1969,6 +1979,10 @@ top:
 		memset(st->prepared, 0, sizeof(st->prepared));
 	}
 
+	/* ensure that meta-only scripts sometimes return */
+	if (return_before_next_trans)
+		return true;
+
 	/*
 	 * This ensures that a throttling delay is inserted before proceeding with
 	 * sql commands, after the first transaction. The first transaction



[HACKERS] Documentation fix for CREATE FUNCTION

2016-07-13 Thread Albe Laurenz
I just noticed that the documentation for CREATE FUNCTION still mentions
that the temporary namespace is searched for functions even though that
has been removed with commit aa27977.

Attached is a patch to fix that.

Yours,
Laurenz Albe


0001-Fix-mention-of-pg_temp-in-CREATE-FUNCTION-documentat.patch
Description: 0001-Fix-mention-of-pg_temp-in-CREATE-FUNCTION-documentat.patch



Re: [HACKERS] dumping database privileges broken in 9.6

2016-07-13 Thread Michael Paquier
On Wed, Jul 13, 2016 at 5:18 AM, Stephen Frost  wrote:
> Attached is a patch to address this.
>
> After much consideration and deliberation, I went with the simpler
> solution to simply dump out the database privileges based on what a new
> creation of those privileges would yield, resulting in output similar to
> pre-9.6.  We document that template1 is allowed to be dropped/recreated,
> which greatly complicates using pg_init_privs to record and produce a
> delta against the initdb-time values, as we lose the connection between
> pg_init_privs and the "template1" database as soon as it is dropped
> (something which can't be done with objects in that catalog).

+"(SELECT pg_catalog.array_agg(acl) FROM (SELECT
pg_catalog.unnest(coalesce(datacl,pg_catalog.acldefault('d',datdba)))
AS acl "
+"EXCEPT SELECT
pg_catalog.unnest(pg_catalog.acldefault('d',datdba))) as foo)"
+"AS datacl,"
+"(SELECT pg_catalog.array_agg(acl) FROM (SELECT
pg_catalog.unnest(pg_catalog.acldefault('d',datdba)) AS acl "
+"EXCEPT SELECT
pg_catalog.unnest(coalesce(datacl,pg_catalog.acldefault('d',datdba
as foo)"
+"AS rdatacl,"
It took me some time to understand that those are the GRANT and REVOKE ACLs,
separated into two columns to take advantage of buildACLCommands.
-- 
Michael

